Environment:Kubeflow Kubeflow Kubernetes Cluster Environment

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, Kubernetes, Platform_Deployment
Last Updated: 2026-02-13 00:00 GMT

Overview

A Kubernetes cluster running version 1.25 or later, required for deploying the Kubeflow AI Reference Platform.

Description

This environment defines the Kubernetes cluster requirements for running Kubeflow. The cluster must run Kubernetes 1.25 or later to support the CRDs, RBAC policies, and PodSecurityStandards used by Kubeflow components. The cluster operator must have cluster-admin permissions to create namespaces, CRDs, and other cluster-scoped resources. The cluster should support LoadBalancer or NodePort services so that the Istio ingress gateway can be reached from outside the cluster.

Each Kubeflow release targets specific Kubernetes versions: Kubeflow v1.7 supported Kubernetes 1.25, v1.8 supported 1.25 and 1.26, v1.9 supported 1.29, and v1.10+ targets the latest stable Kubernetes releases.
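The release-to-version mapping above can be captured as a small lookup, sketched here as a shell function. The function name `kubeflow_k8s_support` is hypothetical, and the table is copied from this page rather than from an official support matrix, so treat it as illustrative and confirm against the release notes.

```shell
# Print the Kubernetes minor versions this page lists for a Kubeflow release.
# Assumption: the mapping mirrors the paragraph above, not an official matrix.
kubeflow_k8s_support() {
  release=${1#v}   # accept "v1.9" as well as "1.9"
  case "$release" in
    1.7) echo "1.25" ;;
    1.8) echo "1.25 1.26" ;;
    1.9) echo "1.29" ;;
    *)   echo "unknown"; return 1 ;;   # v1.10+ is not pinned on this page
  esac
}
```

For example, `kubeflow_k8s_support v1.8` prints the two minors listed for that release, and any release not listed here returns a non-zero status.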

Usage

Use this environment for any Platform Deployment workflow. It is the mandatory prerequisite for deploying Istio, cert-manager, Dex, and all Kubeflow components. Both the Kubeflow Manifests path and Packaged Distributions require a functioning Kubernetes cluster meeting these specifications.

System Requirements

Category | Requirement | Notes
OS | Linux (cluster nodes) | Ubuntu 20.04+ or a similar Linux distribution on cluster nodes
Kubernetes Version | >= 1.25 | Kubeflow v1.9 requires Kubernetes 1.29; check the release notes for specific version compatibility
Hardware | Multi-node cluster recommended | Minimum 3 nodes with 4 CPU / 16 GB RAM each for production
Network | LoadBalancer or NodePort | Required for external access to the Istio ingress gateway
Storage | Default StorageClass provisioned | PersistentVolume support required for pipelines, notebooks, and model artifacts
RBAC | cluster-admin | Deployer must have a cluster-admin ClusterRoleBinding
Container Runtime | containerd or CRI-O | dockershim (the built-in Docker Engine integration) was removed in Kubernetes 1.24; Kubeflow 1.5+ switched to the Emissary executor for containerd compatibility

Dependencies

System Packages

  • Kubernetes >= 1.25 (control plane and kubelets)
  • containerd or CRI-O (container runtime)
  • CoreDNS (cluster DNS, typically bundled)
  • etcd (cluster state store, typically bundled)

Cluster Add-ons

  • Default StorageClass with dynamic PV provisioning
  • LoadBalancer controller (e.g., MetalLB for bare metal, or cloud provider integration)
  • PodSecurityStandards support (enforced in Kubeflow 1.10+)
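The default StorageClass requirement can be checked from the output of `kubectl get storageclass`, which marks the default class with "(default)" in the NAME column. The helper below is a minimal sketch; `has_default_storageclass` is a hypothetical name, and it reads the command output from stdin so it can be tried without a cluster.

```shell
# Succeeds if the `kubectl get storageclass` output on stdin contains a class
# marked "(default)". The function name is hypothetical.
has_default_storageclass() {
  grep -q '(default)'
}

# Usage against a live cluster (not run here):
#   kubectl get storageclass | has_default_storageclass \
#     || echo "no default StorageClass found" >&2
```

This only inspects the printed marker; it does not confirm that the provisioner behind the class can actually create volumes.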

Credentials

The following credentials must be available to the cluster operator:

  • KUBECONFIG: Path to kubeconfig file with cluster-admin permissions (defaults to ~/.kube/config)
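The fallback described above (use KUBECONFIG if set, otherwise ~/.kube/config) can be sketched with a shell parameter expansion. `effective_kubeconfig` is a hypothetical helper name. Note that KUBECONFIG may also hold a colon-separated list of files, which this sketch prints unchanged.

```shell
# Print the kubeconfig path kubectl would use by default:
# $KUBECONFIG when set and non-empty, otherwise ~/.kube/config.
# The function name is hypothetical.
effective_kubeconfig() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

# Usage: confirm the file exists before running any deployment step.
#   [ -r "$(effective_kubeconfig)" ] || echo "kubeconfig not readable" >&2
```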

Quick Install

# Verify cluster access and version
kubectl version
kubectl cluster-info

# Verify cluster-admin permissions
kubectl auth can-i create namespaces --all-namespaces
kubectl auth can-i create customresourcedefinitions --all-namespaces

# Verify default StorageClass
kubectl get storageclass
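The version check can also be automated. The sketch below compares a Kubernetes version string against the 1.25 minimum using only POSIX shell string operations; `k8s_version_ok` is a hypothetical helper, and the usage line assumes `jq` is installed to extract the server version from `kubectl version -o json`.

```shell
# Succeeds if a version string such as "v1.29.3" is Kubernetes 1.25 or later.
# The function name is hypothetical.
k8s_version_ok() {
  ver=${1#v}                 # "v1.29.3" -> "1.29.3"
  major=${ver%%.*}           # "1"
  rest=${ver#*.}             # "29.3"
  minor=${rest%%.*}          # "29"
  minor=${minor%%[!0-9]*}    # drop any non-numeric suffix such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 25 ]; }
}

# Usage against a live cluster (assumes jq is available; not run here):
#   k8s_version_ok "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#     || echo "Kubernetes 1.25 or later is required" >&2
```

This checks only the 1.25 floor. A specific Kubeflow release may need a newer Kubernetes version than that.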

Code Evidence

Installation paths from README.md (both require a cluster that meets the prerequisites):

The Kubeflow AI reference platform can be installed via Packaged Distributions
or Kubeflow Manifests.

Kubernetes version compatibility from ROADMAP.md:L49:

* Kubernetes 1.29 support

Kubernetes version compatibility from ROADMAP.md:L69:

* Kubernetes 1.25 and 1.26 support

PodSecurityStandards enforcement from ROADMAP.md:L12:

* PodSecurityStandards restricted is enforced for all system namespaces.
  PodSecurityStandards baseline is enforced for user namespaces
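As an illustration of what such enforcement looks like at the namespace level, the sketch below builds the namespace label used by Kubernetes' built-in Pod Security Admission controller. It assumes the enforcement quoted above is expressed through these labels; the Kubeflow manifests may already apply them, so this is for inspection or experimentation only. `pss_enforce_label` and the namespace `my-profile` are hypothetical.

```shell
# Print the Pod Security Admission label that requests enforcement at a level.
# Valid levels are privileged, baseline, and restricted.
# The function name is hypothetical.
pss_enforce_label() {
  case "$1" in
    privileged|baseline|restricted)
      echo "pod-security.kubernetes.io/enforce=$1" ;;
    *)
      echo "unknown Pod Security Standards level: $1" >&2
      return 1 ;;
  esac
}

# Usage against a live cluster (hypothetical namespace; not run here):
#   kubectl label --overwrite namespace my-profile "$(pss_enforce_label baseline)"
```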

Common Errors

Error Message | Cause | Solution
Unable to connect to the server | kubeconfig not set or cluster unreachable | Set KUBECONFIG or verify the cluster endpoint with kubectl cluster-info
error: You must be logged in to the server (Unauthorized) | Expired or invalid credentials in kubeconfig | Refresh cluster credentials (e.g., aws eks update-kubeconfig or gcloud container clusters get-credentials)
forbidden: User cannot create resource | Insufficient RBAC permissions | Obtain a cluster-admin ClusterRoleBinding from the cluster owner
no matches for kind "PodSecurityPolicy" | Kubernetes 1.25+ removed PodSecurityPolicy | Upgrade to Kubeflow 1.10+, which uses PodSecurityStandards instead
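The error table can be turned into a quick triage helper for deployment scripts. This is a minimal sketch that matches substrings of the messages listed above; `diagnose_kubectl_error` is a hypothetical name, and the hints simply restate the Solution column.

```shell
# Map a kubectl error message to the remedy suggested in the table above.
# The function name is hypothetical; matching is by substring.
diagnose_kubectl_error() {
  case "$1" in
    *"Unable to connect to the server"*)
      echo "set KUBECONFIG or verify the cluster endpoint" ;;
    *"You must be logged in to the server"*)
      echo "refresh cluster credentials" ;;
    *forbidden*)
      echo "obtain a cluster-admin ClusterRoleBinding" ;;
    *'no matches for kind "PodSecurityPolicy"'*)
      echo "upgrade to Kubeflow 1.10+" ;;
    *)
      echo "unrecognized error"; return 1 ;;
  esac
}

# Usage: capture stderr from a failing command and pass it in.
#   err=$(kubectl get nodes 2>&1 >/dev/null) || diagnose_kubectl_error "$err"
```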

Compatibility Notes

  • GKE (Google): Use gcloud container clusters get-credentials to obtain kubeconfig. Autopilot clusters have limitations with Istio sidecar injection.
  • EKS (AWS): Use aws eks update-kubeconfig. Ensure the EBS CSI driver is installed for PersistentVolume support.
  • AKS (Azure): Use az aks get-credentials. Enable the Azure Disk CSI driver for storage.
  • On-premise: Ensure MetalLB or equivalent LoadBalancer controller is deployed. PV provisioner must be configured for the default StorageClass.
  • Kind/Minikube: Suitable for development only. Minikube requires --memory=16384 --cpus=4 minimum.
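The provider-specific credential commands above can be collected in one helper that prints the command for a given platform. This is a sketch only: `kubeconfig_refresh_cmd` is a hypothetical name, cluster and resource group names are placeholders, and region or zone flags are omitted on the assumption that CLI defaults are already configured.

```shell
# Print (do not run) the command that fetches kubeconfig credentials.
# Arguments: provider, cluster name, and for AKS a resource group.
# The function name is hypothetical.
kubeconfig_refresh_cmd() {
  case "$1" in
    gke) echo "gcloud container clusters get-credentials $2" ;;
    eks) echo "aws eks update-kubeconfig --name $2" ;;
    aks) echo "az aks get-credentials --resource-group $3 --name $2" ;;
    *)   echo "unsupported provider: $1" >&2; return 1 ;;
  esac
}

# Local development alternative, using the Minikube sizing noted above:
#   minikube start --memory=16384 --cpus=4
```

Printing the command rather than executing it keeps the helper safe to call in a dry run and leaves authentication to the operator.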
