Workflow:Kubeflow Kubeflow Platform Deployment

Knowledge Sources	Kubeflow Kubeflow Manifests Kubeflow Docs Installing Kubeflow
Domains	MLOps, Kubernetes, Platform_Engineering
Last Updated	2026-02-13 14:00 GMT

Overview

End-to-end process for deploying the Kubeflow AI reference platform on a Kubernetes cluster using either Kubeflow Manifests or Packaged Distributions.

Description

This workflow covers the complete deployment of Kubeflow onto a Kubernetes cluster. Kubeflow can be installed in two ways: via Kubeflow Manifests (the community-maintained kustomize overlays in the kubeflow/manifests repository) or via Packaged Distributions (vendor-specific installers from providers like AWS, Google Cloud, or other certified distributors). The workflow addresses prerequisites, cluster preparation, installation method selection, component deployment, multi-user configuration, and post-deployment verification.

Usage

Execute this workflow when you need to provision a Kubeflow AI platform on Kubernetes for your organization. This applies when you have an existing Kubernetes cluster (v1.29+) and want to deploy the full Kubeflow suite (Pipelines, Notebooks, Training, Serving, Katib, Model Registry) or a subset of components for your AI/ML teams.

Execution Steps

Step 1: Prerequisites Validation

Verify that all infrastructure prerequisites are met before beginning installation. This includes confirming the Kubernetes cluster version compatibility, ensuring sufficient cluster resources (CPU, memory, storage), validating network policies, and checking that required tools are installed (kubectl, kustomize or the distribution-specific CLI).

Key considerations:

Kubernetes version must be 1.29 or later for Kubeflow 1.10+
A default StorageClass must be configured for persistent volumes
Istio service mesh is required for the full platform
Sufficient node capacity for all Kubeflow control plane components
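
The first two checks above can be scripted. The following is a minimal sketch, assuming the JSON comes from `kubectl version -o json` and `kubectl get storageclass -o json`; the function names are illustrative and not part of any Kubeflow tooling.

```python
import json
import re

MIN_MINOR = 29  # the text above requires Kubernetes 1.29 or later
DEFAULT_SC_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def server_version_ok(version_json: str, min_major: int = 1,
                      min_minor: int = MIN_MINOR) -> bool:
    """Return True if the server version in `kubectl version -o json` is new enough."""
    server = json.loads(version_json)["serverVersion"]
    # Some managed clusters report the minor version as e.g. "29+",
    # so keep only the leading digits before comparing.
    major = int(re.match(r"\d+", server["major"]).group())
    minor = int(re.match(r"\d+", server["minor"]).group())
    return (major, minor) >= (min_major, min_minor)


def default_storage_classes(sc_json: str) -> list[str]:
    """Return names of StorageClasses annotated as the cluster default."""
    items = json.loads(sc_json).get("items", [])
    return [
        sc["metadata"]["name"]
        for sc in items
        if (sc["metadata"].get("annotations") or {}).get(DEFAULT_SC_ANNOTATION) == "true"
    ]
```

An empty list from default_storage_classes means no default StorageClass is set; more than one entry is also worth resolving before installation. Node capacity and network policy checks are not covered by this sketch.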

Step 2: Installation Method Selection

Choose between Kubeflow Manifests (community upstream) and a Packaged Distribution (vendor-managed). The manifests approach gives full control and customization but requires more operational expertise. Packaged Distributions offer simplified installation, vendor support, and cloud-native integrations at the cost of some flexibility.

Key considerations:

Manifests are maintained by the community in the kubeflow/manifests repository
Packaged Distributions are available from AWS, Google Cloud, and other vendors
For production environments, a Packaged Distribution may reduce operational burden
For development or customization, Manifests provide the most flexibility

Step 3: Core Infrastructure Deployment

Deploy the foundational infrastructure components that Kubeflow depends on: the Istio service mesh, cert-manager for TLS certificates, Dex for authentication (or the cloud provider's identity service), and the Central Dashboard together with the profile controller.

Key considerations:

Istio must be deployed before Kubeflow components
cert-manager handles automatic TLS certificate provisioning
Dex provides a reference OpenID Connect implementation for AuthN
The profile controller manages per-user namespace isolation
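
The ordering constraint can be made explicit with a small dependency graph. This is a sketch only: the rule that Istio precedes Kubeflow components comes from the text above, while the other edges and the manifest directory names passed to apply_commands are assumptions for illustration and should be replaced with the paths from the manifests release you actually use.

```python
from graphlib import TopologicalSorter

# Maps each component to the components that must be deployed before it.
# Only "Istio before Kubeflow components" is stated in this workflow;
# edges marked "assumption" are illustrative.
DEPENDS_ON = {
    "cert-manager": set(),
    "istio": {"cert-manager"},                             # assumption
    "dex": {"istio"},                                      # assumption
    "profile-controller": {"istio"},
    "central-dashboard": {"istio", "profile-controller"},  # assumption
}


def deployment_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an install order in which every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def apply_commands(order: list[str], manifest_dirs: dict[str, str]) -> list[str]:
    """Build one kustomize/kubectl command line per component, in order."""
    return [
        f"kustomize build {manifest_dirs[name]} | kubectl apply -f -"
        for name in order
    ]
```

The commands are returned as strings rather than executed, so the plan can be reviewed before anything is applied to the cluster.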

Step 4: Component Deployment

Install the desired Kubeflow sub-projects. Each component can be deployed independently or as part of the full platform. The core components include Kubeflow Pipelines, Notebooks (Workbenches), Trainer, KServe, Katib, Model Registry, and Spark Operator.

What is deployed:

Kubeflow Pipelines for workflow orchestration
Notebooks for interactive development environments
Trainer for distributed model training jobs
KServe for model serving and inference
Katib for hyperparameter tuning and neural architecture search
Model Registry for model versioning and tracking
Spark Operator for large-scale data processing

Step 5: Multi-user Configuration

Configure multi-user isolation and access controls. This involves setting up user profiles (namespaces), configuring RBAC policies, integrating with the organization's identity provider, and establishing resource quotas per team or user.

Key considerations:

Each user or team gets an isolated Kubernetes namespace
Pod Security Standards are enforced (restricted for system namespaces, baseline for user namespaces)
RBAC policies control access to shared resources like pipelines and models
Integration with external identity providers via OIDC
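
A per-team namespace with a resource quota is typically expressed as a Profile resource. The sketch below builds one as a Python dictionary that can be dumped to JSON and piped to `kubectl apply -f -`. The profile name, owner address, and quota values are hypothetical, and the apiVersion should be checked against the Profile CRD installed in your cluster.

```python
import json


def build_profile(name: str, owner_email: str, cpu: str, memory: str) -> dict:
    """Build a Profile manifest; the profile controller creates the namespace for it."""
    return {
        "apiVersion": "kubeflow.org/v1",  # assumption: verify against the installed CRD
        "kind": "Profile",
        "metadata": {"name": name},
        "spec": {
            # The owner is granted access to the namespace.
            "owner": {"kind": "User", "name": owner_email},
            # Standard Kubernetes ResourceQuota limits for the namespace.
            "resourceQuotaSpec": {"hard": {"cpu": cpu, "memory": memory}},
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

For example, to_json(build_profile("team-a", "alice@example.com", "8", "16Gi")) yields a manifest for a hypothetical team-a profile. The owner name must match the identity that the OIDC provider presents for that user.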

Step 6: Post-deployment Verification

Validate that all deployed components are healthy and accessible. This includes checking pod status across all Kubeflow namespaces, verifying the Central Dashboard is reachable, testing authentication flows, confirming pipeline execution works, and running a sample notebook and training job.

Key considerations:

All pods in the kubeflow namespace should be Running or Completed
Central Dashboard should be accessible via the configured ingress
Authentication and authorization should work for test users
Run a sample pipeline to validate end-to-end functionality
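
The pod status check can be automated. This is a minimal sketch, assuming the input is the output of `kubectl get pods -n kubeflow -o json`; the function name is illustrative. Note that kubectl displays pods in the Succeeded phase as "Completed".

```python
import json

# Pod phases treated as healthy; kubectl shows Succeeded pods as "Completed".
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pods_json: str) -> list[tuple[str, str]]:
    """Return (pod name, phase) for every pod not in a healthy phase."""
    items = json.loads(pods_json).get("items", [])
    return [
        (pod["metadata"]["name"], pod.get("status", {}).get("phase", "Unknown"))
        for pod in items
        if pod.get("status", {}).get("phase") not in HEALTHY_PHASES
    ]
```

An empty result is a necessary but not sufficient signal: the phase is a coarse indicator, and a pod whose containers are restarting can still report Running. Container readiness, dashboard access, authentication, and the sample pipeline run still need to be checked separately.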
