Principle:Kubeflow Kubeflow Post Deployment Verification

From Leeroopedia
Knowledge Sources
Domains: Kubeflow, Platform Deployment, Verification, Operations
Last Updated: 2026-02-13 00:00 GMT

Overview

Post-deployment verification is the systematic process of confirming that all Kubeflow components, infrastructure services, and user-facing endpoints are healthy and functioning correctly after installation.

Description

After completing all deployment steps (prerequisites validation, installation method selection, core infrastructure deployment, component deployment, and multi-user configuration), operators must verify the entire platform is working end-to-end. This is not merely checking pod status; it involves validating that services are routable, authentication flows work, and ML workloads can be submitted.

Post-deployment verification catches issues that may not surface during individual component deployment, such as:

  • Misconfigured Istio VirtualService routing preventing Dashboard access
  • Dex OIDC misconfiguration causing authentication failures
  • Missing RBAC permissions in Profile namespaces blocking workload submission
  • cert-manager certificate issuance failures affecting TLS termination
  • Resource exhaustion on cluster nodes leaving some pods unschedulable (stuck in Pending)

This principle applies to both initial deployments and upgrades. After any change to the Kubeflow platform, a full verification pass should be performed to confirm nothing has regressed.

Usage

Perform post-deployment verification in the following scenarios:

  • After completing a fresh Kubeflow installation
  • After upgrading any Kubeflow component or infrastructure service
  • After modifying cluster-level configuration (network policies, resource quotas, node pools)
  • As part of a regular operational health check cadence
  • When users report issues accessing the platform or submitting workloads

Theoretical Basis

The verification process follows a layered approach, checking from infrastructure up to user experience:

Layer 1: Pod Health Check

  • Query all pods across all Kubeflow-related namespaces (kubeflow, istio-system, cert-manager, auth, knative-serving, user profile namespaces)
  • Verify every pod is either Running with all of its containers ready, or Completed (for pods that belong to finished Jobs)
  • Investigate any pod in CrashLoopBackOff, Error, Pending, or ImagePullBackOff state
  • Check pod restart counts; high restart counts indicate instability even if the pod is currently Running
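
The pod checks above can be sketched as a small filter over the JSON that `kubectl get pods -A -o json` prints. This is a minimal illustration, not an official Kubeflow tool: the function name `find_unhealthy_pods` and the `max_restarts` threshold of 5 are assumptions chosen for the example, and the right threshold depends on how long the cluster has been running.

```python
from __future__ import annotations

from typing import Any

# Waiting reasons worth flagging; ErrImagePull usually precedes ImagePullBackOff.
BAD_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def find_unhealthy_pods(pod_list: dict[str, Any], max_restarts: int = 5) -> list[str]:
    """Return one finding per problem seen in a parsed `kubectl get pods -o json` result.

    `max_restarts` is an arbitrary illustrative threshold, not a Kubeflow default.
    """
    findings: list[str] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = f"{meta.get('namespace', '?')}/{meta.get('name', '?')}"
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        containers = status.get("containerStatuses", [])

        for cs in containers:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in BAD_WAITING_REASONS:
                findings.append(
                    f"{name}: container {cs.get('name')} waiting ({waiting['reason']})"
                )
            # A pod can be Running right now and still be unstable.
            if cs.get("restartCount", 0) > max_restarts:
                findings.append(
                    f"{name}: container {cs.get('name')} restarted {cs['restartCount']} times"
                )

        if phase == "Succeeded":
            continue  # kubectl displays this phase as "Completed"
        if phase != "Running":
            findings.append(f"{name}: phase is {phase}")
        elif not all(cs.get("ready", False) for cs in containers):
            findings.append(f"{name}: Running but not all containers are ready")
    return findings
```

An empty result means Layer 1 passes. To use it, parse the `kubectl` output with `json.loads` and pass the resulting dictionary to the function.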

Layer 2: Service Endpoint Verification

  • Verify the Istio ingress gateway service has an external IP or is accessible via NodePort
  • Verify the Central Dashboard is routable through the ingress gateway
  • Verify Dex is responding to OIDC discovery requests
  • Verify Kubeflow Pipelines API is accessible
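
Two of the endpoint checks above lend themselves to a short sketch: deciding how the ingress gateway Service is exposed, and sanity-checking the OIDC discovery document that Dex returns. The helper names below are hypothetical. The sketch assumes the Service JSON comes from `kubectl get svc <gateway-service> -n istio-system -o json` and that the discovery document was fetched from the issuer's `/.well-known/openid-configuration` path; the exact Service name and issuer URL depend on the installation. The field list covers only a core subset of what the OpenID Connect Discovery specification expects.

```python
from __future__ import annotations

from typing import Any, Optional

# Core subset of the provider metadata; not the full list from the specification.
REQUIRED_OIDC_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def describe_gateway_exposure(service: dict[str, Any]) -> Optional[str]:
    """Describe how a Service is reachable from outside the cluster, or return None."""
    spec = service.get("spec", {})
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Cloud load balancers report either an IP or a hostname.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return f"external address {address}"

    node_ports = [p["nodePort"] for p in spec.get("ports", []) if p.get("nodePort")]
    if spec.get("type") in ("NodePort", "LoadBalancer") and node_ports:
        return f"node ports {node_ports}"
    return None  # ClusterIP only: reachable via port-forward, not externally


def missing_oidc_fields(discovery_doc: dict[str, Any]) -> list[str]:
    """List the core discovery fields that are absent or empty."""
    return [field for field in REQUIRED_OIDC_FIELDS if not discovery_doc.get(field)]
```

A `None` result from `describe_gateway_exposure` does not necessarily mean the deployment is broken, since many test installations rely on `kubectl port-forward`, but it does mean the first bullet of this layer is not satisfied.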

Layer 3: Authentication Flow Verification

  • Access the Central Dashboard URL
  • Confirm redirect to Dex login page
  • Authenticate with test credentials
  • Confirm redirect back to Dashboard with a valid session
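
The first two steps of this flow can be checked without a browser by recording the redirect hops an unauthenticated request goes through and confirming that one of them lands on the login page. The sketch below only inspects hops that have already been recorded, for example the status codes and `Location` headers printed by `curl -sIL`. The function name and the `/dex/` path prefix are assumptions for illustration; the actual login path depends on how Dex and any authentication proxy in front of it are configured.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def first_login_redirect(
    hops: Iterable[Tuple[int, Optional[str]]],
    login_path_prefix: str = "/dex/",  # assumed prefix; adjust to the deployment
) -> Optional[str]:
    """Return the first redirect target whose path starts with the login prefix.

    `hops` is a sequence of (status_code, Location header or None) pairs observed
    while requesting the dashboard URL without a session.
    """
    for status, location in hops:
        if status in REDIRECT_CODES and location:
            if urlsplit(location).path.startswith(login_path_prefix):
                return location
    return None
```

A `None` result suggests that the dashboard is either served without authentication or that requests are being routed somewhere other than the identity provider. Either case is worth investigating before testing credentials.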

Layer 4: Functional Smoke Test

  • Select a user namespace from the Dashboard
  • Verify Notebooks, Pipelines, and other component UIs are accessible
  • Optionally submit a simple test pipeline or create a test notebook server
  • Verify the workload completes successfully
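
Confirming that a test workload "completes successfully" usually means polling its status until it reaches a terminal state or a deadline passes. The helper below is a generic sketch of that loop. It is not tied to any Kubeflow SDK: `get_state` is a hypothetical callable the operator supplies (for instance, a wrapper around a client call or a `kubectl` query), and the state names passed in are whatever that source reports.

```python
from __future__ import annotations

import time
from typing import Callable, Collection


def wait_for_completion(
    get_state: Callable[[], str],
    succeeded: Collection[str],
    failed: Collection[str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the workload succeeds (True) or fails (False).

    Raises TimeoutError if no terminal state is seen within `timeout_s` seconds.
    `sleep` and `clock` are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in succeeded:
            return True
        if state in failed:
            return False
        if clock() >= deadline:
            raise TimeoutError(f"workload still in state {state!r} after {timeout_s}s")
        sleep(poll_s)
```

Treating a timeout as an error rather than as a failed run keeps "the workload failed" distinct from "the platform never scheduled it", and the two point to different layers.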

Verification Outcome:

  • If all layers pass, the deployment is confirmed healthy
  • If any layer fails, investigate the specific failure before declaring the deployment complete
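
The layered pass/fail logic can be expressed as a small runner that executes checks in order and stops at the first layer reporting problems. This is one possible design, not something the source prescribes. The names `run_layers` and `VerificationResult` are hypothetical. Stopping early is a deliberate choice here, because failures in a lower layer (for example, crashing pods) tend to cause misleading failures in the layers above it.

```python
from __future__ import annotations

from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple


class VerificationResult(NamedTuple):
    healthy: bool
    failed_layer: Optional[str]
    problems: List[str]


def run_layers(
    layers: Sequence[Tuple[str, Callable[[], List[str]]]],
) -> VerificationResult:
    """Run each (name, check) pair in order; a check returns a list of problems.

    The first layer with a non-empty problem list ends the run, so later
    layers are not executed against a platform already known to be unhealthy.
    """
    for name, check in layers:
        problems = check()
        if problems:
            return VerificationResult(False, name, problems)
    return VerificationResult(True, None, [])
```

Each check would wrap one of the four layers described above, in order from pod health to the functional smoke test. The deployment is declared healthy only when the result's `healthy` field is true.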
