Workflow:Kubeflow Pipelines Pipeline Authoring and Compilation

From Leeroopedia
Knowledge Sources
Domains ML_Ops, Pipeline_Development, SDK
Last Updated 2026-02-13 14:00 GMT

Overview

End-to-end process for authoring KFP pipeline components in Python, connecting them via data passing, compiling to IR YAML, and submitting for execution.

Description

This workflow covers the fundamental "author-compile-run" loop that underlies all Kubeflow Pipelines usage. It shows how to define pipeline components using Python decorators, pass data between components using typed inputs and outputs (including artifacts such as Dataset and Model), compile the pipeline graph into a portable IR YAML specification, and submit it for execution on a KFP deployment.

Key characteristics:

  • Components are defined as decorated Python functions using @dsl.component
  • Data passing uses typed artifacts (Dataset, Model) and parameters (str, int, bool, dict, list)
  • Pipelines are compiled to IR YAML for portable, reproducible execution
  • Submission uses the KFP Python client to connect to a deployed KFP instance

Usage

Execute this workflow when you are building a new ML pipeline from scratch using the KFP Python SDK. This is the starting point for any KFP pipeline development: define components, wire them together, compile, and run. It applies whenever you need to create reproducible, containerized ML workflows that run on Kubernetes.

Execution Steps

Step 1: Define Pipeline Components

Create Python functions decorated with @dsl.component that encapsulate individual pipeline steps. Each component declares typed inputs and outputs. There are two main component types: lightweight Python components (decorated functions) and container components (specifying a container image with commands and arguments).

Key considerations:

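  • Lightweight components use @dsl.component and run in a base Python image (overridable with base_image); extra dependencies listed in packages_to_install are installed when the task starts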
  • Container components use @dsl.container_component and specify an explicit container image
  • Modules used inside a lightweight component must be imported within the function body, because the function runs in isolation from the module that defines it
  • Use Output[T] and Input[T] for artifact-typed parameters (Dataset, Model, etc.)
  • Use OutputPath and InputPath for file-path-based data exchange

Step 2: Compose the Pipeline Graph

Define a pipeline function decorated with @dsl.pipeline that instantiates component tasks and connects them. Data dependencies between tasks are established by passing one task's output as another task's input. The KFP compiler automatically infers execution order from these data dependencies.

Key considerations:

  • Connect tasks by passing task.output (when the upstream component has a single output) or task.outputs["key"] (to select a named output) as arguments
  • Execution order is inferred from data dependencies; no explicit ordering needed for data-connected tasks
  • Use task.after(other_task) for explicit ordering when no data dependency exists
  • Pipeline parameters allow external configuration at runtime

Step 3: Configure Task Properties

Set resource requirements, caching options, and retry policies on individual tasks. These configurations control how each step executes on the Kubernetes cluster.

Key considerations:

  • Use set_cpu_limit() and set_memory_limit() for resource limits, and set_cpu_request() and set_memory_request() for resource requests
  • Use set_caching_options(enable_caching=True/False) to control execution caching
  • Use set_retry() for automatic retry on transient failures
  • Resource settings map directly to Kubernetes resource requests and limits

Step 4: Compile Pipeline to IR YAML

Use the KFP compiler to transform the Python pipeline definition into a portable Intermediate Representation (IR) YAML file. This YAML file is the deployable artifact that the KFP backend consumes.

Pseudocode:

  1. Import compiler from kfp
  2. Call compiler.Compiler().compile(pipeline_func=..., package_path=...)
  3. The output YAML contains the full pipeline specification

Key considerations:

  • The compiled YAML is self-contained and can be versioned, shared, and uploaded
  • Compilation validates the pipeline graph for type consistency and connectivity
  • In KFP SDK v2 the output path should end in .yaml (or .yml); the compressed .tar.gz and .zip package formats belong to the v1 compiler

Step 5: Submit Pipeline for Execution

Use the KFP Python client to connect to a deployed KFP instance and submit the pipeline for execution. The client supports creating runs from compiled YAML files or directly from pipeline functions.

Pseudocode:

  1. Create KFP client with endpoint
  2. Submit pipeline run with arguments
  3. Optionally wait for completion and retrieve results

Key considerations:

  • The client connects to the KFP API server (typically port-forwarded or via ingress)
  • Pipeline arguments are passed as a dictionary at submission time
  • Runs can be created under experiments for organization
  • The client provides methods to monitor run status and retrieve outputs
