Workflow:Kubeflow Pipelines Pipeline Authoring and Compilation
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Pipeline_Development, SDK |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
End-to-end process for authoring KFP pipeline components in Python, connecting them via data passing, compiling to IR YAML, and submitting for execution.
Description
This workflow covers the fundamental "author-compile-run" loop that every Kubeflow Pipelines user follows. It demonstrates how to define pipeline components using Python decorators, pass data between components using typed inputs and outputs (including artifacts like Datasets and Models), compile the pipeline graph into a portable IR YAML specification, and submit it for execution on a KFP deployment. This is the foundational workflow for all KFP usage.
Key characteristics:
- Components are defined as decorated Python functions using @dsl.component
- Data passing uses typed artifacts (Dataset, Model) and parameters (str, int, float, bool, dict, list)
- Pipelines are compiled to IR YAML for portable, reproducible execution
- Submission uses the KFP Python client to connect to a deployed KFP instance
Usage
Execute this workflow when you are building a new ML pipeline from scratch using the KFP Python SDK. This is the starting point for any KFP pipeline development: define components, wire them together, compile, and run. It applies whenever you need to create reproducible, containerized ML workflows that run on Kubernetes.
Execution Steps
Step 1: Define Pipeline Components
Create Python functions that encapsulate individual pipeline steps and decorate them as components. Each component declares typed inputs and outputs in its signature. There are two main component types: lightweight Python components, decorated with @dsl.component, and container components, decorated with @dsl.container_component, which specify a container image with a command and arguments.
Key considerations:
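- Lightweight components use @dsl.component; at runtime the function's source code runs inside a base image (set with base_image, otherwise a default Python image), after any packages listed in packages_to_install are installed with pip
- Container components use @dsl.container_component and return a dsl.ContainerSpec that names an explicit container image, command, and arguments
- All imports used inside a lightweight component must be placed within the function body, because only the function's source is shipped to the container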
- Use Output[T] and Input[T] for artifact-typed parameters (Dataset, Model, etc.)
- Use OutputPath and InputPath for file-path-based data exchange
Step 2: Compose the Pipeline Graph
Define a pipeline function decorated with @dsl.pipeline that instantiates component tasks and connects them. Data dependencies between tasks are established by passing one task's output as another task's input. The KFP compiler automatically infers execution order from these data dependencies.
Key considerations:
- Connect tasks by passing task.output (for a task with exactly one output) or task.outputs["key"] (for a task with several outputs) as arguments
- Execution order is inferred from data dependencies; no explicit ordering needed for data-connected tasks
- Use task.after(other_task) for explicit ordering when no data dependency exists
- Pipeline parameters allow external configuration at runtime
Step 3: Configure Task Properties
Set resource requirements, caching options, and retry policies on individual tasks. These configurations control how each step executes on the Kubernetes cluster.
Key considerations:
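- Use set_cpu_limit() and set_memory_limit() (and set_cpu_request() / set_memory_request() for requests) with Kubernetes quantity strings such as "500m" or "2Gi"
- Caching is enabled by default; use set_caching_options(enable_caching=False) to force a step to re-execute on every run
- Use set_retry(num_retries=...) for automatic retry on transient failures; the optional backoff_duration, backoff_factor, and backoff_max_duration arguments control the delay between attempts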
- Resource settings map directly to Kubernetes resource requests and limits
Step 4: Compile Pipeline to IR YAML
Use the KFP compiler to transform the Python pipeline definition into a portable Intermediate Representation (IR) YAML file. This YAML file is the deployable artifact that the KFP backend consumes.
Pseudocode:
- Import compiler from kfp
- Call compiler.Compiler().compile(pipeline_func=..., package_path=...)
- The output YAML contains the full pipeline specification
Key considerations:
- The compiled YAML is self-contained and can be versioned, shared, and uploaded
- Compilation validates the pipeline graph for type consistency and connectivity
- The package path should end in .yaml; the .tar.gz and .zip archive formats were produced by the KFP v1 compiler, not by the v2 compiler that emits IR YAML
Step 5: Submit Pipeline for Execution
Use the KFP Python client to connect to a deployed KFP instance and submit the pipeline for execution. The client supports creating runs from compiled YAML files or directly from pipeline functions.
Pseudocode:
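- Create a kfp.Client pointed at the API server endpoint (host)
- Submit a run with create_run_from_pipeline_package() or create_run_from_pipeline_func(), passing the pipeline arguments
- Optionally wait for completion with wait_for_run_completion() and inspect the final run state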
Key considerations:
- The client connects to the KFP API server (typically port-forwarded or via ingress)
- Pipeline arguments are passed as a dictionary at submission time
- Runs can be created under experiments for organization
- The client provides methods to monitor run status and retrieve outputs