Principle:Ucbepic Docetl Pipeline Assembly And Execution
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Pipeline_Orchestration |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
This principle covers programmatic pipeline orchestration: datasets, operations, and steps are assembled into a runnable Pipeline object through the Python API.
Description
Pipeline Assembly and Execution combines dataset objects, operation schemas, pipeline steps, and output configuration into a single Pipeline object that can be run, optimized, or exported to YAML. Internally, the Pipeline class converts itself to a dict equivalent to the YAML configuration and delegates execution to DSLRunner.
Usage
Use this principle for programmatic pipeline construction when YAML configuration is insufficient (e.g., dynamic operation generation, conditional pipelines, notebook-based workflows).
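As a rough illustration of dynamic operation generation and conditional pipelines, the sketch below builds a list of operation configs in a loop and appends an extra operation only when a flag is set. The helper `build_operations` is hypothetical, and the dict keys (`name`, `type`, `prompt`, `output`) are an assumption about the general shape of an operation config, not a statement of DocETL's exact schema.

```python
from typing import Any


def build_operations(fields: list[str], include_summary: bool) -> list[dict[str, Any]]:
    """Hypothetical helper: one map-style operation per field, plus an optional extra one."""
    ops: list[dict[str, Any]] = []
    for field_name in fields:
        ops.append(
            {
                "name": f"extract_{field_name}",
                "type": "map",
                # Quadruple braces in an f-string emit literal "{{" and "}}",
                # leaving a template placeholder in the resulting prompt.
                "prompt": f"Extract the {field_name} from: {{{{ input.text }}}}",
                "output": {"schema": {field_name: "string"}},
            }
        )
    # Conditional part of the pipeline: only added when requested.
    if include_summary:
        ops.append(
            {
                "name": "summarize",
                "type": "map",
                "prompt": "Summarize: {{ input.text }}",
                "output": {"schema": {"summary": "string"}},
            }
        )
    return ops


ops = build_operations(["title", "author"], include_summary=True)
op_names = [op["name"] for op in ops]
```

A static YAML file would need each of these operations written out by hand; generating them in code keeps the list in sync with whatever drives `fields`.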
Theoretical Basis
Programmatic pipeline composition:
- Object Construction: Create typed Dataset, Operation, Step, and Output objects
- Assembly: Compose into a Pipeline object with named datasets
- Conversion: The Pipeline internally converts itself to a dict matching the YAML schema
- Delegation: Execution delegated to DSLRunner.load_run_save()
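The four stages above can be sketched with stand-in classes. This is a minimal, self-contained illustration of the pattern, not DocETL's actual implementation: the dataclasses, their field names, the `to_dict` layout, and the `runner` callable (standing in for DSLRunner) are all assumptions made for the example.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable


# Stand-in types (hypothetical; they only mimic the roles named above).
@dataclass
class Dataset:
    type: str
    path: str


@dataclass
class Operation:
    name: str
    type: str
    prompt: str


@dataclass
class Step:
    name: str
    input: str
    operations: list[str]


@dataclass
class Output:
    type: str
    path: str


@dataclass
class Pipeline:
    name: str
    datasets: dict[str, Dataset]
    operations: list[Operation]
    steps: list[Step]
    output: Output

    def to_dict(self) -> dict[str, Any]:
        # Conversion: flatten the typed objects into a plain, YAML-like dict.
        return {
            "datasets": {k: asdict(v) for k, v in self.datasets.items()},
            "operations": [asdict(op) for op in self.operations],
            "pipeline": {
                "steps": [asdict(s) for s in self.steps],
                "output": asdict(self.output),
            },
        }

    def run(self, runner: Callable[[dict[str, Any]], Any]) -> Any:
        # Delegation: the pipeline does not execute anything itself.
        return runner(self.to_dict())


# Object construction and assembly.
pipeline = Pipeline(
    name="demo",
    datasets={"docs": Dataset(type="file", path="docs.json")},
    operations=[Operation(name="extract_title", type="map", prompt="Extract the title.")],
    steps=[Step(name="step_1", input="docs", operations=["extract_title"])],
    output=Output(type="file", path="out.json"),
)

config = pipeline.to_dict()
# A trivial stand-in runner that just reports which steps it was handed.
step_names = pipeline.run(lambda cfg: [s["name"] for s in cfg["pipeline"]["steps"]])
```

Keeping conversion and delegation separate means the same dict can be handed to a runner, inspected in a notebook, or serialized to YAML without the typed objects needing to know which of those happens.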