Principle:Ucbepic Docetl Operation Definition
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, API_Design |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Operation Definition is a schema-driven principle: LLM operations and datasets are specified programmatically as typed Pydantic models.
Description
Operation Definition is the process of creating typed operation and dataset objects using DocETL's Python API. Rather than writing YAML, users construct Pydantic model instances (MapOp, ReduceOp, ResolveOp, FilterOp, UnnestOp, CodeMapOp, etc.) and Dataset objects in Python code. This enables programmatic pipeline construction, dynamic operation generation, and IDE-assisted development with type checking.
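The dynamic-generation case can be sketched without installing DocETL. In the sketch below, `MapOpSketch` is a hypothetical, simplified stand-in for an operation schema, not DocETL's real `MapOp` class, and its fields (`name`, `type`, `prompt`, `output`) and the helper `build_extract_ops` are assumptions chosen for illustration. It only assumes that Pydantic is installed.

```python
from typing import Any, Dict, List, Literal

from pydantic import BaseModel


class MapOpSketch(BaseModel):
    """Hypothetical stand-in for a map-operation schema (not DocETL's real class)."""

    name: str
    type: Literal["map"] = "map"
    prompt: str
    output: Dict[str, Any]


def build_extract_ops(fields: List[str]) -> List[MapOpSketch]:
    """Build one map operation per field name in a loop."""
    return [
        MapOpSketch(
            name=f"extract_{field}",
            prompt=f"Extract the {field} from the document.",
            output={"schema": {field: "string"}},
        )
        for field in fields
    ]


# The list of fields could come from configuration or user input at runtime.
ops = build_extract_ops(["author", "title"])
```

Because each operation is an ordinary Python object, the list can be filtered, extended, or parameterized before it is handed to a pipeline, which is harder to express in a static YAML file.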
Usage
Use this principle when building DocETL pipelines programmatically via the Python API. Each operation type has a corresponding schema class whose parameters are validated when the object is constructed.
Theoretical Basis
Schema-driven API design:
- Type Definition: Define operation schemas as Pydantic BaseModel subclasses
- Validation: Automatically validate parameters at construction time
- Composition: Compose operations into pipeline steps
- Union Types: The OpType union lets a single pipeline mix different operation types
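The four points above can be illustrated with a minimal sketch in plain Pydantic. Every class and alias here (`MapOpSketch`, `FilterOpSketch`, `OpTypeSketch`, `PipelineSketch`) is a hypothetical stand-in with assumed fields, not DocETL's actual schema; the real classes carry more parameters. The `type` field with a `Literal` value is an assumption used to keep the union members distinguishable.

```python
from typing import List, Literal, Union

from pydantic import BaseModel, ValidationError


# Type definition: each operation schema is a BaseModel subclass.
class MapOpSketch(BaseModel):
    name: str
    type: Literal["map"] = "map"
    prompt: str


class FilterOpSketch(BaseModel):
    name: str
    type: Literal["filter"] = "filter"
    prompt: str


# Union type: one alias covers every operation schema in this sketch.
OpTypeSketch = Union[MapOpSketch, FilterOpSketch]


# Composition: a pipeline holds an ordered, mixed list of operations.
class PipelineSketch(BaseModel):
    name: str
    operations: List[OpTypeSketch]


pipeline = PipelineSketch(
    name="clean_and_tag",
    operations=[
        FilterOpSketch(name="keep_english", prompt="Is this text in English?"),
        MapOpSketch(name="tag_topic", prompt="Name the main topic."),
    ],
)

# Validation: a missing required field fails at construction time,
# before any pipeline would run.
try:
    MapOpSketch(name="no_prompt")
    construction_failed = False
except ValidationError:
    construction_failed = True
```

The design choice worth noting is that errors surface when the object is built rather than when the pipeline executes, so a misconfigured operation is caught before any LLM call is made.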