Principle:Ucbepic Docetl Operation Definition

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, API_Design
Last Updated: 2026-02-08 01:40 GMT

Overview

A schema-driven principle for defining LLM operations and datasets programmatically as typed Pydantic models.

Description

Operation Definition is the process of creating typed operation and dataset objects using DocETL's Python API. Rather than writing YAML, users construct Pydantic model instances (MapOp, ReduceOp, ResolveOp, FilterOp, UnnestOp, CodeMapOp, etc.) and Dataset objects in Python code. This enables programmatic pipeline construction, dynamic operation generation, and IDE-assisted development with type checking.
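As a rough illustration of the dynamic operation generation described above, the sketch below builds one typed map operation per field in a loop. It is not DocETL code: MapOpSketch, its fields, and build_extraction_ops are hypothetical names, and a standard-library dataclass stands in for the real Pydantic model so the example runs without dependencies. The Jinja-style {{ input.text }} placeholder is likewise assumed for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MapOpSketch:
    """Hypothetical, simplified stand-in for a typed map operation."""
    name: str
    prompt: str
    output_schema: Dict[str, str]
    type: str = "map"


def build_extraction_ops(field_names: List[str]) -> List[MapOpSketch]:
    """Generate one map operation per field instead of hand-writing each one."""
    return [
        MapOpSketch(
            name=f"extract_{field_name}",
            # Quadruple braces in an f-string emit a literal "{{ ... }}" placeholder.
            prompt=f"Extract the {field_name} from this document: {{{{ input.text }}}}",
            output_schema={field_name: "string"},
        )
        for field_name in field_names
    ]


ops = build_extraction_ops(["author", "date", "summary"])
op_names = [op.name for op in ops]
print(op_names)
```

Because the operations are ordinary Python objects, the list of fields could just as well come from a config file or a database query, which is awkward to express in static YAML.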

Usage

Use this principle when building DocETL pipelines programmatically via the Python API. Each operation type has a corresponding schema class with validated parameters.

Theoretical Basis

Schema-driven API design:

  1. Type Definition: Define operation schemas as Pydantic BaseModel subclasses
  2. Validation: Automatically validate parameters at construction time
  3. Composition: Compose operations into pipeline steps
  4. Union Types: Use the OpType union to mix different operation types in a single pipeline
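The four steps above can be sketched in a few lines. This is a minimal, dependency-free approximation rather than DocETL's implementation: MapSketch, ReduceSketch, OpSketch, and StepSketch are hypothetical names, and standard-library dataclasses with explicit __post_init__ checks stand in for Pydantic BaseModel subclasses, which would derive most of this validation from the type annotations.

```python
from dataclasses import dataclass, field
from typing import List, Union


# 1. Type definition: one schema class per operation type.
@dataclass
class MapSketch:
    name: str
    prompt: str
    type: str = "map"

    # 2. Validation: runs automatically when the object is constructed.
    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name must be non-empty")
        if not self.prompt:
            raise ValueError("map operations need a prompt")


@dataclass
class ReduceSketch:
    name: str
    prompt: str
    reduce_key: str
    type: str = "reduce"

    def __post_init__(self) -> None:
        if not self.reduce_key:
            raise ValueError("reduce operations need a reduce_key")


# 4. Union type: a step may hold any mix of the operation types.
OpSketch = Union[MapSketch, ReduceSketch]


# 3. Composition: operations are grouped into a pipeline step.
@dataclass
class StepSketch:
    name: str
    input: str
    operations: List[OpSketch] = field(default_factory=list)


step = StepSketch(
    name="summarize",
    input="reviews",
    operations=[
        MapSketch(name="extract_topic", prompt="Name the main topic."),
        ReduceSketch(name="merge_topics", prompt="Summarize each topic.", reduce_key="topic"),
    ],
)
op_types = [op.type for op in step.operations]

# An invalid operation is rejected at construction time, before any pipeline runs.
try:
    ReduceSketch(name="bad", prompt="No key given.", reduce_key="")
    rejected = False
except ValueError:
    rejected = True
```

The point of validating in the constructor is that a malformed operation fails where it is defined, not later when the pipeline is executed.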
