Implementation:Ucbepic Docetl Python API Schema Objects
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, API_Design |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete Pydantic schema classes for defining DocETL operations and datasets in the Python API.
Description
docetl/schemas.py exports one schema class per operation (MapOp, ReduceOp, ResolveOp, FilterOp, UnnestOp, SplitOp, GatherOp, CodeMapOp, CodeReduceOp, CodeFilterOp, ExtractOp) plus a Dataset schema class. Each export is an alias for the .schema class attribute of the corresponding operation class, so it is a typed Pydantic model that can be instantiated directly for programmatic pipeline construction.
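The aliasing pattern described above can be sketched without installing DocETL. This is a minimal illustration, not the library's code: BaseOperation and MapOperation here are hypothetical stand-ins, and standard-library dataclasses take the place of Pydantic models, so only missing required arguments are rejected (no type validation).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class BaseOperation:
    """Hypothetical stand-in for an operation base class."""

    @dataclass
    class schema:
        # Fields every operation is assumed to share.
        name: str
        type: str


class MapOperation(BaseOperation):
    """Hypothetical stand-in for a concrete operation class."""

    @dataclass
    class schema(BaseOperation.schema):
        # Operation-specific fields extend the shared ones.
        prompt: Optional[str] = None
        output: Dict[str, Any] = field(default_factory=dict)


# Module-level alias, mirroring the `MapOp = map.MapOperation.schema` style.
MapOp = MapOperation.schema

op = MapOp(name="extract", type="map", prompt="Extract entities from: {{ input.text }}")
print(op)
```

Because the alias points at the nested class itself, instantiating MapOp and instantiating MapOperation.schema are the same call in this sketch.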
Usage
Import the schema classes from docetl.schemas and instantiate them to define operations programmatically. Pass the resulting objects to the operations parameter of the Pipeline constructor.
Code Reference
Source Location
- Repository: docetl
- File: docetl/schemas.py (L1-53), docetl/api.py (L84-140)
Signature
# Schema type aliases (from docetl/schemas.py)
MapOp = map.MapOperation.schema
ReduceOp = reduce.ReduceOperation.schema
ResolveOp = resolve.ResolveOperation.schema
FilterOp = filter.FilterOperation.schema
UnnestOp = unnest.UnnestOperation.schema
SplitOp = split.SplitOperation.schema
GatherOp = gather.GatherOperation.schema
CodeMapOp = code_operations.CodeMapOperation.schema
CodeReduceOp = code_operations.CodeReduceOperation.schema
CodeFilterOp = code_operations.CodeFilterOperation.schema
OpType = MapOp | ReduceOp | ... | CodeFilterOp | ExtractOp
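A union alias such as OpType is mainly useful for annotating code that accepts any operation schema. The sketch below shows one way such an alias could be consumed. It assumes simplified stand-in dataclasses for MapOp and ReduceOp rather than the real Pydantic models, and describe is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Union, get_args


@dataclass
class MapOp:  # stand-in, not the DocETL class
    name: str
    type: str = "map"


@dataclass
class ReduceOp:  # stand-in, not the DocETL class
    name: str
    reduce_key: str
    type: str = "reduce"


# Union alias covering every accepted operation schema.
OpType = Union[MapOp, ReduceOp]


def describe(op: OpType) -> str:
    """Return a short label, rejecting objects outside the union."""
    if not isinstance(op, get_args(OpType)):
        raise TypeError(f"unsupported operation: {type(op).__name__}")
    return f"{op.type}:{op.name}"


labels = [
    describe(op)
    for op in (MapOp(name="extract"), ReduceOp(name="summarize", reduce_key="category"))
]
print(labels)
```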
Import
from docetl.schemas import MapOp, ReduceOp, ResolveOp, FilterOp, UnnestOp, Dataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Operation name identifier |
| type | str | Yes | Operation type (map, reduce, resolve, filter, etc.) |
| prompt | str | Conditional | Jinja2 template (required for LLM operations) |
| output.schema | dict | Conditional | Output field definitions |
| reduce_key | str or list | Conditional | Group-by key(s) for reduce operations |
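The "Conditional" entries in the table can be read as rules that depend on the operation type. The following sketch encodes that reading over a plain dict. It is an illustration only: missing_fields is a hypothetical helper, and the set of types treated as needing a prompt is an assumption (other operation types may use differently named prompt fields). The real schema classes perform their own validation.

```python
from typing import Any, Dict, List

# Assumption: these types are treated as LLM operations that need `prompt`.
PROMPT_TYPES = {"map", "reduce", "filter"}


def missing_fields(config: Dict[str, Any]) -> List[str]:
    """List the fields from the Inputs table that this config lacks."""
    # `name` and `type` are always required.
    missing = [key for key in ("name", "type") if not config.get(key)]
    op_type = config.get("type")

    if op_type in PROMPT_TYPES:
        if not config.get("prompt"):
            missing.append("prompt")
        if "schema" not in config.get("output", {}):
            missing.append("output.schema")

    # Group-by keys only apply to reduce operations.
    if op_type == "reduce" and not config.get("reduce_key"):
        missing.append("reduce_key")

    return missing


print(missing_fields({"type": "map"}))
```

A check like this can run before constructing schema objects, for example when operation definitions are loaded from user-supplied dictionaries.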
Outputs
| Name | Type | Description |
|---|---|---|
| schema object | BaseModel | Validated Pydantic operation schema |
Usage Examples
from docetl.schemas import MapOp, ReduceOp, Dataset

# Input dataset read from a JSON file.
dataset = Dataset(type="file", path="data/input.json")

# LLM map operation: the prompt is rendered once per input document.
map_op = MapOp(
    name="extract",
    type="map",
    prompt="Extract entities from: {{ input.text }}",
    output={"schema": {"entities": "list[str]"}},
    model="gpt-4o-mini",
)

# LLM reduce operation: documents are grouped by "category" and
# each group is available in the prompt as `inputs`.
reduce_op = ReduceOp(
    name="summarize",
    type="reduce",
    reduce_key="category",
    prompt="Summarize: {% for item in inputs %}{{ item.text }}{% endfor %}",
    output={"schema": {"summary": "string"}},
)