Implementation:Ucbepic Docetl Pipeline Optimize
| Knowledge Sources | |
|---|---|
| Domains | Optimization, API_Design |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete Python API method for optimizing DocETL pipelines programmatically.
Description
Pipeline.optimize() converts the Pipeline to a dict, creates a DSLRunner, invokes the V1 Optimizer, and returns a new Pipeline instance with optimized operation configurations. It supports resuming from previous optimization state and saving optimized YAML to disk.
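The flow described above (serialize to a dict, hand the dict to a runner, optimize, wrap the result in a new object) can be sketched with stand-in classes. This is not DocETL's code: `ToyPipeline`, `ToyRunner`, and the placeholder "optimization" rule (appending `_optimized` to flagged operation names) are hypothetical, chosen only to illustrate that the caller's pipeline is left untouched and a new instance comes back.

```python
from __future__ import annotations

import copy
from dataclasses import asdict, dataclass, field


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not the DocETL class."""

    name: str
    operations: list[dict] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() builds a fresh nested structure, so the runner
        # never sees the original operation dicts.
        return asdict(self)

    def optimize(self) -> "ToyPipeline":
        # 1. serialize, 2. hand the config to a runner,
        # 3. wrap the optimized config in a *new* instance.
        runner = ToyRunner(self.to_dict())
        optimized_config = runner.optimize()
        return ToyPipeline(**optimized_config)


class ToyRunner:
    """Hypothetical runner; the rewrite rule below is a placeholder."""

    def __init__(self, config: dict) -> None:
        self.config = copy.deepcopy(config)

    def optimize(self) -> dict:
        for op in self.config["operations"]:
            if op.get("optimize"):
                op["name"] += "_optimized"
                op["optimize"] = False
        return self.config


original = ToyPipeline("demo", [{"name": "extract", "optimize": True}])
optimized = original.optimize()
```

Returning a new object rather than mutating in place means the unoptimized pipeline remains available for comparison after the call.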
Usage
Call optimize() on a Pipeline that has operations marked with optimize=True. The returned Pipeline can be run directly or exported to YAML.
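Because only operations flagged with `optimize=True` are candidates, it can be useful to check for the flag before starting an optimization run. The helper below is a minimal sketch: `optimizable_op_names` is a hypothetical name, and it assumes the operations are available as an iterable of either plain dicts or objects exposing `name` and `optimize` attributes (for example, the list passed to the `Pipeline` constructor).

```python
from typing import Any, Iterable


def optimizable_op_names(operations: Iterable[Any]) -> list[str]:
    """Return the names of operations whose `optimize` flag is truthy.

    Hypothetical helper, not part of DocETL. Accepts plain dicts or
    objects with `name` / `optimize` attributes.
    """
    names = []
    for op in operations:
        if isinstance(op, dict):
            flagged = op.get("optimize", False)
            name = op.get("name", "<unnamed>")
        else:
            flagged = getattr(op, "optimize", False)
            name = getattr(op, "name", "<unnamed>")
        if flagged:
            names.append(name)
    return names
```

An empty result suggests that no operation has been marked for optimization yet.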
Code Reference
Source Location
- Repository: docetl
- File: docetl/api.py
- Lines: L191-233
Signature
class Pipeline:
    def optimize(
        self,
        max_threads: int | None = None,
        resume: bool = False,
        save_path: str | None = None,
    ) -> "Pipeline":
        """
        Optimize the pipeline. Returns a new Pipeline with optimized operations.

        Args:
            max_threads: Maximum threads for optimization.
            resume: Resume from previous optimization state.
            save_path: Path to save optimized YAML.
        """
Import
from docetl.api import Pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| max_threads | int or None | No | Maximum number of threads used during optimization (default None) |
| resume | bool | No | Resume from previous optimization state (default False) |
| save_path | str or None | No | Path at which to save the optimized YAML (default None) |
Outputs
| Name | Type | Description |
|---|---|---|
| returns | Pipeline | New Pipeline with optimized operation configs |
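The contract in the tables above can be written down as a structural type, which lets helper code be tested against a fake object without running a real optimization. This is a sketch under stated assumptions: `SupportsOptimize` and `optimize_then_run` are hypothetical names, and `run()` is assumed to be callable with no arguments, as in the usage example on this page.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsOptimize(Protocol):
    """Structural type mirroring the documented optimize() contract."""

    def optimize(
        self,
        max_threads: Optional[int] = None,
        resume: bool = False,
        save_path: Optional[str] = None,
    ) -> Any: ...

    def run(self) -> Any: ...


def optimize_then_run(
    pipeline: SupportsOptimize,
    save_path: Optional[str] = None,
    resume: bool = False,
) -> Any:
    """Optimize, then run the *returned* pipeline rather than the original."""
    optimized = pipeline.optimize(resume=resume, save_path=save_path)
    return optimized.run()
```

The helper runs the object returned by `optimize()`, since the original pipeline is not the one carrying the optimized operation configs.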
Usage Examples
from docetl.api import Pipeline
from docetl.schemas import MapOp, Dataset
from docetl.base_schemas import PipelineStep, PipelineOutput

# Define a one-step pipeline whose map operation is marked for optimization.
pipeline = Pipeline(
    name="my_pipeline",
    datasets={"input": Dataset(type="file", path="data.json")},
    operations=[
        MapOp(
            name="extract",
            type="map",
            prompt="Extract: {{ input.text }}",
            output={"schema": {"result": "string"}},
            optimize=True,
        ),
    ],
    steps=[PipelineStep(name="step1", input="input", operations=["extract"])],
    output=PipelineOutput(type="file", path="output.json"),
    default_model="gpt-4o-mini",
)

# optimize() returns a new Pipeline and also writes the optimized YAML to disk.
optimized = pipeline.optimize(save_path="optimized_pipeline.yaml")

# Run the optimized pipeline.
cost = optimized.run()