Implementation:Ucbepic Docetl Build CLI Command
| Knowledge Sources | |
|---|---|
| Domains | Optimization, CLI |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete CLI command and helper for configuring and launching DocETL pipeline optimization.
Description
The docetl build CLI command parses the optimizer_config section from a YAML pipeline file, validates required fields (evaluation_file, metric_key, available_models, max_iterations, save_dir), infers dataset information, and dispatches to either the V1 or MOAR optimizer. The run_moar_optimization() helper function handles MOAR-specific setup.
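The required-field check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DocETL's actual code: the helper name `check_optimizer_config` and the message format are hypothetical, and only the five field names come from the description.

```python
from typing import Any

# Required fields, as listed in the description above.
REQUIRED_FIELDS = (
    "evaluation_file",
    "metric_key",
    "available_models",
    "max_iterations",
    "save_dir",
)


def check_optimizer_config(optimizer_config: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if name not in optimizer_config]


# Example config that deliberately omits max_iterations.
config = {
    "save_dir": "./moar_results",
    "available_models": ["gpt-4o", "gpt-4o-mini"],
    "evaluation_file": "evaluate.py",
    "metric_key": "accuracy",
}
missing = check_optimizer_config(config)
if missing:
    print(f"optimizer_config is missing: {', '.join(missing)}")
```

Returning the list of missing names, rather than raising on the first one, lets a caller report every problem in a single message.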
Usage
Use docetl build pipeline.yaml --optimizer moar to launch MOAR optimization. The pipeline YAML must include an optimizer_config section with all required fields.
Code Reference
Source Location
- Repository: docetl
- File: docetl/cli.py (L19-198), docetl/moar/cli_helpers.py (L92-305)
Signature
# CLI command
def build(
    yaml_file: Path,
    optimizer: str = "moar",
    max_threads: int | None = None,
    resume: bool = False,
    save_path: Path = None,
) -> None:
    """Build/optimize a DocETL pipeline."""

# MOAR helper
def run_moar_optimization(
    yaml_path: str,
    optimizer_config: dict,
) -> Dict[str, Any]:
    """Run MOAR optimization from CLI. Returns experiment summary."""
Import
# CLI usage
docetl build pipeline.yaml --optimizer moar
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| yaml_file | Path | Yes | Path to YAML pipeline with optimizer_config |
| optimizer | str | No | "moar" (default) or "v1" |
| optimizer_config.evaluation_file | str | Yes | Path to @register_eval Python file |
| optimizer_config.metric_key | str | Yes | Key in evaluation results dict to optimize |
| optimizer_config.available_models | list[str] | Yes | LLM models to search over |
| optimizer_config.max_iterations | int | Yes | MCTS iteration budget |
| optimizer_config.save_dir | str | Yes | Output directory for optimized pipelines |
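To illustrate how evaluation_file and metric_key relate, here is a sketch of an evaluation function whose returned dict contains the key that metric_key names. The function name, its argument, and the record fields `predicted` and `expected` are all hypothetical. In a real evaluation file the function would carry the @register_eval decorator mentioned in the table; its import path and expected signature are not given on this page, so the decorator is left out here.

```python
from typing import Any


def evaluate_results(results: list[dict[str, Any]]) -> dict[str, float]:
    """Hypothetical evaluator: fraction of records whose prediction matches."""
    if not results:
        return {"accuracy": 0.0}
    correct = sum(1 for r in results if r["predicted"] == r["expected"])
    return {"accuracy": correct / len(results)}


# metric_key selects which entry of the returned dict is optimized.
metric_key = "accuracy"
scores = evaluate_results([
    {"predicted": "a", "expected": "a"},
    {"predicted": "b", "expected": "c"},
])
print(scores[metric_key])
```

If the key named by metric_key is absent from the returned dict, there is no value to optimize, so the two settings need to be kept in sync.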
Outputs
| Name | Type | Description |
|---|---|---|
| results | Dict[str, Any] | Experiment summary returned by run_moar_optimization(), with paths to optimized pipelines |
| optimized YAMLs | files | Written to save_dir |
Usage Examples
# Run MOAR optimization
docetl build pipeline.yaml --optimizer moar
# Resume interrupted optimization
docetl build pipeline.yaml --optimizer moar --resume
# optimizer_config section in pipeline.yaml
optimizer_config:
  type: moar
  save_dir: ./moar_results
  available_models:
    - gpt-4o
    - gpt-4o-mini
  evaluation_file: evaluate.py
  metric_key: accuracy
  max_iterations: 40
  rewrite_agent_model: gpt-4o
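
For scripted runs, the command lines shown in this section can be assembled programmatically. The sketch below uses only the --optimizer and --resume flags that appear in the examples; the helper name `build_command` is hypothetical, and the resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where docetl is installed.

```python
import shlex


def build_command(
    yaml_file: str, optimizer: str = "moar", resume: bool = False
) -> list[str]:
    """Assemble a `docetl build` argument list (hypothetical helper)."""
    cmd = ["docetl", "build", yaml_file, "--optimizer", optimizer]
    if resume:
        # Only add the flag when resuming an interrupted run.
        cmd.append("--resume")
    return cmd


fresh = build_command("pipeline.yaml")
resumed = build_command("pipeline.yaml", resume=True)
print(shlex.join(fresh))
print(shlex.join(resumed))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when the YAML path contains spaces.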