Implementation:Ucbepic Docetl DSLRunner Load Run Save
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Pipeline_Orchestration |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
A concrete tool, provided by the DSLRunner class, for executing complete DocETL pipelines from start to finish.
Description
The DSLRunner class is the central pipeline execution engine in DocETL. Its load_run_save() method orchestrates the full pipeline lifecycle: printing the query plan, loading datasets, executing the pull-based operation DAG via the last OpContainer's next() method, saving results, and returning the total LLM API cost.
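The pull-based execution described above can be sketched in a few lines. This is a minimal illustration, not DocETL's implementation: the `Node` class, its `fn` and `upstream` fields, and the `run` helper are hypothetical names, and the per-step costs are made-up numbers. The point is that only the last node is invoked, and each node pulls its input from the one before it while costs accumulate on the way back.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Record = dict


@dataclass
class Node:
    """Hypothetical stand-in for an operation container in a pull-based DAG."""

    name: str
    # Transforms the records pulled from upstream; returns (records, step cost).
    fn: Callable[[list[Record]], tuple[list[Record], float]]
    upstream: Node | None = None

    def next(self) -> tuple[list[Record], float]:
        # Pull from the upstream node first; a source node starts from nothing.
        if self.upstream is None:
            records, cost = [], 0.0
        else:
            records, cost = self.upstream.next()
        out, step_cost = self.fn(records)
        return out, cost + step_cost


def run(last: Node) -> tuple[list[Record], float]:
    # Only the final node is called; earlier nodes run on demand.
    return last.next()


# Three chained steps: load two records, annotate them, then filter.
load = Node("load", lambda _: ([{"text": "a"}, {"text": "bb"}], 0.0))
measure = Node(
    "measure",
    lambda rs: ([{**r, "n": len(r["text"])} for r in rs], 0.25),
    upstream=load,
)
keep = Node("keep", lambda rs: ([r for r in rs if r["n"] > 1], 0.5), upstream=measure)

results, total_cost = run(keep)
```

Calling `run(keep)` triggers `measure` and `load` indirectly, which mirrors how a single call on the final container can drive the whole pipeline.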
Usage
Use DSLRunner when you need to run a DocETL pipeline from a parsed YAML configuration dict. In CLI mode, docetl run creates a DSLRunner and calls load_run_save(). In the Python API, Pipeline.run() delegates to DSLRunner internally.
Code Reference
Source Location
- Repository: docetl
- File: docetl/runner.py
- Lines: L106-494
Signature
class DSLRunner:
    def __init__(self, config: dict, max_threads: int | None = None, **kwargs):
        """
        Args:
            config: Parsed YAML pipeline configuration dict.
            max_threads: Maximum parallel execution threads.
        """

    def load_run_save(self) -> float:
        """Execute the entire pipeline. Returns total LLM API cost."""

    def syntax_check(self):
        """Validate all operations before execution."""

    def load(self) -> None:
        """Load all datasets defined in configuration."""

    def save(self, data: list[dict]) -> None:
        """Save final pipeline output to configured path."""
Import
from docetl.runner import DSLRunner
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Parsed YAML pipeline configuration |
| max_threads | int or None | No | Maximum parallel threads |
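For orientation, a parsed configuration might look roughly like the dict below. This is a sketch under assumptions: the key names (`default_model`, `datasets`, `operations`, `pipeline`, `steps`, `output`), the dataset and operation names, and the `missing_sections` helper are illustrative and are not taken from this page, so they should be checked against the DocETL documentation before use.

```python
# Illustrative only: key names below are assumptions, not a schema from this page.
config = {
    "default_model": "gpt-4o-mini",
    "datasets": {"reviews": {"type": "file", "path": "reviews.json"}},
    "operations": [
        {
            "name": "summarize",
            "type": "map",
            "prompt": "Summarize this review: {{ input.text }}",
            "output": {"schema": {"summary": "string"}},
        }
    ],
    "pipeline": {
        "steps": [
            {"name": "main", "input": "reviews", "operations": ["summarize"]}
        ],
        "output": {"type": "file", "path": "summaries.json"},
    },
}

# Hypothetical pre-flight check; the list of required sections is assumed.
REQUIRED_SECTIONS = ("datasets", "operations", "pipeline")


def missing_sections(cfg: dict) -> list[str]:
    """Return the assumed top-level sections that are absent from cfg."""
    return [key for key in REQUIRED_SECTIONS if key not in cfg]


problems = missing_sections(config)
```

A check like this is only a cheap sanity test before constructing the runner; it does not replace the validation performed by `syntax_check()`.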
Outputs
| Name | Type | Description |
|---|---|---|
| load_run_save() returns | float | Total LLM API cost in dollars |
| output file | JSON or CSV | Pipeline results written to configured output path |
| intermediate files | JSON | Checkpoint files in intermediate_dir (if configured) |
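The JSON-or-CSV output row can be illustrated with a small standalone writer. This is a sketch, assuming the format is chosen from the file extension; `save_records` is a hypothetical helper built on the standard library and does not reproduce how `DSLRunner.save()` is actually implemented.

```python
import csv
import json
from pathlib import Path


def save_records(records: list[dict], path: str) -> None:
    """Hypothetical helper: write records as JSON or CSV based on the suffix."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    suffix = target.suffix.lower()
    if suffix == ".json":
        with target.open("w", encoding="utf-8") as f:
            json.dump(records, f, indent=2)
    elif suffix == ".csv":
        # Use the union of keys so rows with missing fields still fit.
        fieldnames = sorted({key for row in records for key in row})
        with target.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
```

Rejecting unknown extensions up front keeps a typo in the output path from silently producing a file in the wrong format.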
Usage Examples
CLI Execution
# Run a YAML pipeline
docetl run pipeline.yaml
Python API Execution
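import yaml

from docetl.runner import DSLRunner

# Parse the YAML pipeline definition into a plain dict.
with open("pipeline.yaml", "r") as f:
    config = yaml.safe_load(f)

# Run the full pipeline; load_run_save() returns the total LLM API cost.
runner = DSLRunner(config, max_threads=4)
total_cost = runner.load_run_save()
print(f"Pipeline cost: ${total_cost:.2f}")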