Implementation:Ucbepic Docetl DSLRunner Load Run Save

Knowledge Sources	DocETL, DocETL Docs
Domains	Data_Engineering, Pipeline_Orchestration
Last Updated	2026-02-08 01:40 GMT

Overview

The DSLRunner class provides a concrete tool for executing complete DocETL pipelines.

Description

The DSLRunner class is the central pipeline execution engine in DocETL. Its load_run_save() method orchestrates the full pipeline lifecycle: printing the query plan, loading datasets, executing the pull-based operation DAG via the last OpContainer's next() method, saving results, and returning the total LLM API cost.
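To illustrate what "pull-based" means here, the following toy model may help. It is a minimal sketch, not DocETL's code: ToyOpContainer, toy_load_run_save, and the per-node cost values are hypothetical names and numbers invented for illustration. The only idea taken from the description above is that execution starts by calling next() on the last node, which pulls data from the nodes upstream of it and accumulates cost along the way.

```python
from typing import Callable, Optional


class ToyOpContainer:
    """Hypothetical stand-in for an operation node; NOT DocETL's OpContainer."""

    def __init__(
        self,
        name: str,
        fn: Callable[[list], list],
        cost_per_call: float = 0.0,
        child: Optional["ToyOpContainer"] = None,
    ):
        self.name = name
        self.fn = fn
        self.cost_per_call = cost_per_call  # made-up cost, for illustration only
        self.child = child

    def next(self) -> tuple:
        """Pull input from the upstream node, then apply this node's function."""
        if self.child is None:
            data, upstream_cost = [], 0.0
        else:
            data, upstream_cost = self.child.next()
        return self.fn(data), upstream_cost + self.cost_per_call


def toy_load_run_save(last_op: ToyOpContainer, sink: list) -> float:
    """Run the chain by pulling from the last node, 'save' the rows, return the cost."""
    output, total_cost = last_op.next()
    sink.extend(output)
    return total_cost


# A three-node chain: scan -> measure -> keep_long
scan = ToyOpContainer("scan", lambda _: [{"text": "a"}, {"text": "bb"}])
measure = ToyOpContainer(
    "measure",
    lambda docs: [{**d, "length": len(d["text"])} for d in docs],
    cost_per_call=0.25,
    child=scan,
)
keep_long = ToyOpContainer(
    "keep_long",
    lambda docs: [d for d in docs if d["length"] > 1],
    cost_per_call=0.5,
    child=measure,
)

saved = []
total_cost = toy_load_run_save(keep_long, saved)
print(saved, total_cost)
```

Nothing runs until the final node is asked for its output, so the order of execution is determined by the dependencies of the last operation rather than by a forward loop over all operations.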

Usage

Use DSLRunner when you need to run a DocETL pipeline from a parsed YAML configuration dict. In CLI mode, docetl run creates a DSLRunner and calls load_run_save(). In the Python API, Pipeline.run() delegates to DSLRunner internally.

Code Reference

Source Location

Repository: docetl
File: docetl/runner.py
Lines: L106-494

Signature

class DSLRunner:
    def __init__(self, config: dict, max_threads: int | None = None, **kwargs):
        """
        Args:
            config: Parsed YAML pipeline configuration dict.
            max_threads: Maximum parallel execution threads.
        """

    def load_run_save(self) -> float:
        """Execute the entire pipeline. Returns total LLM API cost."""

    def syntax_check(self):
        """Validate all operations before execution."""

    def load(self) -> None:
        """Load all datasets defined in configuration."""

    def save(self, data: list[dict]) -> None:
        """Save final pipeline output to configured path."""

Import

from docetl.runner import DSLRunner

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	Parsed YAML pipeline configuration
max_threads	int or None	No	Maximum parallel threads

Outputs

Name	Type	Description
load_run_save() returns	float	Total LLM API cost in dollars
output file	JSON or CSV	Pipeline results written to configured output path
intermediate files	JSON	Checkpoint files in intermediate_dir (if configured)
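As a rough illustration of where the output path and intermediate_dir might sit in the config dict, consider the sketch below. The key layout (pipeline, output, path, intermediate_dir) is an assumption based on the pipeline format in the DocETL documentation and is not confirmed by this page; describe_outputs is a hypothetical helper, not part of DocETL, and the file names are placeholders.

```python
# Assumed config layout; key names are not confirmed by this page.
config = {
    "datasets": {"reviews": {"type": "file", "path": "reviews.json"}},
    "operations": [],  # operation definitions omitted for brevity
    "pipeline": {
        "steps": [],  # step definitions omitted for brevity
        "output": {
            "type": "file",
            "path": "results.json",             # placeholder output file
            "intermediate_dir": "checkpoints",  # optional checkpoint directory
        },
    },
}


def describe_outputs(config: dict) -> dict:
    """Hypothetical helper: report where a run would write its files."""
    output = config.get("pipeline", {}).get("output", {})
    return {
        "output_path": output.get("path"),
        "intermediate_dir": output.get("intermediate_dir"),  # None if not configured
    }


print(describe_outputs(config))
```

Inspecting the dict this way before constructing a runner can help confirm that results will land where expected, since intermediate files are only written when intermediate_dir is configured.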

Usage Examples

CLI Execution

# Run a YAML pipeline
docetl run pipeline.yaml

Python API Execution

import yaml
from docetl.runner import DSLRunner

with open("pipeline.yaml", "r") as f:
    config = yaml.safe_load(f)

runner = DSLRunner(config, max_threads=4)
total_cost = runner.load_run_save()
print(f"Pipeline cost: ${total_cost:.2f}")
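Because load_run_save() returns only the cost, the results themselves have to be read back from the output file. The sketch below shows one way to do that with the standard library alone. read_pipeline_output is a hypothetical helper, not a DocETL function, and it assumes the file extension reflects the format (JSON or CSV) and that a JSON output holds a list of records. The demo writes its own temporary file so that it runs without a real pipeline.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_pipeline_output(path: str) -> list[dict]:
    """Hypothetical helper: load pipeline results from a JSON or CSV file."""
    output_path = Path(path)
    suffix = output_path.suffix.lower()
    if suffix == ".json":
        with output_path.open("r", encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".csv":
        # csv.DictReader yields one dict per row; all values are strings.
        with output_path.open("r", encoding="utf-8", newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported output format: {suffix!r}")


# Demo with a throwaway file standing in for a real pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "output.json"
    demo_path.write_text(json.dumps([{"id": 1, "summary": "ok"}]), encoding="utf-8")
    rows = read_pipeline_output(str(demo_path))

print(rows)
```

In practice the path passed to the helper would be whatever output path the pipeline configuration specifies.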

Related Pages

Implements Principle

Principle:Ucbepic_Docetl_Pipeline_Execution

Requires Environment

Environment:Ucbepic_Docetl_Python_Runtime
