

Implementation:Ucbepic Docetl DSLRunner Load Run Save

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Pipeline_Orchestration
Last Updated: 2026-02-08 01:40 GMT

Overview

The DSLRunner class is the concrete tool for executing complete DocETL pipelines.

Description

The DSLRunner class is the central pipeline execution engine in DocETL. Its load_run_save() method orchestrates the full pipeline lifecycle: it prints the query plan, loads the datasets, executes the pull-based operation DAG by calling next() on the last OpContainer (which pulls data through its upstream operations), saves the results, and returns the total LLM API cost.
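The pull-based execution described above can be illustrated with a toy model. This is a minimal sketch, not DocETL's OpContainer: ToyOpContainer, toy_load_run_save, and the (records, cost) return shape are hypothetical, and the real runner also prints the query plan, loads datasets, and writes checkpoints. The point it shows is that only the last node is asked for output, each node first pulls from its parent, and costs accumulate along the chain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Records = list[dict]
# Hypothetical operation type: takes records, returns (output records, LLM cost in dollars).
OpFn = Callable[[Records], tuple[Records, float]]


@dataclass
class ToyOpContainer:
    """Hypothetical stand-in for an OpContainer: one node of a pull-based DAG."""

    name: str
    fn: OpFn
    parent: Optional[ToyOpContainer] = None
    source: Optional[Records] = None  # only leaf (dataset) nodes hold data

    def next(self) -> tuple[Records, float]:
        # Pull: request upstream output first, then apply this node's operation.
        if self.parent is not None:
            data, upstream_cost = self.parent.next()
        else:
            data, upstream_cost = list(self.source or []), 0.0
        out, cost = self.fn(data)
        # Costs accumulate along the chain, so the last node reports the total.
        return out, upstream_cost + cost


def toy_load_run_save(last: ToyOpContainer, save: Callable[[Records], None]) -> float:
    """Hypothetical driver: pull everything through the last node, save, return cost."""
    data, total_cost = last.next()
    save(data)
    return total_cost


# Usage: scan -> uppercase map -> length filter. The 0.01 charge is an invented placeholder.
scan = ToyOpContainer("scan", lambda rs: (rs, 0.0), source=[{"name": "ada"}, {"name": "grace"}])
upper = ToyOpContainer(
    "upper", lambda rs: ([{**r, "name": r["name"].upper()} for r in rs], 0.01), parent=scan
)
keep_long = ToyOpContainer(
    "keep_long", lambda rs: ([r for r in rs if len(r["name"]) > 3], 0.0), parent=upper
)

saved: Records = []
cost = toy_load_run_save(keep_long, saved.extend)
# saved == [{"name": "GRACE"}], cost == 0.01
```

Because each node only knows its parent, the driver never walks the DAG itself; asking the final node for data is enough to run every upstream step.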

Usage

Use DSLRunner when you need to run a DocETL pipeline from a parsed YAML configuration dict. In CLI mode, docetl run creates a DSLRunner and calls load_run_save(). In the Python API, Pipeline.run() delegates to DSLRunner internally.

Code Reference

Source Location

  • Repository: docetl
  • File: docetl/runner.py
  • Lines: L106-494

Signature

class DSLRunner:
    def __init__(self, config: dict, max_threads: int | None = None, **kwargs):
        """
        Args:
            config: Parsed YAML pipeline configuration dict.
            max_threads: Maximum parallel execution threads.
        """

    def load_run_save(self) -> float:
        """Execute the entire pipeline. Returns total LLM API cost."""

    def syntax_check(self):
        """Validate all operations before execution."""

    def load(self) -> None:
        """Load all datasets defined in configuration."""

    def save(self, data: list[dict]) -> None:
        """Save final pipeline output to configured path."""

Import

from docetl.runner import DSLRunner

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Parsed YAML pipeline configuration
max_threads | int or None | No | Maximum number of parallel execution threads
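For reference, config is the dict that yaml.safe_load produces from a pipeline file. The sketch below assumes the top-level layout used in DocETL's YAML examples (datasets, default_model, operations, and pipeline with steps and output); the model name, prompt, and paths are placeholders. missing_sections and undefined_references are hypothetical pre-flight helpers for illustration, not DocETL's syntax_check().

```python
# Hypothetical config dict, shaped like the output of yaml.safe_load on a one-step
# pipeline file. Key names follow DocETL's documented YAML layout; values are placeholders.
config = {
    "default_model": "gpt-4o-mini",
    "datasets": {
        "input_data": {"type": "file", "path": "data.json"},
    },
    "operations": [
        {
            "name": "extract_insights",
            "type": "map",
            "prompt": "Extract key insights from: {{ input.text }}",
            "output": {"schema": {"insights": "list[string]"}},
        },
    ],
    "pipeline": {
        "steps": [
            {"name": "analysis", "input": "input_data", "operations": ["extract_insights"]},
        ],
        "output": {
            "type": "file",
            "path": "output.json",
            "intermediate_dir": "intermediate_results",
        },
    },
}

REQUIRED_TOP_LEVEL = ("datasets", "operations", "pipeline")


def missing_sections(cfg: dict) -> list[str]:
    """Hypothetical check: list required top-level sections that are absent."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in cfg]
    if "pipeline" in cfg and "output" not in cfg["pipeline"]:
        missing.append("pipeline.output")
    return missing


def undefined_references(cfg: dict) -> list[str]:
    """Hypothetical check: flag step operations that are not defined under 'operations'."""
    op_names = {op["name"] for op in cfg.get("operations", [])}
    problems = []
    for step in cfg.get("pipeline", {}).get("steps", []):
        for op in step.get("operations", []):
            if op not in op_names:
                problems.append(f"{step['name']}: unknown operation {op!r}")
    return problems
```

A dict like this would then be passed as DSLRunner(config, max_threads=4), exactly as a dict loaded from YAML is.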

Outputs

Name | Type | Description
load_run_save() return value | float | Total LLM API cost in dollars
output file | JSON or CSV | Pipeline results written to the configured output path
intermediate files | JSON | Checkpoint files written to intermediate_dir (if configured)
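The output contract (JSON or CSV at the configured path) can be sketched with the standard library. save_output is a hypothetical helper that chooses the format from the file extension; it is not DocETL's save() implementation, which may select formats and handle paths differently.

```python
import csv
import json
from pathlib import Path


def save_output(data: list[dict], path: str) -> None:
    """Hypothetical helper: write records as JSON or CSV, chosen by file extension."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    suffix = out.suffix.lower()
    if suffix == ".json":
        with out.open("w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
    elif suffix == ".csv":
        # Header is the union of keys in first-seen order, so ragged records still fit.
        fieldnames: list[str] = []
        for record in data:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
        with out.open("w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(data)  # missing keys are written as empty strings
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
```

JSON round-trips nested values such as lists produced by an LLM operation, while CSV flattens everything to strings, so JSON is usually the safer choice for structured outputs.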

Usage Examples

CLI Execution

# Run a YAML pipeline
docetl run pipeline.yaml

Python API Execution

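import yaml
from docetl.runner import DSLRunner

# Parse the YAML pipeline definition into a plain dict
with open("pipeline.yaml", "r") as f:
    config = yaml.safe_load(f)

# Build the runner, then load datasets, run every step, and save the output
runner = DSLRunner(config, max_threads=4)
total_cost = runner.load_run_save()  # total LLM API cost in dollars
print(f"Pipeline cost: ${total_cost:.2f}")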

Related Pages

Implements Principle

Requires Environment
