Implementation:Ucbepic Docetl MOAR CliHelpers
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Optimization, CLI |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
CLI helper functions, provided by DocETL, that configure and launch MOAR optimizer runs.
Description
The cli_helpers module provides functions to infer dataset information from YAML pipeline configs, load custom evaluation functions, and orchestrate a complete MOAR optimization run from the command line. It handles parameter extraction from optimizer_config sections, resolves relative file paths, validates required configuration fields, and initializes MOARSearch with the correct parameters before running the search.
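The validation and path-resolution steps described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name check_optimizer_config is hypothetical, the choice of which fields count as required is an assumption, and resolving relative paths against the YAML file's directory is also an assumption.

```python
from pathlib import Path
from typing import Any, Dict

# Field names come from the optimizer_config description on this page;
# which of them DocETL treats as mandatory is an assumption of this sketch.
REQUIRED_FIELDS = ("save_dir", "available_models", "evaluation_file", "metric_key")


def check_optimizer_config(yaml_path: str, optimizer_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate required fields, then resolve relative paths."""
    # Treat absent or empty values as missing.
    missing = [name for name in REQUIRED_FIELDS if not optimizer_config.get(name)]
    if missing:
        raise ValueError(
            f"optimizer_config is missing required fields: {', '.join(missing)}"
        )

    # Assumed base directory: the folder that contains the pipeline YAML.
    base_dir = Path(yaml_path).resolve().parent

    # Work on a copy so the caller's dictionary is left untouched.
    resolved = dict(optimizer_config)
    for key in ("save_dir", "evaluation_file"):
        path = Path(resolved[key])
        if not path.is_absolute():
            path = base_dir / path
        resolved[key] = str(path)
    return resolved
```

Failing early with a single message that names every missing field tends to be friendlier on the command line than failing on the first missing key halfway through a run.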
Usage
Use these helpers when running the MOAR optimizer from the CLI via docetl optimize or when programmatically launching MOAR optimization from a YAML pipeline configuration that includes an optimizer_config section.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/moar/cli_helpers.py
- Lines: 1-305
Signature
def infer_dataset_info(yaml_path: str, config: dict) -> tuple[str, str]: ...
def load_evaluation_function(config: dict, dataset_file_path: str) -> callable: ...
def run_moar_optimization(
    yaml_path: str,
    optimizer_config: dict,
) -> Dict[str, Any]: ...
Import
from docetl.moar.cli_helpers import infer_dataset_info, load_evaluation_function, run_moar_optimization
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| yaml_path | str | Yes | Path to the YAML pipeline configuration file |
| config | dict | Yes | Full YAML config dictionary (for infer_dataset_info) |
| optimizer_config | dict | Yes | Dictionary from the YAML optimizer_config section containing save_dir, available_models, evaluation_file, metric_key, max_iterations |
| dataset_file_path | str | Yes | Path to the dataset file (for load_evaluation_function) |
Outputs
| Name | Type | Description |
|---|---|---|
| dataset_path | str | Resolved absolute path to the dataset file |
| dataset_name | str | Name of the dataset from the YAML config |
| evaluate_func | callable | Loaded evaluation function decorated with @docetl.register_eval |
| experiment_summary | Dict[str, Any] | Summary dictionary with optimization results, costs, and timings |
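To illustrate how an evaluate_func could be obtained from a user-supplied file, here is a minimal sketch built on the standard importlib machinery. It is not the body of load_evaluation_function: the function name load_marked_function and the marker attribute _is_registered_eval are hypothetical, since this page does not describe how @docetl.register_eval tags a function.

```python
import importlib.util
from pathlib import Path
from typing import Callable

# Hypothetical marker attribute; the real decorator's mechanism is not
# documented on this page.
MARKER = "_is_registered_eval"


def load_marked_function(file_path: str, marker: str = MARKER) -> Callable:
    """Import a Python file by path and return the first callable carrying `marker`."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"Evaluation file not found: {path}")

    # Build a module spec from the file location and execute the module.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import evaluation file: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Scan module-level names in definition order for a marked callable.
    for value in vars(module).values():
        if callable(value) and getattr(value, marker, False):
            return value
    raise ValueError(f"No function marked with {marker!r} found in {path}")
```

Loading by file path rather than by module name means the evaluation file does not have to live on sys.path, which suits a CLI where the file is referenced from a YAML config.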
Usage Examples
import yaml

from docetl.moar.cli_helpers import run_moar_optimization

# Load a pipeline YAML that includes an optimizer_config section
with open("my_pipeline.yaml", "r") as f:
    config = yaml.safe_load(f)

optimizer_config = config.get("optimizer_config", {})

# Run MOAR optimization; the return value is the experiment summary dictionary
results = run_moar_optimization(
    yaml_path="my_pipeline.yaml",
    optimizer_config=optimizer_config,
)
print(f"Best pipeline cost: {results['best_cost']}")
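The keys of the returned summary are not enumerated on this page, so a direct lookup such as results['best_cost'] raises KeyError if the key is named differently. A small hypothetical helper for listing whatever a summary actually contains, as a sketch for inspecting a run (the dictionary below is a stand-in, not real output):

```python
from typing import Any, Dict, List


def summarize_keys(summary: Dict[str, Any]) -> List[str]:
    """Return one 'key: type' line per top-level entry, sorted by key."""
    return [f"{key}: {type(summary[key]).__name__}" for key in sorted(summary)]


# Stand-in dictionary with made-up keys; a real summary would come from
# run_moar_optimization.
example = {"total_cost": 1.25, "iterations": 40}
for line in summarize_keys(example):
    print(line)
```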