Implementation:Ucbepic Docetl ExperimentRunMoar

Knowledge Sources	Ucbepic_Docetl
Domains	Data_Processing, Optimization, Experimentation
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete tool, provided by DocETL, for running MOAR (Multi-Objective Automated Rewriting) optimization experiments on DocETL pipelines.

Description

The run_moar module provides the experiment runner for MOAR-based pipeline optimization. It includes the main run_moar_experiment function that initializes MOARSearch with a root YAML pipeline, configures available directives and models, runs the MCTS search loop, and evaluates results using dataset-specific metrics. The module also provides Modal integration for remote execution via run_moar_remote and modal_main_moar, along with YAML rewriting helpers for volume-mounted paths. It supports custom evaluation functions and metric keys for extensible accuracy measurement.
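The default exploration_weight of 1.414 is approximately the square root of 2, the constant used in the classic UCB1 selection rule for MCTS. As a rough illustration of how such a weight trades off exploitation against exploration, here is a minimal sketch of textbook UCB1. The function name ucb1_score and the child statistics are hypothetical, and the selection formula actually used by MOARSearch may differ.

```python
import math


def ucb1_score(total_value: float, visits: int, parent_visits: int,
               exploration_weight: float = 1.414) -> float:
    """Textbook UCB1: mean value plus an exploration bonus."""
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    mean_value = total_value / visits
    bonus = exploration_weight * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + bonus


# Hypothetical child statistics: (name, accumulated value, visit count).
children = [("rewrite_a", 2.4, 4), ("rewrite_b", 0.9, 1), ("rewrite_c", 0.0, 0)]
parent_visits = 5

# Select the child with the highest UCB1 score.
best = max(children, key=lambda c: ucb1_score(c[1], c[2], parent_visits))
print(best[0])
```

A larger exploration_weight makes the bonus term dominate, so rarely visited pipeline rewrites are selected more often; a smaller value favors rewrites that already scored well.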

Usage

Use this module to run MOAR optimization experiments either locally or on Modal. It is the primary entry point for benchmarking the MOAR optimizer against various datasets (CUAD, BlackVault, Game Reviews, etc.).

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: experiments/reasoning/run_moar.py
Lines: 1-422

Signature

def _resolve_in_volume(path: str | None) -> str | None: ...

def _rewrite_pipeline_yaml_for_modal(orig_yaml_path: str, experiment_name: str) -> str: ...

@app.function(...)
def run_moar_remote(
    yaml_path: str,
    dataset_path: str,
    data_dir: str | None = None,
    output_dir: str | None = None,
    experiment_name: str = "moar_experiment",
    max_iterations: int = 40,
    exploration_weight: float = 1.414,
    model: str = DEFAULT_MODEL,
    dataset: str = "cuad",
    ground_truth_path: str | None = None,
    original_query_result: Dict[str, Any] | None = None,
    build_first_layer: Optional[bool] = False,
    available_models: List[str] | None = None,
    accuracy_function: str | None = None,
    accuracy_metric_key: str | None = None,
): ...

def run_moar_experiment(
    yaml_path: str,
    dataset_path: str,
    data_dir: str = None,
    output_dir: str = None,
    experiment_name: str = "moar_experiment",
    max_iterations: int = 40,
    exploration_weight: float = 1.414,
    model: str = DEFAULT_MODEL,
    dataset: str = "cuad",
    ground_truth_path: str | None = None,
    original_query_result: Dict[str, Any] | None = None,
    build_first_layer: Optional[bool] = False,
    available_models: List[str] | None = None,
    accuracy_function: str | None = None,
    accuracy_metric_key: str | None = None,
): ...

@app.local_entrypoint()
def modal_main_moar(...): ...

Import

from experiments.reasoning.run_moar import run_moar_experiment

I/O Contract

Inputs

Name	Type	Required	Description
yaml_path	str	Yes	Path to the input YAML pipeline file
dataset_path	str	Yes	Path to the dataset file for sample input data
dataset	str	No	Dataset name for evaluation (default: "cuad")
max_iterations	int	No	Maximum MCTS search iterations (default: 40)
exploration_weight	float	No	UCB exploration parameter (default: 1.414)
model	str	No	LLM model for directive instantiation (default: DEFAULT_MODEL)
output_dir	str or None	No	Directory to save experiment outputs
experiment_name	str	No	Name for this experiment run (default: "moar_experiment")
available_models	List[str] or None	No	List of available models for operators (default: None)
accuracy_function	str or None	No	Path to custom Python evaluation function file (default: None)
accuracy_metric_key	str or None	No	Key to extract from evaluation results for accuracy (default: None)
build_first_layer	bool	No	Whether to build the first layer of the search tree (default: False)
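The accuracy_function and accuracy_metric_key inputs suggest a pattern in which an evaluation function is loaded from a Python file and one key of its result is used as the accuracy score. The sketch below shows one way to do that with the standard library. The helper load_function_from_file, the function name evaluate, its signature, and the exact_match key are all assumptions made for illustration; the interface that run_moar actually expects is not documented on this page.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_function_from_file(file_path: str, function_name: str):
    """Import a Python file by path and return one of its callables."""
    spec = importlib.util.spec_from_file_location("custom_eval_module", file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)


# Hypothetical evaluation file; function name and signature are assumptions.
eval_source = '''
def evaluate(predictions, ground_truth):
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return {"exact_match": correct / len(ground_truth)}
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    eval_path = Path(tmp_dir) / "my_eval.py"
    eval_path.write_text(eval_source)
    evaluate = load_function_from_file(str(eval_path), "evaluate")
    metrics = evaluate(["a", "b", "c"], ["a", "x", "c"])

# The metric key plays the role that accuracy_metric_key appears to play.
accuracy = metrics["exact_match"]
print(accuracy)
```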

Outputs

Name	Type	Description
results	dict	Experiment summary with best nodes, costs, eval results, Pareto AUC, and timings
eval_results	list[dict]	Per-node evaluation results with metrics and frontier status
pareto_auc	float	Area under the Pareto frontier curve
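To make the frontier status and pareto_auc outputs more concrete, the following sketch computes a cost/accuracy Pareto frontier and a trapezoidal area under it. It assumes lower cost and higher accuracy are better and that the area is taken over raw cost values; the function names and sample data are hypothetical, and the exact definition used by run_moar (for example, normalization or step-wise integration) may differ.

```python
def pareto_frontier(points):
    """Return (cost, accuracy) points not dominated by a cheaper-or-equal,
    more accurate point. Lower cost and higher accuracy are better."""
    frontier = []
    best_accuracy = float("-inf")
    # Sort by cost ascending; on ties, consider the more accurate point first.
    for cost, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        if accuracy > best_accuracy:
            frontier.append((cost, accuracy))
            best_accuracy = accuracy
    return frontier


def trapezoid_auc(frontier):
    """Area under the frontier, integrating accuracy over cost."""
    area = 0.0
    for (c0, a0), (c1, a1) in zip(frontier, frontier[1:]):
        area += (c1 - c0) * (a0 + a1) / 2.0
    return area


# Hypothetical per-node results as (cost, accuracy) pairs.
nodes = [(1.0, 0.50), (2.0, 0.70), (2.5, 0.65), (4.0, 0.80), (4.0, 0.75)]
frontier = pareto_frontier(nodes)
auc = trapezoid_auc(frontier)
print(frontier, auc)
```

In this example the nodes at (2.5, 0.65) and (4.0, 0.75) are dominated and would be reported as off the frontier.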

Usage Examples

from experiments.reasoning.run_moar import run_moar_experiment

# Run a MOAR optimization experiment
results = run_moar_experiment(
    yaml_path="experiments/reasoning/configs/cuad_pipeline.yaml",
    dataset_path="experiments/reasoning/data/train/cuad.json",
    dataset="cuad",
    max_iterations=40,
    exploration_weight=1.414,
    experiment_name="cuad_moar_v1",
    output_dir="outputs/experiments",
)

print(f"Completed in {results['duration_seconds']:.1f}s")
print(f"Pareto AUC: {results['pareto_auc']:.4f}")

Related Pages

Environment:Ucbepic_Docetl_Python_Runtime