Implementation: MOARSearch (ucbepic/docetl)
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Search_Algorithms |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete Monte Carlo Tree Search (MCTS) implementation for multi-objective pipeline optimization, provided by the MOAR module.
Description
MOARSearch implements Monte Carlo Tree Search over pipeline rewrite directives. It runs concurrent search agents that iterate through selection, expansion, simulation, and backpropagation phases. Each iteration applies a rewrite directive to generate a new YAML pipeline, executes it on a sample dataset, evaluates accuracy, and updates the Pareto frontier.
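The four-phase loop described above can be sketched as follows. This is a minimal, self-contained illustration of the MCTS pattern, not docetl's actual code: the `Node` fields, `ucb1` helper, and the random-reward stand-in for pipeline execution are all assumptions made for the example.

```python
import math
import random

class Node:
    """Illustrative search-tree node holding one pipeline variant (hypothetical)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def ucb1(node, c=1.414):
    """UCB1 score: exploitation plus exploration bonus."""
    if node.visits == 0:
        return float("inf")  # always try unvisited nodes first
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def search_iteration(root):
    # Selection: descend by UCB1 until we reach a leaf.
    node = root
    while node.children:
        node = max(node.children, key=ucb1)
    # Expansion: in docetl this would apply a rewrite directive
    # to produce a new pipeline variant.
    child = Node(parent=node)
    node.children.append(child)
    # Simulation: stand-in for executing the pipeline on sample
    # data and measuring accuracy.
    reward = random.random()
    # Backpropagation: update statistics along the path to the root.
    while child is not None:
        child.visits += 1
        child.total_reward += reward
        child = child.parent
```

In the real implementation the simulation step runs the generated YAML pipeline and the reward reflects the evaluation function's score rather than a random draw.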
The search tree is built from Node objects, each representing a pipeline variant, while a ParetoFrontier tracks cost-accuracy tradeoffs using hypervolume-indicator calculations.
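The frontier bookkeeping amounts to Pareto dominance plus a 2-D hypervolume measure. The sketch below shows the idea under assumed conventions (minimize cost, maximize accuracy, plans as plain dicts); the real ParetoFrontier class's interface may differ.

```python
def dominates(a, b):
    """True if plan a is no worse than b on both objectives
    (lower cost, higher accuracy) and strictly better on at least one."""
    return (a["cost"] <= b["cost"] and a["accuracy"] >= b["accuracy"]
            and (a["cost"] < b["cost"] or a["accuracy"] > b["accuracy"]))

def pareto_frontier(plans):
    """Keep only the non-dominated plans."""
    return [p for p in plans
            if not any(dominates(q, p) for q in plans if q is not p)]

def hypervolume_2d(frontier, ref_cost, ref_acc=0.0):
    """Area dominated by the frontier relative to a reference point
    (worst acceptable cost, worst accuracy). Larger is better."""
    pts = sorted(frontier, key=lambda p: p["cost"])
    hv, prev_acc = 0.0, ref_acc
    for p in pts:
        hv += (ref_cost - p["cost"]) * (p["accuracy"] - prev_acc)
        prev_acc = max(prev_acc, p["accuracy"])
    return hv
```

A larger hypervolume means the frontier pushes further toward low cost and high accuracy simultaneously, which is why it serves as a scalar progress signal for the multi-objective search.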
Usage
MOARSearch is instantiated and called by `run_moar_optimization()` during `docetl build --optimizer moar`. It requires a root pipeline YAML, a set of rewrite directives, sample data, an evaluation function, and a search budget.
Code Reference
Source Location
- Repository: docetl
- File: docetl/moar/MOARSearch.py
- Lines: L39-481
Signature
```python
class MOARSearch:
    def __init__(
        self,
        root_yaml_path: str,
        available_actions: set[Directive],
        sample_input,
        dataset_stats: str,
        dataset_name: str,
        available_models: List[str],
        evaluate_func: Callable,
        exploration_constant: float = 1.414,
        max_iterations: int = 20,
        model: str = "gpt-5",
        output_dir: Optional[str] = None,
        build_first_layer: Optional[bool] = True,
        custom_metric_key: Optional[str] = None,
        sample_dataset_path: Optional[str] = None,
    ):
        """Initialize MCTS search with pipeline and search parameters."""

    def search(self) -> List[Node]:
        """Perform MCTS search. Returns Pareto frontier nodes."""

    def search_iteration(self) -> bool:
        """Perform one complete MCTS iteration (select, expand, simulate, backprop)."""
```
Import
```python
from docetl.moar.MOARSearch import MOARSearch
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| root_yaml_path | str | Yes | Path to baseline pipeline YAML |
| available_actions | set[Directive] | Yes | Set of 25+ rewrite directives |
| sample_input | list[dict] | Yes | Sample dataset for simulation |
| evaluate_func | Callable | Yes | Scoring function from @register_eval |
| max_iterations | int | No | MCTS budget (default 20) |
| exploration_constant | float | No | UCB exploration weight (default 1.414) |
| available_models | List[str] | Yes | LLM models to explore |
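The exact contract of evaluate_func is not specified here; as an illustration only (the signature and field names are assumptions, not docetl's actual @register_eval API), an accuracy scorer over pipeline outputs might look like:

```python
def wrapped_eval(outputs):
    # Hypothetical scorer: fraction of pipeline output rows whose
    # "label" field matches the expected "gold" field.
    # Field names are placeholders chosen for this example.
    if not outputs:
        return 0.0
    correct = sum(1 for row in outputs if row.get("label") == row.get("gold"))
    return correct / len(outputs)
```

Whatever its internals, the function should map a batch of pipeline outputs to a single score that the search can maximize against cost.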
Outputs
| Name | Type | Description |
|---|---|---|
| search() returns | List[Node] | Pareto frontier nodes (best cost-accuracy tradeoffs) |
| YAML files | files | Optimized pipeline configs written to output_dir |
| Pareto plots | PNG files | Cost vs accuracy scatter plots |
Usage Examples
```python
from docetl.moar.MOARSearch import MOARSearch
from docetl.reasoning_optimizer.directives import ALL_DIRECTIVES

search = MOARSearch(
    root_yaml_path="pipeline.yaml",
    available_actions=ALL_DIRECTIVES,
    sample_input=sample_data,          # list[dict] of sample documents
    dataset_stats="100 documents, avg 500 tokens",
    dataset_name="legal_docs",
    available_models=["gpt-4o", "gpt-4o-mini"],
    evaluate_func=wrapped_eval,        # scoring function registered via @register_eval
    max_iterations=30,
    exploration_constant=1.414,
)

frontier_plans = search.search()
for plan in frontier_plans:
    print(f"Plan {plan.id}: cost=${plan.cost:.2f}, accuracy={plan.accuracy:.3f}")
```