Implementation:Ucbepic Docetl MOAR SearchUtils
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Optimization, Search_Algorithms |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for utility functions supporting the MCTS search algorithm in the MOAR optimizer provided by DocETL.
Description
The search_utils module provides helper functions used by the MCTS algorithm but not core to the tree structure itself. It includes functions for token counting and history trimming (count_tokens, trim_history), UCB1 score calculation (calculate_ucb1), node exploration checks (is_fully_explored), search termination conditions (should_continue_search), pipeline configuration updates (update_pipeline), tree visualization (print_tree_visits_and_values, log_tree_to_file), and LLM prompt construction for expansion (create_expansion_prompt_acc). It also defines the ExpandResponseFormat Pydantic model for structured LLM responses.
Usage
Use these utilities when implementing or extending the MOAR MCTS search loop. They are called internally during selection, expansion, simulation, and logging phases.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/moar/search_utils.py
- Lines: 1-485
Signature
class ExpandResponseFormat(BaseModel):
directive: str
operators: List[str]
def count_tokens(messages) -> int: ...
def trim_history(history: list, keep_system_first: bool = True) -> list: ...
def get_directive_group(directive_name: str) -> str: ...
def get_excluded_directives_for_operation(node, op_name: str) -> set: ...
def is_action_applicable(node, action: Directive) -> bool: ...
def update_pipeline(orig_config, new_ops_list, target_ops) -> dict: ...
def fix_models(parsed_yaml): ...
def is_fully_explored(node, max_children_multiplier: float = 1.0) -> bool: ...
def should_continue_search(iteration: int, max_iterations: int, start_time: float, max_time: Optional[float] = None) -> bool: ...
def calculate_ucb1(node, parent_visits: int, exploration_constant: float = math.sqrt(2)) -> float: ...
def print_tree_visits_and_values(node=None, depth=0, file_handle=None, console=None): ...
def log_tree_to_file(root_node, iteration_num, output_dir="./outputs", console=None): ...
def create_expansion_prompt_acc(node, action_options, input_query, available_actions, action_cost_changes, action_accuracy_changes, action_counts, sample_input, root_node, yaml_file_path, dataset=None, node_accuracies=None, model_stats=None, available_models=None) -> tuple[str, str]: ...
Import
from docetl.moar.search_utils import (
calculate_ucb1,
is_fully_explored,
should_continue_search,
trim_history,
create_expansion_prompt_acc,
ExpandResponseFormat,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| node | Node | Yes | MCTS node for UCB calculation, exploration checks, or prompt construction |
| parent_visits | int | Yes | Visit count of the parent node (for UCB1) |
| exploration_constant | float | No | UCB exploration weight (default: sqrt(2)) |
| max_iterations | int | Yes | Maximum search iterations (for termination check) |
| max_time | Optional[float] | No | Maximum wall-clock time in seconds |
| messages | list | Yes | List of message dicts for token counting |
Outputs
| Name | Type | Description |
|---|---|---|
| ucb_value | float | Computed UCB1 score combining exploitation and exploration |
| should_continue | bool | Whether the MCTS search should continue iterating |
| is_explored | bool | Whether all children of a node have been visited |
| system_prompt, user_prompt | tuple[str, str] | Formatted prompts for LLM expansion calls |
Usage Examples
from docetl.moar.search_utils import calculate_ucb1, should_continue_search, is_fully_explored
import time
# Calculate UCB1 value for child selection
ucb = calculate_ucb1(child_node, parent_visits=100, exploration_constant=1.414)
# Check if search should continue
start = time.time()
if should_continue_search(iteration=25, max_iterations=50, start_time=start, max_time=3600):
# Continue MCTS iteration
pass
# Check if a node is fully explored
if is_fully_explored(node, max_children_multiplier=1.0):
# All children have been simulated at least once
pass