
Principle:SqueezeAILab ETS Tree Search

From Leeroopedia
Knowledge Sources
Domains Tree_Search, Inference_Time_Compute, Mathematical_Reasoning
Last Updated 2026-02-14 02:00 GMT

Overview

A reward-guided tree search algorithm that iteratively expands and selects reasoning paths to solve mathematical problems using a fixed compute budget.

Description

The ETS (Efficient Tree Search) algorithm performs inference-time compute scaling by building a search tree of reasoning trajectories. Starting from a problem prompt, it iteratively:

  1. Selects promising nodes at the current depth based on PRM scores and the chosen selection strategy
  2. Expands selected nodes by generating the next reasoning step via the policy model
  3. Inserts new nodes into the tree with their PRM scores
  4. Repeats until the search budget (width) is exhausted or all trajectories terminate

The algorithm maintains a fixed "width" budget representing the total number of candidate trajectories. At each depth level, the selection strategy decides how to allocate this budget across existing nodes. The search terminates when all budget is consumed (nodes either reach an answer or exhaust their token limit).
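The budget accounting described above can be sketched as follows; the field names and helper function here are illustrative stand-ins, not the actual ETS data structures.

```python
# Illustrative sketch of the width-budget accounting (field and function
# names are hypothetical). Terminated trajectories -- those that reached
# a final answer or exhausted their token limit -- still occupy a unit
# of the total width, but are never selected for expansion again.

def expandable(trajectories):
    """Return the trajectories still eligible for selection/expansion."""
    return [t for t in trajectories if not t["terminated"]]

trajectories = [
    {"id": 0, "terminated": False},  # still reasoning
    {"id": 1, "terminated": True},   # reached a final answer
    {"id": 2, "terminated": True},   # hit the token limit
    {"id": 3, "terminated": False},  # still reasoning
]

# Total width stays fixed at 4; only 2 units remain available to expand.
print(len(expandable(trajectories)))
```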

This function is decorated with SGLang's @function decorator, enabling batched multi-threaded execution via run_batch, in which multiple questions are processed in parallel.

Usage

Use this as the main entry point for running ETS tree search on a set of math problems. It requires running policy and reward model servers, a loaded configuration, and optionally a SentenceTransformer embedding model for diversity scoring.

Theoretical Basis

Reward-guided tree search is an instance of best-first search where node expansion is guided by a learned value function (the PRM). The key algorithmic loop is:

# Abstract tree search algorithm
MAX_DEPTH = 25                        # hard cap to prevent infinite loops
tree = initialize_tree(root_prompt)
depth = 0
while budget_remaining(tree) > 0 and depth < MAX_DEPTH:
    candidates = get_nodes_at_depth(tree, depth)
    scores = [node.score for node in candidates]      # PRM scores
    selected, widths = select_strategy(candidates, scores, budget_remaining(tree))
    for node, width in zip(selected, widths):
        new_nodes = expand(node, width)  # generate `width` next-step continuations
        for new_node in new_nodes:
            score = reward_model.score(new_node)
            insert(tree, new_node, score)
    depth += 1
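To make the abstract loop concrete, here is a self-contained toy instantiation. A stub policy that appends digits and a stub reward that scores by length stand in for the real policy-model and PRM servers; all names, the fixed-depth loop, and the even budget split are illustrative assumptions, not the actual ETS code.

```python
# Toy, self-contained instantiation of the reward-guided loop above.
# The "policy" appends digit characters and the "reward" prefers longer
# strings; both are placeholders for the real LLM and PRM servers.
import heapq

def toy_policy(prefix, width):
    """Generate `width` candidate next steps for a node."""
    return [prefix + str(i) for i in range(width)]

def toy_reward(trajectory):
    """Stub PRM: score a trajectory by its length (illustrative only)."""
    return len(trajectory)

def toy_tree_search(root, total_width=4, max_depth=3):
    frontier = [(toy_reward(root), root)]
    for _depth in range(max_depth):
        # Select the best-scoring nodes at this depth.
        selected = heapq.nlargest(total_width, frontier)
        # Expand each selected node, splitting the width budget evenly
        # (the real algorithm allocates widths by score instead).
        per_node = max(1, total_width // len(selected))
        frontier = []
        for _score, node in selected:
            for child in toy_policy(node, per_node):
                frontier.append((toy_reward(child), child))
    # Return the highest-scoring trajectory found.
    return max(frontier)[1]

print(toy_tree_search("x"))
```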

The search differs from standard beam search in two ways:

  1. Variable-width expansion: Each selected node can be expanded with a different number of children, allocated proportionally to its score
  2. Budget accounting: Leaf nodes and token-exhausted nodes consume budget without further expansion, naturally terminating unpromising paths
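As an illustration of the variable-width idea, a score-proportional allocator might look like the sketch below; the function name and the rounding scheme are assumptions for illustration, not the actual ETS code.

```python
# Hypothetical sketch: split a total expansion budget across selected
# nodes in proportion to their PRM scores (names are illustrative).

def allocate_widths(scores, total_budget):
    """Allocate `total_budget` children among nodes proportionally to score."""
    total = sum(scores)
    # Provisional proportional shares, rounded down
    widths = [int(total_budget * s / total) for s in scores]
    # Hand any leftover units to the highest-scoring nodes first
    leftover = total_budget - sum(widths)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    for i in order[:leftover]:
        widths[i] += 1
    return widths

print(allocate_widths([0.9, 0.6, 0.3], 8))  # → [5, 2, 1]
```

Note that the allocation always sums exactly to the budget, so no width units are lost to rounding.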

The maximum search depth is capped at 25 to prevent infinite loops.

Related Pages

Implemented By

Uses Heuristic
