Principle: SqueezeAILab ETS Tree Search
| Knowledge Sources | Details |
|---|---|
| Domains | Tree_Search, Inference_Time_Compute, Mathematical_Reasoning |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
A reward-guided tree search algorithm that iteratively expands and selects reasoning paths to solve mathematical problems using a fixed compute budget.
Description
The ETS (Efficient Tree Search) algorithm performs inference-time compute scaling by building a search tree of reasoning trajectories. Starting from a problem prompt, it iteratively:
- Selects promising nodes at the current depth based on PRM scores and the chosen selection strategy
- Expands selected nodes by generating the next reasoning step via the policy model
- Inserts new nodes into the tree with their PRM scores
- Repeats until the search budget (width) is exhausted or all trajectories terminate
The algorithm maintains a fixed "width" budget representing the total number of candidate trajectories. At each depth level, the selection strategy decides how to allocate this budget across existing nodes. The search terminates when all budget is consumed (nodes either reach an answer or exhaust their token limit).
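The select/expand/insert loop with a fixed width budget can be sketched with a deterministic toy (all names here are hypothetical stand-ins; the real ETS scores nodes with a PRM and generates steps with a policy model):

```python
# Toy sketch of the width-budget loop described above. `expand` and `score`
# are stubs standing in for the policy model and the PRM, respectively.

def expand(node, k):
    """Stub policy: produce k child 'steps' by appending a token index."""
    return [node + (i,) for i in range(k)]

def score(node):
    """Stub PRM: prefer trajectories whose last step index is small."""
    return 1.0 / (1 + node[-1]) if node else 1.0

def toy_search(width=4, max_depth=3):
    frontier = [()]  # root: empty trajectory
    for _ in range(max_depth):
        # Select: keep the top-scoring nodes at this depth, up to the budget.
        frontier.sort(key=score, reverse=True)
        selected = frontier[:width]
        # Expand: uniform allocation here (two children each), capped at width.
        frontier = [c for n in selected for c in expand(n, 2)][:width]
    return max(frontier, key=score)

best = toy_search()  # highest-scoring trajectory after max_depth levels
```

The real algorithm differs mainly in how the selection strategy splits the width across nodes and in how leaf/exhausted nodes retire budget, but the depth-by-depth structure is the same.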
The search function is decorated with SGLang's @function decorator, enabling batched multi-threaded execution via run_batch, where multiple questions are processed in parallel.
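The parallelism pattern behind run_batch (one worker per question, concurrency via threads since the work is I/O-bound server calls) can be sketched with the standard library alone; `solve_one` below is a hypothetical stand-in for the decorated search function, not SGLang's API:

```python
from concurrent.futures import ThreadPoolExecutor

def solve_one(question):
    # Hypothetical stand-in for one ETS search invocation; the real
    # function would call the policy/reward servers for this question.
    return f"answer to {question!r}"

def run_batch_sketch(questions, max_workers=8):
    # Threads suffice because each call mostly waits on model servers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve_one, questions))

results = run_batch_sketch(["1+1=?", "2*3=?"])
```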
Usage
Use this as the main entry point for running ETS tree search on a set of math problems. It requires running policy and reward model servers, a loaded configuration, and optionally a SentenceTransformer embedding model for diversity scoring.
Theoretical Basis
Reward-guided tree search is an instance of best-first search where node expansion is guided by a learned value function (the PRM). The key algorithmic loop is:
```python
# Abstract tree search algorithm
tree = initialize_tree(root_prompt)
depth = 0
while budget_remaining(tree):
    candidates = get_nodes_at_depth(tree, depth)
    selected, widths = select_strategy(candidates, scores, remaining_budget)
    for node, width in zip(selected, widths):
        new_nodes = expand(node, width)  # generate next steps
        for new_node in new_nodes:
            score = reward_model.score(new_node)
            insert(tree, new_node, score)
    depth += 1
```
The search differs from standard beam search in two ways:
- Variable-width expansion: Each selected node can be expanded with a different number of children, allocated proportionally to its score
- Budget accounting: Leaf nodes and token-exhausted nodes consume budget without further expansion, naturally terminating unpromising paths
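Score-proportional width allocation can be sketched as follows (`allocate_widths` is a hypothetical helper; the actual ETS selection strategies may differ in rounding and tie-breaking):

```python
# Split a fixed child budget across nodes in proportion to their PRM scores.

def allocate_widths(scores, budget):
    total = sum(scores)
    # Floor each node's proportional share, then hand leftover slots
    # to the highest-scoring nodes so the full budget is always used.
    widths = [int(budget * s / total) for s in scores]
    leftover = budget - sum(widths)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:leftover]:
        widths[i] += 1
    return widths
```

Nodes allocated a width of zero receive no children, which is how low-scoring paths are pruned without explicit bookkeeping.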
The maximum search depth is capped at 25 to prevent infinite loops.