Principle: SqueezeAILab ETS Tree Search
| Knowledge Sources | Details |
|---|---|
| Domains | Tree_Search, Inference_Time_Compute, Mathematical_Reasoning |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
A reward-guided tree search algorithm that iteratively expands and selects reasoning paths to solve mathematical problems using a fixed compute budget.
Description
The ETS (Efficient Tree Search) algorithm performs inference-time compute scaling by building a search tree of reasoning trajectories. Starting from a problem prompt, it iteratively:
- Selects promising nodes at the current depth based on PRM scores and the chosen selection strategy
- Expands selected nodes by generating the next reasoning step via the policy model
- Inserts new nodes into the tree with their PRM scores
- Repeats until the search budget (width) is exhausted or all trajectories terminate
The algorithm maintains a fixed "width" budget representing the total number of candidate trajectories. At each depth level, the selection strategy decides how to allocate this budget across existing nodes. The search terminates when all budget is consumed (nodes either reach an answer or exhaust their token limit).
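The select/expand/insert loop with a fixed width budget can be sketched with a deterministic toy (all names here are hypothetical stand-ins; the real ETS scores nodes with a PRM and generates steps with a policy model):

```python
# Toy sketch of the width-budget loop described above. `expand` and `score`
# are stubs standing in for the policy model and the PRM, respectively.

def expand(node, k):
    """Stub policy: produce k child 'steps' by appending a token index."""
    return [node + (i,) for i in range(k)]

def score(node):
    """Stub PRM: prefer trajectories whose last step index is small."""
    return 1.0 / (1 + node[-1]) if node else 1.0

def toy_search(width=4, max_depth=3):
    frontier = [()]  # root: empty trajectory
    for _ in range(max_depth):
        # Select: keep the top-scoring nodes at this depth, up to the budget.
        frontier.sort(key=score, reverse=True)
        selected = frontier[:width]
        # Expand: uniform allocation here (two children each), capped at width.
        frontier = [c for n in selected for c in expand(n, 2)][:width]
    return max(frontier, key=score)

best = toy_search()  # highest-scoring trajectory after max_depth levels
```

The real algorithm differs mainly in how the selection strategy splits the width across nodes and in how leaf/exhausted nodes retire budget, but the depth-by-depth structure is the same.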
The search function is decorated with SGLang's @function decorator, enabling batched multi-threaded execution via run_batch, where multiple questions are processed in parallel.
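The parallelism pattern behind run_batch (one worker per question, concurrency via threads since the work is I/O-bound server calls) can be sketched with the standard library alone; `solve_one` below is a hypothetical stand-in for the decorated search function, not SGLang's API:

```python
from concurrent.futures import ThreadPoolExecutor

def solve_one(question):
    # Hypothetical stand-in for one ETS search invocation; the real
    # function would call the policy/reward servers for this question.
    return f"answer to {question!r}"

def run_batch_sketch(questions, max_workers=8):
    # Threads suffice because each call mostly waits on model servers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve_one, questions))

results = run_batch_sketch(["1+1=?", "2*3=?"])
```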
Usage
Use this as the main entry point for running ETS tree search on a set of math problems. It requires running policy and reward model servers, a loaded configuration, and optionally a SentenceTransformer embedding model for diversity scoring.
Theoretical Basis
Reward-guided tree search is an instance of best-first search where node expansion is guided by a learned value function (the PRM). The key algorithmic loop is:
```python
# Abstract tree search algorithm
tree = initialize_tree(root_prompt)
depth = 0
while budget_remaining(tree):
    candidates = get_nodes_at_depth(tree, depth)
    selected, widths = select_strategy(candidates, scores, remaining_budget)
    for node, width in zip(selected, widths):
        new_nodes = expand(node, width)  # generate next steps
        for new_node in new_nodes:
            score = reward_model.score(new_node)
            insert(tree, new_node, score)
    depth += 1
```
The search differs from standard beam search in two ways:
- Variable-width expansion: Each selected node can be expanded with a different number of children, allocated proportionally to its score
- Budget accounting: Leaf nodes and token-exhausted nodes consume budget without further expansion, naturally terminating unpromising paths
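Score-proportional width allocation can be sketched as follows (`allocate_widths` is a hypothetical helper; the actual ETS selection strategies may differ in rounding and tie-breaking):

```python
# Split a fixed child budget across nodes in proportion to their PRM scores.

def allocate_widths(scores, budget):
    total = sum(scores)
    # Floor each node's proportional share, then hand leftover slots
    # to the highest-scoring nodes so the full budget is always used.
    widths = [int(budget * s / total) for s in scores]
    leftover = budget - sum(widths)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:leftover]:
        widths[i] += 1
    return widths
```

Nodes allocated a width of zero receive no children, which is how low-scoring paths are pruned without explicit bookkeeping.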
The maximum search depth is capped at 25 to prevent infinite loops.