Principle: SqueezeAILab ETS Tree Node Expansion
| Knowledge Sources | |
|---|---|
| Domains | Tree_Search, Inference_Time_Compute |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
A tree expansion mechanism that generates multiple child reasoning steps from a parent node by forking the KV cache state and scoring each new step with a Process Reward Model (PRM).
Description
Tree node expansion is the growth operation that produces new candidate reasoning steps. Given a selected parent node and a width allocation, the expansion:
- Forks the parent's KV cache state into multiple copies (one per child)
- Sets the PRM as the score backend for each forked state
- Generates the next reasoning step using the policy model with configured stopping conditions
- Scores each generated step with a PRM forward pass (a zero-token generation that extracts logits at specific positions)
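The zero-token scoring step above can be sketched as follows. This is a minimal, hypothetical illustration of reading a step score out of PRM logits at the delimiter position; the token ids and the two-way softmax are assumptions, not the exact implementation.

```python
import math

# Assumed token ids for the PRM's "correct step" / "incorrect step" labels.
GOOD_TOKEN_ID = 0
BAD_TOKEN_ID = 1

def prm_step_score(logits_at_delimiter):
    """Softmax-normalize the good/bad logits into a step score in [0, 1].

    `logits_at_delimiter` is the logit vector the PRM produced at the
    step-delimiter position; no new tokens are generated to obtain it.
    """
    good = logits_at_delimiter[GOOD_TOKEN_ID]
    bad = logits_at_delimiter[BAD_TOKEN_ID]
    m = max(good, bad)  # subtract the max for numerical stability
    e_good = math.exp(good - m)
    e_bad = math.exp(bad - m)
    return e_good / (e_good + e_bad)

score = prm_step_score([2.0, -1.0, 0.3])  # high score: "good" logit dominates
```

Because the score is read directly from existing logits, scoring adds one forward pass per child but no decoding steps.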
The stopping condition and scoring mechanism are model-type-specific:
- llemma/mistral: Stop at "ки" (Cyrillic step delimiter), score via logits at specific token IDs
- llama: Stop at "\n\n", score via logits at position-indexed token IDs
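The model-type-specific conventions above amount to a small lookup table. A sketch, with hypothetical structure and field names (only the stop strings come from the source):

```python
# Hypothetical per-model-type configuration for stopping and scoring.
# The "scoring" values are descriptive labels, not real API identifiers.
STEP_CONFIG = {
    "llemma":  {"stop": "ки",   "scoring": "fixed_token_ids"},
    "mistral": {"stop": "ки",   "scoring": "fixed_token_ids"},
    "llama":   {"stop": "\n\n", "scoring": "position_indexed_token_ids"},
}

def stop_string(model_type):
    """Return the step-delimiter string used to stop generation."""
    return STEP_CONFIG[model_type]["stop"]
```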
After generation, each forked state is paired with its parent node and added to a running list for subsequent insertion into the tree.
Usage
This expansion occurs at every depth of the tree search, for each node selected for expansion. It is the primary compute-consuming operation, since it invokes both the policy model for text generation and the reward model for scoring.
Theoretical Basis
Tree expansion in the context of KV cache-aware search is designed to maximize cache reuse:
```python
# Abstract expansion procedure
def expand(parent_node, width):
    # Fork the parent's KV cache: all children share the prefix entries
    forked_states = parent_node.kv_state.fork(width)
    for state in forked_states:
        state.score_backend = reward_model        # PRM will score this state
        state += generate(stop="step_delimiter")  # Generate next reasoning step
        state += score(forward_only=True)         # Zero-token PRM forward pass
        add_to_tree(state, parent_node)
```
The KV cache forking operation is efficient because all children share the same prefix computation. Only the newly generated tokens require fresh KV entries, which makes tree search more memory-efficient than independently sampling full trajectories.
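A back-of-the-envelope comparison makes the savings concrete. Assuming a shared prefix of `prefix_len` tokens and `width` children each generating `step_len` new tokens (illustrative numbers, not measurements from the source):

```python
def kv_entries_forked(prefix_len, width, step_len):
    # Prefix KV entries are computed once and shared by all forked children;
    # only each child's newly generated tokens need fresh entries.
    return prefix_len + width * step_len

def kv_entries_independent(prefix_len, width, step_len):
    # Independent sampling stores the full prefix per trajectory.
    return width * (prefix_len + step_len)

forked = kv_entries_forked(1000, 4, 100)            # 1000 + 4*100 = 1400
independent = kv_entries_independent(1000, 4, 100)  # 4 * 1100    = 4400
```

The gap widens with deeper trees and wider expansions, since the shared prefix grows while each new step stays short.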