Principle:SqueezeAILab ETS ILP Node Selection
| Knowledge Sources | |
|---|---|
| Domains | Tree_Search, Optimization, Integer_Linear_Programming |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
An Integer Linear Programming (ILP) formulation that selects which tree nodes to retain and expand, jointly optimizing for outcome quality, KV cache cost, and trajectory diversity.
Description
The ILP Node Selection method (called "softmax_costmodel" in the codebase) is the core algorithmic contribution of the ETS paper. It formulates the node selection problem at each tree depth as a binary optimization problem:
- Decision variables: Binary variables for each leaf node (retain or prune) and each branch node (active or inactive)
- Objective: Maximize a weighted combination of outcome scores, minimize cost (number of active nodes), and maximize cluster coverage for diversity
- Constraints: Branch nodes are active if and only if they have at least one active leaf descendant; at least one node must be retained
After the ILP is solved, the retained nodes are allocated expansion widths via softmax-weighted proportional allocation based on their PRM scores.
Usage
This selection strategy is activated when select_method: "softmax_costmodel" is set in the YAML configuration. It is the recommended and default strategy for ETS experiments, providing the best trade-off between accuracy and compute efficiency.
Theoretical Basis
The ILP formulation defines:
Where:
- — binary decision for leaf node i (N leaves total)
- — binary decision for branch node j (M branches total)
- — normalized outcome score for leaf i (softmax-weighted based on PRM scores)
- — cost coefficient (uniform across nodes)
- — cost penalty weight (negated to penalize retaining nodes, encouraging KV cache sharing)
- — diversity penalty weight (scaled by 1/K clusters)
- — binary indicator: 1 if at least one leaf in cluster k is selected
Outcome scores are computed by sorting nodes by PRM score, applying softmax with temperature T, then computing proportional width allocations. This allocation is converted to a normalized outcome score per node.
Diversity enforcement uses hierarchical clustering of sentence embeddings (cosine distance, average linkage, threshold 0.05) to group similar trajectories. The coverage term rewards selecting at least one representative from each cluster.
Constraints:
- Branch activation: and
- Non-empty:
Post-ILP allocation: After solving, retained nodes receive softmax-proportional width allocation for the next expansion round.