Principle:SqueezeAILab ETS ILP Node Selection

Knowledge Sources	ETS REBASE
Domains	Tree_Search, Optimization, Integer_Linear_Programming
Last Updated	2026-02-14 02:00 GMT

Overview

An Integer Linear Programming (ILP) formulation that selects which tree nodes to retain and expand, jointly optimizing for outcome quality, KV cache cost, and trajectory diversity.

Description

The ILP Node Selection method (called "softmax_costmodel" in the codebase) is the core algorithmic contribution of the ETS paper. It formulates the node selection problem at each tree depth as a binary optimization problem:

Decision variables: Binary variables for each leaf node (retain or prune) and each branch node (active or inactive)
Objective: Maximize a weighted combination of outcome scores, minimize cost (number of active nodes), and maximize cluster coverage for diversity
Constraints: Branch nodes are active if and only if they have at least one active leaf descendant; at least one node must be retained

After the ILP is solved, the retained nodes are allocated expansion widths via softmax-weighted proportional allocation based on their PRM scores.

Usage

This selection strategy is activated when select_method: "softmax_costmodel" is set in the YAML configuration. It is the recommended and default strategy for ETS experiments, providing the best trade-off between accuracy and compute efficiency.

Theoretical Basis

The ILP formulation defines:

$\max_{x, y} [\sum_{i = 1}^{N} O_{i} \cdot x_{i} + λ_{c} \sum_{j = 1}^{M} C_{j} \cdot y_{j} + λ_{c} \sum_{i = 1}^{N} C_{i + M} \cdot x_{i} + λ_{s} \sum_{k = 1}^{K} {coverage}_{k}]$

Where:

$x_{i} \in {0, 1}$ — binary decision for leaf node i (N leaves total)
$y_{j} \in {0, 1}$ — binary decision for branch node j (M branches total)
$O_{i}$ — normalized outcome score for leaf i (softmax-weighted based on PRM scores)
$C_{j} = \frac{1}{M + N}$ — cost coefficient (uniform across nodes)
$λ_{c}$ — cost penalty weight (negated to penalize retaining nodes, encouraging KV cache sharing)
$λ_{s}$ — diversity penalty weight (scaled by 1/K clusters)
${coverage}_{k}$ — binary indicator: 1 if at least one leaf in cluster k is selected

Outcome scores are computed by sorting nodes by PRM score, applying softmax with temperature T, then computing proportional width allocations. This allocation is converted to a normalized outcome score per node.

Diversity enforcement uses hierarchical clustering of sentence embeddings (cosine distance, average linkage, threshold 0.05) to group similar trajectories. The coverage term rewards selecting at least one representative from each cluster.

Constraints:

Branch activation: $y_{j} \leq \sum_{i \in leaves (j)} x_{i}$ and $y_{j} \geq x_{i} \forall i \in leaves (j)$
Non-empty: $\sum_{i = 1}^{N} x_{i} \geq 1$

Post-ILP allocation: After solving, retained nodes receive softmax-proportional width allocation for the next expansion round.

Related Pages

Implemented By

Implementation:SqueezeAILab_ETS_Tree_Select_Softmax_Costmodel

Uses Heuristic

Heuristic:SqueezeAILab_ETS_Lambda_Similarity_Scaling

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment