Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:SqueezeAILab ETS ILP Node Selection

From Leeroopedia
Knowledge Sources
Domains Tree_Search, Optimization, Integer_Linear_Programming
Last Updated 2026-02-14 02:00 GMT

Overview

An Integer Linear Programming (ILP) formulation that selects which tree nodes to retain and expand, jointly optimizing for outcome quality, KV cache cost, and trajectory diversity.

Description

The ILP Node Selection method (called "softmax_costmodel" in the codebase) is the core algorithmic contribution of the ETS paper. It formulates the node selection problem at each tree depth as a binary optimization problem:

  • Decision variables: Binary variables for each leaf node (retain or prune) and each branch node (active or inactive)
  • Objective: Maximize a weighted combination of outcome scores, minimize cost (number of active nodes), and maximize cluster coverage for diversity
  • Constraints: Branch nodes are active if and only if they have at least one active leaf descendant; at least one node must be retained

After the ILP is solved, the retained nodes are allocated expansion widths via softmax-weighted proportional allocation based on their PRM scores.

Usage

This selection strategy is activated when select_method: "softmax_costmodel" is set in the YAML configuration. It is the recommended and default strategy for ETS experiments, providing the best trade-off between accuracy and compute efficiency.

Theoretical Basis

The ILP formulation defines:

maxx,y[i=1NOixi+λcj=1MCjyj+λci=1NCi+Mxi+λsk=1Kcoveragek]

Where:

  • xi{0,1} — binary decision for leaf node i (N leaves total)
  • yj{0,1} — binary decision for branch node j (M branches total)
  • Oi — normalized outcome score for leaf i (softmax-weighted based on PRM scores)
  • Cj=1M+N — cost coefficient (uniform across nodes)
  • λc — cost penalty weight (negated to penalize retaining nodes, encouraging KV cache sharing)
  • λs — diversity penalty weight (scaled by 1/K clusters)
  • coveragek — binary indicator: 1 if at least one leaf in cluster k is selected

Outcome scores are computed by sorting nodes by PRM score, applying softmax with temperature T, then computing proportional width allocations. This allocation is converted to a normalized outcome score per node.

Diversity enforcement uses hierarchical clustering of sentence embeddings (cosine distance, average linkage, threshold 0.05) to group similar trajectories. The coverage term rewards selecting at least one representative from each cluster.

Constraints:

  • Branch activation: yjileaves(j)xi and yjxiileaves(j)
  • Non-empty: i=1Nxi1

Post-ILP allocation: After solving, retained nodes receive softmax-proportional width allocation for the next expansion round.

Related Pages

Implemented By

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment