# Heuristic: SqueezeAILab ETS Softmax Temperature Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLMs |
| Last Updated | 2026-02-14 02:30 GMT |
## Overview
Use two separate temperature parameters — a sampling temperature of 1.0 for diverse generation and a softmax temperature of 0.2 for sharpened reward-based node selection — to balance exploration and exploitation in tree search.
## Description
The ETS system uses two distinct temperature hyperparameters that serve different purposes. The sampling temperature (`temperature: 1.0`) controls the randomness of the policy model's text generation during tree expansion — a value of 1.0 ensures diverse candidate trajectories. The softmax temperature (`softmax_temperature: 0.2`) controls how sharply the reward scores are converted into expansion width allocations during node selection — a low value like 0.2 makes the distribution peaked, strongly favoring high-reward nodes while still giving some budget to lower-ranked ones.
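The ceil-and-subtract allocation scheme this describes can be sketched in plain Python. `allocate_widths` below is a hypothetical helper that mirrors the behavior (softmax-weighted shares, rounded up, with the budget and weight mass decremented as nodes are served); it is an illustrative sketch, not the ETS implementation.

```python
import math

def allocate_widths(scores, width, T):
    """Split an expansion-width budget across candidate nodes.

    Each node gets a share proportional to exp(score / T); a low T
    sharpens the distribution toward high-reward nodes. Shares are
    rounded up, and both the remaining budget and the remaining
    weight mass shrink as each node is served.
    """
    exp_weights = [math.exp(s / T) for s in scores]
    total = sum(exp_weights)
    widths = []
    remaining = width
    for w in exp_weights:
        if total > 0:
            n = math.ceil(remaining * w / total)
            widths.append(n)
            remaining -= n
            total -= w
        else:
            widths.append(0)
    return widths

# Sharp allocation (T=0.2) vs. near-uniform (T=1.0) on the same scores:
scores = [0.9, 0.8, 0.5, 0.2]
sharp = allocate_widths(scores, width=16, T=0.2)  # -> [10, 5, 1, 0]
flat = allocate_widths(scores, width=16, T=1.0)   # budget spread far more evenly
```

With `T=0.2` the top node takes most of the budget and the weakest node may get nothing; with `T=1.0` every node keeps a meaningful share.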
## Usage
Apply this heuristic when configuring ETS hyperparameters for the `softmax_costmodel` or `softmax` selection methods. The default values (`temperature=1.0`, `softmax_temperature=0.2`) are tuned for math reasoning tasks (MATH500). Adjusting the softmax temperature changes the exploration-exploitation trade-off: lower values concentrate resources on the best nodes, higher values spread budget more evenly.
## The Insight (Rule of Thumb)
- Action: Set `temperature: 1.0` for sampling and `softmax_temperature: 0.2` for node selection in the YAML config.
- Value: Sampling temperature = 1.0 (logits unmodified), softmax temperature = 0.2 (reward scores effectively scaled by 1/T = 5 before the exponential).
- Trade-off: Lower softmax temperature concentrates more expansion budget on top-scoring nodes (higher exploitation), potentially missing correct trajectories through lower-ranked nodes. Higher softmax temperature spreads budget more evenly (higher exploration) but wastes compute on unpromising branches.
## Reasoning
The softmax temperature appears in the width allocation formula: `exp(score / T)` is computed for each candidate node, then the remaining width budget is allocated proportionally. With `T=0.2`, a score difference of 0.1 between two nodes yields an `exp(0.5)` ≈ 1.65x ratio in allocated width, creating meaningful differentiation. With `T=1.0`, the same difference yields only `exp(0.1)` ≈ 1.11x, making the allocation nearly uniform.
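The ratio arithmetic can be checked directly; `width_ratio` is an illustrative helper, not part of the ETS codebase.

```python
import math

def width_ratio(score_diff, T):
    """Ratio of allocated width between two nodes whose reward scores
    differ by score_diff, since allocation is proportional to exp(score / T)."""
    return math.exp(score_diff / T)

print(round(width_ratio(0.1, 0.2), 2))  # exp(0.5) -> 1.65
print(round(width_ratio(0.1, 1.0), 2))  # exp(0.1) -> 1.11
```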
The sampling temperature is kept at 1.0 (unmodified logits) to maximize diversity of generated reasoning steps. Lowering it would produce more similar candidates at each expansion, defeating the purpose of tree search.
All three provided YAML configs use the same values (`temperature: 1.0`, `softmax_temperature: 0.2`), suggesting these are well-tested defaults.
## Code Evidence
Softmax temperature in width allocation from `rebase.py:432-444`:
```python
T = self.paras["softmax_temperature"]
exp_weights = torch.exp(weights / T)
sum_exp_weights = exp_weights.sum()
outcome_score = []
width_tmp = width
for weight in exp_weights:
    if sum_exp_weights > 0:
        num = int(math.ceil(width_tmp * weight / sum_exp_weights))
        outcome_score.append(num)
        width_tmp -= num
        sum_exp_weights -= weight
    else:
        outcome_score.append(0)
```
Sampling temperature in tree expansion from `rebase.py:167`:
```python
fork += gen("step", self.paras["max_step_tokens"], stop="ки", temperature=self.paras["temperature"])
```
YAML configuration from `hype-parameters/ets_16_math500.yaml:1,6`:
```yaml
temperature: 1.0          # sample temperature
softmax_temperature: 0.2  # temperature in the softmax
```