Implementation:SqueezeAILab ETS Score Aggregation Functions
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Statistical_Aggregation |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
Concrete tool for aggregating per-step PRM scores into trajectory-level scores, provided by math_evaluate.py.
Description
Four pure Python functions that each take a list of step scores and return a single float. All four return 0 for an empty list. agg_min skips None entries, and agg_last returns 0 when the final score is None.
Usage
Used as the aggfunc parameter in evaluate() for best-of-n selection, or as weight_func in majority_vote() for weighted voting.
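To make the two roles concrete, here is a minimal sketch of best-of-n selection and weighted voting driven by an aggregation function. It is not the code of evaluate() or majority_vote(): the helper names best_of_n and weighted_vote, the "answer" field, and the sample candidates are hypothetical, and agg_last is a stand-in written from the behaviour documented on this page.

```python
from collections import defaultdict

def agg_last(step_scores):
    # Stand-in for the documented behaviour: last score,
    # or 0 if the list is empty or the last entry is None.
    if not step_scores or step_scores[-1] is None:
        return 0
    return step_scores[-1]

def best_of_n(candidates, aggfunc):
    """Hypothetical helper: answer of the candidate with the highest aggregated score."""
    best = max(candidates, key=lambda c: aggfunc(c["step_scores"]))
    return best["answer"]

def weighted_vote(candidates, weight_func):
    """Hypothetical helper: answer whose candidates have the largest summed weight."""
    totals = defaultdict(float)
    for c in candidates:
        totals[c["answer"]] += weight_func(c["step_scores"])
    return max(totals, key=totals.get)

# Illustrative candidates; the "answer" field name is an assumption.
candidates = [
    {"answer": "42", "step_scores": [0.9, 0.6]},
    {"answer": "41", "step_scores": [0.5, 0.8]},
    {"answer": "42", "step_scores": [0.7, 0.5]},
]

print(best_of_n(candidates, agg_last))      # 41 (single highest last-step score, 0.8)
print(weighted_vote(candidates, agg_last))  # 42 (0.6 + 0.5 outweighs 0.8)
```

The same candidates can yield different answers under the two schemes, which is why the aggregation function is passed in as a parameter rather than fixed.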
Code Reference
Source Location
- Repository: ETS
- File: math_evaluate.py
- Lines: 61-89
Signature
def agg_min(step_scores):
    """Return minimum score, handling empty lists and None values.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Minimum score, or 0 if empty
    """

def agg_mean(step_scores):
    """Return arithmetic mean of step scores.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Mean score, or 0 if empty
    """

def agg_prod(step_scores):
    """Return product of all step scores.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Product of scores, or 0 if empty
    """

def agg_last(step_scores):
    """Return the last step score.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Last score, or 0 if empty or last is None
    """
Import
# Defined in math_evaluate.py (not a package import)
# When used internally:
from math_evaluate import agg_min, agg_mean, agg_prod, agg_last
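For readers without the repository at hand, the four signatures could be implemented along the following lines. This is a sketch reconstructed from the docstrings and edge cases on this page, not the contents of math_evaluate.py. In particular, how agg_mean and agg_prod treat None entries is not documented here, so this sketch assumes their inputs contain no None values.

```python
import math

def agg_min(step_scores):
    """Minimum of the non-None scores, or 0 if none remain."""
    valid = [s for s in step_scores if s is not None]
    return min(valid) if valid else 0

def agg_mean(step_scores):
    """Arithmetic mean, or 0 for an empty list (assumes no None entries)."""
    if not step_scores:
        return 0
    return sum(step_scores) / len(step_scores)

def agg_prod(step_scores):
    """Product of all scores, or 0 for an empty list (assumes no None entries)."""
    if not step_scores:
        return 0
    return math.prod(step_scores)  # math.prod requires Python 3.8+

def agg_last(step_scores):
    """Last score, or 0 if the list is empty or the last entry is None."""
    if not step_scores or step_scores[-1] is None:
        return 0
    return step_scores[-1]
```

Note that the empty-list guard matters most for agg_prod: math.prod([]) returns 1, which would otherwise give an empty trajectory the highest possible score.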
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| step_scores | list[float] | Yes | Per-step PRM scores from candidate's "step_scores" field |
Outputs
| Name | Type | Description |
|---|---|---|
| score | float | Aggregated trajectory-level score (0 for empty inputs) |
Usage Examples
Direct Usage
from math_evaluate import agg_min, agg_mean, agg_prod, agg_last

scores = [0.95, 0.87, 0.92, 0.88]
print(agg_min(scores))   # 0.87 (weakest step)
print(agg_mean(scores))  # ≈ 0.905 (average quality)
print(agg_prod(scores))  # ≈ 0.669 (joint probability)
print(agg_last(scores))  # 0.88 (final step score)
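The choice of aggregator can change which trajectory is preferred. The sketch below compares two made-up trajectories under simplified stand-ins for the four functions (built from Python built-ins, without the empty-list and None handling described on this page); the scores and variable names are illustrative only.

```python
import math

# Simplified stand-ins for the four aggregators (no edge-case handling).
aggregators = {
    "min": min,
    "mean": lambda s: sum(s) / len(s),
    "prod": math.prod,
    "last": lambda s: s[-1],
}

traj_a = [0.9, 0.9, 0.3]  # strong early steps, weak final step
traj_b = [0.6, 0.6, 0.7]  # uniformly moderate steps

# Record which trajectory each aggregator scores higher.
preferred = {}
for name, agg in aggregators.items():
    preferred[name] = "A" if agg(traj_a) > agg(traj_b) else "B"

print(preferred)  # {'min': 'B', 'mean': 'A', 'prod': 'B', 'last': 'B'}
```

Here only the mean favours trajectory A, because averaging lets two strong steps offset one weak step, while the other three aggregators penalise the weak final step.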
Edge Cases
# Empty list
print(agg_min([])) # 0
print(agg_mean([])) # 0
print(agg_prod([])) # 0
print(agg_last([])) # 0
# None handling in agg_min
print(agg_min([0.9, None, 0.8])) # 0.8 (skips None)
# None handling in agg_last
print(agg_last([0.9, 0.8, None])) # 0 (last is None)
Used in Evaluation
# Best-of-n evaluation with agg_last
accuracy = evaluate(
    path="exp_results/ets_16_math500/answers.json",
    aggfunc=agg_last,
    extract_function=extract_shepherd_answer,
)
# Weighted majority voting
accuracy = majority_vote(
    path="exp_results/ets_16_math500/answers.json",
    weighted=True,
    weight_func=agg_last,
    extract_function=extract_shepherd_answer,
)
Related Pages
Implements Principle
Requires Environment