

Implementation:SqueezeAILab ETS Score Aggregation Functions

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Statistical_Aggregation
Last Updated: 2026-02-14 02:00 GMT

Overview

Concrete functions for aggregating per-step PRM (process reward model) scores into a single trajectory-level score, provided by math_evaluate.py.

Description

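Four pure Python functions (agg_min, agg_mean, agg_prod, agg_last) that each take a list of step scores and return a single float. All four return 0 for an empty list. None values are handled per function: agg_min skips None entries, and agg_last returns 0 when the last entry is None.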

Usage

Pass one of these functions as the aggfunc parameter of evaluate() for best-of-n selection, or as the weight_func parameter of majority_vote() for weighted voting.
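
To make the two roles concrete, here is a minimal sketch of best-of-n selection and weighted voting driven by an aggregation function. The helpers pick_best_of_n and weighted_vote, the "answer" field, and the candidate data are hypothetical; they are not the repository's evaluate() or majority_vote(), whose internals are not shown on this page.

```python
from collections import defaultdict


def last_score(step_scores):
    """Stand-in aggregator: last step score, or 0 if the list is empty or ends in None."""
    if not step_scores or step_scores[-1] is None:
        return 0
    return step_scores[-1]


def pick_best_of_n(candidates, aggfunc):
    """Hypothetical helper: return the answer of the highest-scoring candidate."""
    best = max(candidates, key=lambda c: aggfunc(c["step_scores"]))
    return best["answer"]


def weighted_vote(candidates, weight_func):
    """Hypothetical helper: sum aggregated scores per distinct answer, return the heaviest."""
    totals = defaultdict(float)
    for c in candidates:
        totals[c["answer"]] += weight_func(c["step_scores"])
    return max(totals, key=totals.get)


# Made-up candidates for a single problem.
candidates = [
    {"answer": "42", "step_scores": [0.9, 0.8, 0.6]},
    {"answer": "42", "step_scores": [0.7, 0.7, 0.5]},
    {"answer": "41", "step_scores": [0.8, 0.9, 0.9]},
]

print(pick_best_of_n(candidates, last_score))  # 41
print(weighted_vote(candidates, last_score))   # 42
```

The two strategies can disagree, as they do here: best-of-n trusts the single top-scored trajectory, while weighted voting lets several moderately scored trajectories that agree on an answer outweigh it.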

Code Reference

Source Location

  • Repository: ETS
  • File: math_evaluate.py
  • Lines: 61-89

Signature

def agg_min(step_scores):
    """Return minimum score, handling empty lists and None values.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Minimum score, or 0 if empty
    """

def agg_mean(step_scores):
    """Return arithmetic mean of step scores.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Mean score, or 0 if empty
    """

def agg_prod(step_scores):
    """Return product of all step scores.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Product of scores, or 0 if empty
    """

def agg_last(step_scores):
    """Return the last step score.
    Args:
        step_scores (list[float]): Per-step PRM scores
    Returns:
        float: Last score, or 0 if empty or last is None
    """
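
The function bodies are not reproduced on this page. A minimal sketch consistent with the documented contract could look like the following. It is an illustrative re-implementation under _sketch names, not the code in math_evaluate.py. Skipping None entries in the mean and product is an assumption, since this page only documents None handling for agg_min and agg_last.

```python
import math


def _valid(step_scores):
    """Drop None entries; a missing list is treated as empty (assumption)."""
    return [s for s in (step_scores or []) if s is not None]


def agg_min_sketch(step_scores):
    # None entries are skipped, matching the documented agg_min behaviour.
    scores = _valid(step_scores)
    return min(scores) if scores else 0


def agg_mean_sketch(step_scores):
    # Assumption: None entries are skipped before averaging.
    scores = _valid(step_scores)
    return sum(scores) / len(scores) if scores else 0


def agg_prod_sketch(step_scores):
    # Assumption: None entries are skipped. An empty input yields 0,
    # not the mathematical empty product of 1.
    scores = _valid(step_scores)
    return math.prod(scores) if scores else 0


def agg_last_sketch(step_scores):
    # A None last entry maps to 0 instead of falling back to an earlier step.
    if not step_scores or step_scores[-1] is None:
        return 0
    return step_scores[-1]
```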

Import

# Defined in math_evaluate.py (not a package import)
# When used internally:
from math_evaluate import agg_min, agg_mean, agg_prod, agg_last

I/O Contract

Inputs

Name | Type | Required | Description
step_scores | list[float] | Yes | Per-step PRM scores from the candidate's "step_scores" field

Outputs

Name | Type | Description
score | float | Aggregated trajectory-level score (0 for empty inputs)

Usage Examples

Direct Usage

scores = [0.95, 0.87, 0.92, 0.88]

print(agg_min(scores))   # 0.87 (weakest step)
print(agg_mean(scores))  # 0.905 (average quality)
print(agg_prod(scores))  # ~0.669 (product of all steps, read as a joint probability)
print(agg_last(scores))  # 0.88 (final step score)
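
The four aggregators can rank the same pair of trajectories differently. The sketch below illustrates this with standard-library equivalents (min, statistics.fmean, math.prod, and indexing) instead of the math_evaluate.py functions, and the two score lists are made up for illustration.

```python
import math
from statistics import fmean

# Hypothetical trajectories: A has one weak step but a strong finish,
# B is uniformly mediocre.
traj_a = [0.9, 0.9, 0.3, 0.95]
traj_b = [0.7, 0.7, 0.7, 0.7]

# Standard-library stand-ins for the four aggregation strategies.
aggregators = {
    "min": min,
    "mean": fmean,
    "prod": math.prod,
    "last": lambda scores: scores[-1],
}

preferred = {}
for name, agg in aggregators.items():
    score_a, score_b = agg(traj_a), agg(traj_b)
    preferred[name] = "A" if score_a > score_b else "B"
    print(f"{name}: A={score_a:.4f}  B={score_b:.4f}  prefers {preferred[name]}")
```

On this data, min and prod prefer B because they penalize A's single weak step, while mean and last prefer A. Which behaviour is appropriate depends on how much a single low-scoring step should count against a trajectory.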

Edge Cases

# Empty list
print(agg_min([]))    # 0
print(agg_mean([]))   # 0
print(agg_prod([]))   # 0
print(agg_last([]))   # 0

# None handling in agg_min
print(agg_min([0.9, None, 0.8]))  # 0.8 (skips None)

# None handling in agg_last
print(agg_last([0.9, 0.8, None]))  # 0 (last is None)

Used in Evaluation

# Best-of-n evaluation with agg_last
accuracy = evaluate(
    path="exp_results/ets_16_math500/answers.json",
    aggfunc=agg_last,
    extract_function=extract_shepherd_answer,
)

# Weighted majority voting
accuracy = majority_vote(
    path="exp_results/ets_16_math500/answers.json",
    weighted=True,
    weight_func=agg_last,
    extract_function=extract_shepherd_answer,
)
