Overview
A concrete tool from the W&B Weave library for computing aggregate evaluation statistics and organizing leaderboard comparisons.
Description
auto_summarize() takes a list of score dictionaries (one per evaluated example) and computes aggregate statistics. For numeric values, it computes mean and standard error. For boolean values, it computes true_count and true_fraction. Nested dictionaries are processed recursively.
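To make the described behavior concrete, here is a minimal self-contained sketch of the same summarization logic (mean/stderr for numerics, true_count/true_fraction for booleans, recursion for nested dicts). This is an illustrative re-implementation, not Weave's actual code; the function name `summarize_sketch` is our own.

```python
import statistics

def summarize_sketch(rows: list[dict]) -> dict:
    """Illustrative re-implementation of auto_summarize-style behavior.

    Not Weave's actual implementation: computes mean/stderr for numeric
    values, true_count/true_fraction for booleans, and recurses into
    nested dicts.
    """
    out = {}
    for key in rows[0]:
        vals = [r[key] for r in rows if key in r]
        # Check bool before int/float: bool is a subclass of int in Python.
        if all(isinstance(v, bool) for v in vals):
            true_count = sum(vals)
            out[key] = {"true_count": true_count,
                        "true_fraction": true_count / len(vals)}
        elif all(isinstance(v, (int, float)) for v in vals):
            mean = statistics.fmean(vals)
            # Standard error = sample standard deviation / sqrt(n).
            stderr = (statistics.stdev(vals) / len(vals) ** 0.5
                      if len(vals) > 1 else 0.0)
            out[key] = {"mean": mean, "stderr": stderr}
        elif all(isinstance(v, dict) for v in vals):
            out[key] = summarize_sketch(vals)  # recurse into nested dicts
        # Other value types are ignored, matching the documented behavior.
    return out

scores = [
    {"correct": True, "metrics": {"latency": 0.5}},
    {"correct": False, "metrics": {"latency": 0.3}},
    {"correct": True, "metrics": {"latency": 0.4}},
]
print(summarize_sketch(scores))
```

Note how the nested `metrics` dict produces a nested summary, mirroring the input structure.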
Leaderboard and get_leaderboard_results() organize results from multiple evaluation runs into a structured comparison across models and scoring columns.
Usage
auto_summarize is called automatically by Scorer.summarize() during evaluation. Use Leaderboard when comparing evaluation results across multiple models.
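The shape of a leaderboard comparison can be sketched without a Weave backend: one row per model, one cell per (scorer, metric) column. The real `Leaderboard`/`get_leaderboard_results` pull these values from logged Weave evaluations; in this sketch the per-model summaries and the `build_rows` helper are hypothetical stand-ins for illustration only.

```python
# Hard-coded summaries standing in for results fetched from logged
# Weave evaluations (hypothetical data, for illustration only).
per_model_summaries = {
    "model-a": {"accuracy": {"true_fraction": 0.91}, "latency": {"mean": 0.42}},
    "model-b": {"accuracy": {"true_fraction": 0.87}, "latency": {"mean": 0.31}},
}

# Each column selects one metric from one scorer's summary.
columns = [("accuracy", "true_fraction"), ("latency", "mean")]

def build_rows(summaries: dict, columns: list[tuple[str, str]]) -> list[dict]:
    """Flatten per-model summaries into leaderboard-style rows."""
    rows = []
    for model, summary in summaries.items():
        row = {"model": model}
        for scorer, metric in columns:
            row[f"{scorer}.{metric}"] = summary[scorer][metric]
        rows.append(row)
    return rows

for row in build_rows(per_model_summaries, columns):
    print(row)
```

Each printed row corresponds to one model's scores across the configured columns, which is the comparison structure a leaderboard presents.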
Code Reference
Source Location
- Repository: wandb/weave
- File: weave/flow/scorer.py (auto_summarize)
- Lines: L134-186
- File: weave/flow/leaderboard.py (Leaderboard)
- Lines: L13-95
Signature
def auto_summarize(data: list) -> dict[str, Any] | None:
    """Automatically summarize a list of (potentially nested) dicts.

    Computes:
    - avg for numeric cols
    - count and fraction for boolean cols
    - other col types are ignored

    Returns:
        dict of summary stats, with structure matching input dict structure.
    """

def get_leaderboard_results(
    spec: Leaderboard, client: WeaveClient
) -> list[LeaderboardModelResult]:
    """Get leaderboard results for a Leaderboard spec and WeaveClient."""
Import
from weave.flow.scorer import auto_summarize
from weave.flow.leaderboard import Leaderboard, get_leaderboard_results
I/O Contract
Inputs (auto_summarize)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| data | list | Yes | List of score dicts from all evaluated examples |
Outputs (auto_summarize)
| Name | Type | Description |
|------|------|-------------|
| return | dict[str, Any] \| None | Summary stats: mean/stderr for numerics, true_count/true_fraction for booleans |
Inputs (get_leaderboard_results)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| spec | Leaderboard | Yes | Leaderboard configuration with columns |
| client | WeaveClient | Yes | Authenticated Weave client |
Outputs (get_leaderboard_results)
| Name | Type | Description |
|------|------|-------------|
| return | list[LeaderboardModelResult] | Per-model results with column scores |
Usage Examples
Auto Summarize
from weave.flow.scorer import auto_summarize
scores = [
{"accuracy": True, "latency": 0.5},
{"accuracy": False, "latency": 0.3},
{"accuracy": True, "latency": 0.4},
]
summary = auto_summarize(scores)
# {
# "accuracy": {"true_count": 2, "true_fraction": 0.667},
# "latency": {"mean": 0.4, "stderr": 0.058}
# }