Principle:Wandb Weave Scorer Definition

Knowledge Sources	Weave Docs Wandb Weave
Domains	Evaluation, Metrics
Last Updated	2026-02-14 00:00 GMT

Overview

A scoring abstraction that defines how model outputs are evaluated against ground truth or quality criteria.

Description

Scorer Definition provides a standard interface for writing evaluation metrics. A scorer receives the model output and, optionally, ground truth data from the dataset, computes a score, and returns it. Scorers can be class-based (inheriting from Scorer and implementing a score method) or function-based (a plain function decorated with @weave.op).
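The two scorer shapes can be sketched without the library. This is a minimal stand-in, not Weave's own code: Weave is deliberately not imported, so the function carries no @weave.op decorator and the class does not inherit from Scorer. The names exact_match, LengthScorer, and max_chars are hypothetical and chosen only for illustration.

```python
from typing import Any


def exact_match(output: str, target: str) -> dict[str, bool]:
    """Function-style scorer: compares the model output with ground truth."""
    # In Weave this function would additionally be decorated with @weave.op.
    return {"correct": output.strip() == target.strip()}


class LengthScorer:
    """Class-style scorer: configuration lives on the instance.

    In Weave this class would inherit from Scorer (assumption: omitted here
    so that the sketch runs with the standard library only).
    """

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def score(self, output: str) -> dict[str, Any]:
        # No ground truth is needed: this checks a quality criterion only.
        return {"within_limit": len(output) <= self.max_chars, "chars": len(output)}


print(exact_match(output=" Paris ", target="Paris"))
print(LengthScorer(max_chars=10).score(output="Paris"))
```

The function form suits stateless metrics, while the class form is convenient when a metric needs configuration such as a threshold.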

The scoring framework supports column mapping, which bridges differences between dataset column names and scorer parameter names, and automatic summarization, which computes aggregate statistics: mean and standard error for numeric scores, true count and true fraction for boolean scores.
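Column mapping can be illustrated with a small helper. The sketch below assumes the mapping is keyed by scorer argument name with the dataset column name as the value. resolve_scorer_args is a hypothetical helper written for this page, not a Weave function, and its lookup order is a simplification.

```python
from typing import Any, Optional


def resolve_scorer_args(
    row: dict[str, Any],
    arg_names: list[str],
    column_map: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Pick the dataset values that feed each scorer argument."""
    column_map = column_map or {}
    resolved: dict[str, Any] = {}
    for arg in arg_names:
        # Use the mapped column if one is given, else the same-named column.
        column = column_map.get(arg, arg)
        if column in row:
            resolved[arg] = row[column]
    return resolved


row = {"question": "Capital of France?", "expected_answer": "Paris"}
args = resolve_scorer_args(row, ["target"], column_map={"target": "expected_answer"})
print(args)
```

Without the mapping, the scorer argument target would find no matching column in this row and would receive nothing.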

Usage

Use this principle when defining how model predictions should be graded. Scorers are composed with datasets and models in the Evaluation pipeline.
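How a scorer composes with a dataset and a model can be sketched as a plain loop. This is an illustrative stand-in for the Evaluation pipeline, not its implementation: run_evaluation, toy_model, and exact_match are hypothetical names, and selecting arguments with inspect.signature is an assumption about how parameter-name matching could be done.

```python
import inspect
from typing import Any, Callable


def toy_model(question: str) -> str:
    """Hypothetical model: answers from a fixed lookup table."""
    return {"Capital of France?": "Paris"}.get(question, "unknown")


def exact_match(output: str, target: str) -> bool:
    """Scorer: True when the prediction equals the ground truth."""
    return output == target


def run_evaluation(
    dataset: list[dict[str, Any]],
    model: Callable[..., Any],
    scorer: Callable[..., Any],
) -> list[Any]:
    """Run the model on each row, then score its output."""
    model_params = inspect.signature(model).parameters
    scorer_params = inspect.signature(scorer).parameters
    scores = []
    for row in dataset:
        # Pass each callable only the dataset columns it names as parameters.
        output = model(**{k: v for k, v in row.items() if k in model_params})
        extra = {
            k: v for k, v in row.items() if k in scorer_params and k != "output"
        }
        # The model output always arrives as the `output` keyword argument.
        scores.append(scorer(output=output, **extra))
    return scores


dataset = [
    {"question": "Capital of France?", "target": "Paris"},
    {"question": "Capital of Peru?", "target": "Lima"},
]
scores = run_evaluation(dataset, toy_model, exact_match)
print(scores)
```

Keeping the scorer ignorant of the model and the dataset layout is what lets the same metric be reused across evaluations.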

Theoretical Basis

Scoring follows the evaluator pattern:

Input Contract: The scorer receives the model output (as the output keyword argument) plus any dataset columns whose names match its parameter names.
Column Mapping: A column_map dict bridges names that differ: each key is a scorer argument name and its value is the dataset column that supplies that argument.
Scoring Logic: User-defined computation returns a score (numeric, boolean, dict, or structured result).
Summarization: Scores across all examples are aggregated by auto_summarize, which computes:

- Numerics: mean and standard error
- Booleans: true_count and true_fraction
- Nested dicts: recursive summarization
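The summarization rules listed above can be sketched as a recursive function. This is not the auto_summarize implementation: summarize is a hypothetical name, the "mean" and "stderr" keys are assumed, and the standard error is assumed to be the sample standard deviation (n - 1 denominator) divided by the square root of n.

```python
import math
from numbers import Number
from typing import Any, Optional


def summarize(values: list[Any]) -> Optional[dict[str, Any]]:
    """Aggregate per-example scores by type, recursing into dicts."""
    values = [v for v in values if v is not None]
    if not values:
        return None
    first = values[0]
    # Check bool before Number: bool is a subclass of int in Python.
    if isinstance(first, bool):
        true_count = sum(1 for v in values if v)
        return {"true_count": true_count, "true_fraction": true_count / len(values)}
    if isinstance(first, Number):
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            # Assumed definition: sample variance, then sqrt(variance / n).
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            stderr = math.sqrt(variance / n)
        else:
            stderr = 0.0
        return {"mean": mean, "stderr": stderr}
    if isinstance(first, dict):
        summary: dict[str, Any] = {}
        for key in first:
            nested = summarize([v.get(key) for v in values if isinstance(v, dict)])
            if nested is not None:
                summary[key] = nested
        return summary or None
    return None  # other types are left out of the summary


scores = [
    {"correct": True, "similarity": 1.0},
    {"correct": False, "similarity": 0.5},
    {"correct": True, "similarity": 0.0},
]
summary = summarize(scores)
print(summary)
```

Because each dict key is summarized independently, a scorer that returns several fields yields one aggregate per field.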

Related Pages

Implemented By

Implementation:Wandb_Weave_Scorer
