Principle:Wandb Weave Scorer Definition
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A scoring abstraction that defines how model outputs are evaluated against ground truth or quality criteria.
Description
Scorer Definition provides a standard interface for writing evaluation metrics. A scorer receives model output and optionally ground truth data from the dataset, computes a score, and returns it. Scorers can be class-based (inheriting from Scorer) or function-based (decorated with @weave.op).
The scoring framework supports two conveniences. Column mapping bridges differences between dataset column names and scorer parameter names. Automatic summarization computes aggregate statistics across examples: mean and standard error for numeric scores, true count and true fraction for boolean scores.
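The two scorer shapes described above can be sketched in plain Python. This is an illustrative stand-in rather than Weave's own code: in Weave the function would be decorated with @weave.op and the class would inherit from Scorer. The names exact_match, LengthRatioScorer, max_ratio, and the target column are hypothetical, and the method name score is an assumption.

```python
# Plain-Python stand-ins for the two scorer shapes (not Weave's implementation).
# In Weave, the function would carry @weave.op and the class would inherit
# from Scorer; both are omitted here so the sketch runs without dependencies.


def exact_match(output: str, target: str) -> dict:
    """Function-based scorer: compare the model output to a 'target' column."""
    return {"correct": output.strip() == target.strip()}


class LengthRatioScorer:
    """Class-based scorer: carries configuration alongside the scoring logic."""

    def __init__(self, max_ratio: float = 2.0):
        # Hypothetical setting: largest allowed output/target length ratio.
        self.max_ratio = max_ratio

    def score(self, output: str, target: str) -> dict:
        # Guard against an empty target to avoid division by zero.
        ratio = len(output) / max(len(target), 1)
        return {"length_ratio": ratio, "within_limit": ratio <= self.max_ratio}
```

A function is usually enough for a stateless metric; a class is the more natural fit when the metric needs configuration, as max_ratio does in this sketch.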
Usage
Use this principle when defining how model predictions should be graded. Scorers are composed with datasets and models in the Evaluation pipeline.
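As a rough picture of that composition, the loop below runs a model over each dataset row and hands the output to every scorer. It is a simplified sketch, not the Evaluation class itself: run_evaluation, toy_model, and the question and target columns are all hypothetical names.

```python
from typing import Callable

# Simplified stand-in for the dataset -> model -> scorer flow.
# None of these names come from Weave; they exist only for illustration.


def run_evaluation(
    dataset: list[dict],
    model: Callable[[str], str],
    scorers: list[Callable[..., dict]],
) -> list[dict]:
    rows = []
    for example in dataset:
        # 1. The model produces an output for the example's input column.
        output = model(example["question"])
        # 2. Each scorer grades that output against the ground-truth column.
        scores = {
            scorer.__name__: scorer(output=output, target=example["target"])
            for scorer in scorers
        }
        rows.append({"output": output, "scores": scores})
    return rows


def toy_model(question: str) -> str:
    """Hypothetical model that only knows one answer."""
    return "4" if question == "2 + 2" else "unknown"


def exact_match(output: str, target: str) -> dict:
    return {"correct": output == target}


dataset = [
    {"question": "2 + 2", "target": "4"},
    {"question": "3 + 3", "target": "6"},
]
results = run_evaluation(dataset, toy_model, [exact_match])
```

Each entry in results pairs one model output with the per-scorer scores for that example, which is the shape a later summarization step would aggregate.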
Theoretical Basis
Scoring follows the evaluator pattern:
- Input Contract: The scorer receives the model output (as the output keyword argument) plus any dataset columns whose names match its parameter names.
- Column Mapping: A column_map dict remaps dataset column names to scorer argument names when they differ.
- Scoring Logic: User-defined computation returns a score (numeric, boolean, dict, or structured result).
- Summarization: Scores across all examples are aggregated using auto_summarize, which computes:
  - Numerics: mean and standard error
  - Booleans: true_count and true_fraction
  - Nested dicts: recursive summarization
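The input contract, column mapping, and summarization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Weave's implementation: build_score_args and summarize are hypothetical helpers, column_map is assumed to be keyed by scorer argument name with the dataset column as the value, and the standard error is assumed to be the sample standard deviation divided by the square root of the sample size.

```python
import inspect
import math
import statistics


def build_score_args(score_fn, example: dict, output, column_map=None) -> dict:
    """Collect keyword arguments for a scorer from one dataset row."""
    column_map = column_map or {}
    args = {}
    for name in inspect.signature(score_fn).parameters:
        if name == "output":
            args[name] = output  # model output is always passed as 'output'
        elif name in example:
            args[name] = example[name]  # column name matches parameter name
        elif name in column_map and column_map[name] in example:
            # Assumed direction: scorer argument name -> dataset column name.
            args[name] = example[column_map[name]]
    return args


def summarize(values: list):
    """Aggregate per-example scores by type; recurse into dicts."""
    values = [v for v in values if v is not None]
    if not values:
        return None
    first = values[0]
    # Check bool before numbers, because bool is a subclass of int.
    if isinstance(first, bool):
        true_count = sum(1 for v in values if v)
        return {"true_count": true_count, "true_fraction": true_count / len(values)}
    if isinstance(first, (int, float)):
        mean = statistics.fmean(values)
        # Assumed definition: sample standard deviation / sqrt(n).
        stderr = (
            statistics.stdev(values) / math.sqrt(len(values))
            if len(values) > 1
            else 0.0
        )
        return {"mean": mean, "stderr": stderr}
    if isinstance(first, dict):
        # Summarize each key across all examples.
        return {key: summarize([v.get(key) for v in values]) for key in first}
    return None  # other types are not summarized in this sketch


def score(output: str, target: str) -> dict:
    return {"correct": output == target, "length": len(output)}


# The dataset calls its ground-truth column 'answer'; the scorer expects 'target'.
example = {"question": "2 + 2", "answer": "4"}
kwargs = build_score_args(score, example, output="4", column_map={"target": "answer"})
summary = summarize([score(**kwargs), score(output="seven", target="6")])
```

Here kwargs resolves target from the answer column, and summary holds one boolean aggregate for correct and one numeric aggregate for length, produced by the recursive dict branch.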