
Principle:Wandb Weave Scorer Definition

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-14 00:00 GMT

Overview

A scoring abstraction that defines how model outputs are evaluated against ground truth or quality criteria.

Description

Scorer Definition provides a standard interface for writing evaluation metrics. A scorer receives model output and optionally ground truth data from the dataset, computes a score, and returns it. Scorers can be class-based (inheriting from Scorer) or function-based (decorated with @weave.op).
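As a minimal sketch of the two forms (plain Python standing in for the Weave decorators; the scorer names and score fields here are illustrative):

```python
# Function-based scorer: in Weave this function would be decorated with
# @weave.op. It receives the model output as the `output` keyword argument,
# plus any dataset columns whose names match its other parameters.
def exact_match(output: str, label: str) -> dict:
    return {"correct": output == label}

# Class-based scorer: in Weave this class would inherit from weave.Scorer
# and implement a `score` method with the same calling convention.
class SubstringScorer:
    def score(self, output: str, target: str) -> dict:
        return {"contains_target": target in output}

print(exact_match(output="4", label="4"))  # {'correct': True}
print(SubstringScorer().score(output="the cat sat", target="cat"))
# {'contains_target': True}
```

Returning a dict rather than a bare value lets a single scorer report several named sub-scores, which the summarization step then aggregates key by key.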

The scoring framework supports column mapping, which bridges differences between dataset column names and scorer parameter names, and automatic summarization, which computes aggregate statistics (mean and standard error for numeric scores; true count and true fraction for boolean scores).
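The column-mapping idea can be illustrated with a small plain-Python sketch (this mimics the remapping, not Weave's internals; the column and argument names are hypothetical):

```python
def apply_column_map(row: dict, column_map: dict) -> dict:
    # column_map maps a scorer argument name to the dataset column that
    # should supply it. The result is the keyword arguments passed to the
    # scorer for this row.
    return {arg: row[col] for arg, col in column_map.items()}

# Dataset stores ground truth under "expected_answer", but the scorer's
# parameter is named "label".
row = {"question": "6 * 7?", "expected_answer": "42"}
kwargs = apply_column_map(row, column_map={"label": "expected_answer"})
print(kwargs)  # {'label': '42'}
```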

Usage

Use this principle when defining how model predictions should be graded. Scorers are composed with datasets and models in the Evaluation pipeline.
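How a scorer plugs into the pipeline can be sketched as a plain evaluation loop (in Weave this wiring is handled by the Evaluation object; the toy model and dataset below are stand-ins):

```python
def model(question: str) -> str:
    # Toy model standing in for a real inference call.
    return "Paris" if "France" in question else "unknown"

def exact_match(output: str, label: str) -> dict:
    return {"correct": output == label}

dataset = [
    {"question": "Capital of France?", "label": "Paris"},
    {"question": "Capital of Atlantis?", "label": "Poseidonis"},
]

# For each example: run the model, then call the scorer with the model
# output plus the dataset column matching its parameter name.
scores = [exact_match(output=model(r["question"]), label=r["label"])
          for r in dataset]
print(scores)  # [{'correct': True}, {'correct': False}]
```

The per-example scores produced by this loop are what the summarization step aggregates into a single report.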

Theoretical Basis

Scoring follows the evaluator pattern:

  1. Input Contract: The scorer receives the model output (as output keyword argument) plus any dataset columns matching its parameter names.
  2. Column Mapping: A column_map dict remaps dataset column names to scorer argument names when they differ.
  3. Scoring Logic: User-defined computation returns a score (numeric, boolean, dict, or structured result).
  4. Summarization: Scores across all examples are aggregated by auto_summarize, which computes:
    • Numerics: mean and standard error
    • Booleans: true_count and true_fraction
    • Nested dicts: recursive summarization
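The aggregation rules above can be sketched in plain Python (a simplified reimplementation for illustration, not Weave's auto_summarize itself):

```python
import statistics

def summarize(values: list) -> dict:
    # Booleans first (bool is a subclass of int in Python): count and
    # fraction of True values.
    if all(isinstance(v, bool) for v in values):
        n_true = sum(values)
        return {"true_count": n_true, "true_fraction": n_true / len(values)}
    # Numerics: mean and standard error of the mean.
    if all(isinstance(v, (int, float)) for v in values):
        mean = statistics.fmean(values)
        stderr = (statistics.stdev(values) / len(values) ** 0.5
                  if len(values) > 1 else 0.0)
        return {"mean": mean, "stderr": stderr}
    # Nested dicts: recurse key by key.
    if all(isinstance(v, dict) for v in values):
        return {k: summarize([v[k] for v in values]) for k in values[0]}
    raise TypeError("unsupported score type")

print(summarize([{"correct": True}, {"correct": False}]))
# {'correct': {'true_count': 1, 'true_fraction': 0.5}}
```

Note the ordering of the checks: the boolean branch must come before the numeric one, since Python booleans also pass an `int` check.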

Related Pages

Implemented By
