Implementation:Wandb Weave Scorer

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-14 00:00 GMT

Overview

Scorer is the concrete tool that the Wandb Weave library provides for defining evaluation scoring functions.

Description

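The Scorer base class provides the standard interface for evaluation metrics. Subclasses implement a score() method (decorated with @weave.op) that receives the model output as the keyword-only output argument, plus dataset fields as additional keyword arguments. The optional column_map field covers datasets whose column names differ from the scorer's parameter names: each key is a scorer parameter name and each value is the dataset column that supplies it.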

The default summarize() method delegates to auto_summarize(), which computes mean/stderr for numerics and true_count/true_fraction for booleans.
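
To make the default summary concrete, the sketch below is a pure-Python stand-in that mirrors the behavior described above. It is not Weave's implementation: summarize_rows is a hypothetical name, the example rows are made up, and the standard-error formula (sample standard deviation divided by the square root of the number of values) is an assumption.

```python
from __future__ import annotations

import math
import statistics
from typing import Any


def summarize_rows(values: list[Any]) -> dict[str, Any] | None:
    """Hypothetical stand-in for the summary described above (not Weave code)."""
    values = [v for v in values if v is not None]
    if not values:
        return None
    first = values[0]
    # bool is a subclass of int, so booleans must be checked before numbers.
    if isinstance(first, bool):
        true_count = sum(1 for v in values if v)
        return {"true_count": true_count, "true_fraction": true_count / len(values)}
    if isinstance(first, (int, float)):
        mean = statistics.fmean(values)
        # Assumed formula: sample standard deviation / sqrt(n); 0.0 for one value.
        if len(values) > 1:
            stderr = statistics.stdev(values) / math.sqrt(len(values))
        else:
            stderr = 0.0
        return {"mean": mean, "stderr": stderr}
    if isinstance(first, dict):
        # Summarize each key separately so nested score dicts keep their shape.
        summary = {}
        for key in first:
            column = summarize_rows([v.get(key) for v in values])
            if column is not None:
                summary[key] = column
        return summary or None
    return None  # Other value types (e.g. strings) are ignored.


# Made-up score rows, shaped like the dicts a score() method might return.
example_rows = [{"match": True, "score": 1.0}, {"match": False, "score": 3.0}]
summary = summarize_rows(example_rows)
```

Here summary holds one entry per score key: counts for the boolean match column and mean/stderr for the numeric score column.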

Usage

Subclass Scorer and implement score() to create custom evaluation metrics. Alternatively, decorate a plain function with @weave.op for simple scoring logic.

Code Reference

Source Location

  • Repository: wandb/weave
  • File: weave/flow/scorer.py
  • Lines: L30-186 (Scorer class + auto_summarize)

Signature

class Scorer(Object):
    column_map: dict[str, str] | None = Field(
        default=None,
        description="A mapping from dataset column names to scorer parameter names",
    )

    @op
    def score(self, *, output: Any, **kwargs: Any) -> Any:
        """Score model output. Must be overridden by subclasses."""
        raise NotImplementedError

    @op
    def summarize(self, score_rows: list) -> dict | None:
        """Summarize scores. Defaults to auto_summarize."""
        return auto_summarize(score_rows)

def auto_summarize(data: list) -> dict[str, Any] | None:
    """Automatically summarize a list of (potentially nested) dicts.
    Computes avg for numeric cols, count/fraction for boolean cols.
    """

Import

import weave
# or
from weave import Scorer

I/O Contract

Inputs (score)

Name      Type  Required  Description
output    Any   Yes       Model prediction output (keyword-only)
**kwargs  Any   Varies    Dataset columns mapped via column_map or matched by name

Outputs (score)

Name    Type  Description
return  Any   Score result (dict, bool, float, or WeaveScorerResult)

Inputs (auto_summarize)

Name  Type  Required  Description
data  list  Yes       List of score dicts from all examples

Outputs (auto_summarize)

Name    Type                   Description
return  dict[str, Any] | None  Summary dict: mean/stderr for numerics, true_count/true_fraction for booleans

Usage Examples

Class-Based Scorer

import weave

class ExactMatchScorer(weave.Scorer):
    @weave.op
    def score(self, *, output: dict, expected: str) -> dict:
        return {"match": output.get("answer") == expected}

Function-Based Scorer

import weave

@weave.op
def match_score(output: dict, expected: str) -> dict:
    return {"match": output.get("answer") == expected}
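
The scoring logic in both variants is ordinary Python, so it can be checked on a few hand-written rows before it is wired into an evaluation. The sketch below repeats the function without the @weave.op decorator so that it runs without Weave installed; the rows are made-up illustrative data.

```python
def match_score(output: dict, expected: str) -> dict:
    """Same logic as the function-based scorer above, minus @weave.op."""
    return {"match": output.get("answer") == expected}


# Hypothetical rows: each pairs a model output with the expected answer.
rows = [
    {"output": {"answer": "Paris"}, "expected": "Paris"},
    {"output": {"answer": "Lyon"}, "expected": "Paris"},
    {"output": {}, "expected": "Paris"},  # Missing "answer" key scores as a miss.
]

# One score dict per row, in the shape a summarize step would consume.
score_rows = [match_score(**row) for row in rows]
```

Using output.get("answer") rather than output["answer"] means a malformed model output counts as a non-match instead of raising KeyError.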

With Column Mapping

import weave

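class MyScorer(weave.Scorer):
    @weave.op
    def score(self, *, output: dict, expected: str) -> dict:
        return {"match": output.get("answer") == expected}

# column_map keys are scorer parameter names; values are dataset column names.
# Scorer is a Pydantic model, so set the field at construction time rather than
# assigning it as an unannotated class attribute.
# The dataset's "ground_truth" column is passed as the "expected" parameter.
scorer = MyScorer(column_map={"expected": "ground_truth"})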
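
The remapping can be pictured with a small pure-Python sketch. It only illustrates the idea and is not how Weave resolves arguments internally: resolve_score_kwargs is a hypothetical helper, and the sample row is made up.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


def resolve_score_kwargs(
    score_fn: Callable[..., Any],
    row: dict[str, Any],
    output: Any,
    column_map: dict[str, str] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for score_fn from a dataset row."""
    column_map = column_map or {}
    kwargs: dict[str, Any] = {"output": output}
    for name, param in inspect.signature(score_fn).parameters.items():
        if name in ("self", "output"):
            continue
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        # Use the mapped column if one is given, else match the column by name.
        column = column_map.get(name, name)
        if column not in row:
            raise KeyError(f"no dataset column {column!r} for parameter {name!r}")
        kwargs[name] = row[column]
    return kwargs


def score(*, output: dict, expected: str) -> dict:
    return {"match": output.get("answer") == expected}


# Made-up dataset row whose answer lives in a "ground_truth" column.
row = {"question": "Capital of France?", "ground_truth": "Paris"}
kwargs = resolve_score_kwargs(
    score, row, output={"answer": "Paris"}, column_map={"expected": "ground_truth"}
)
result = score(**kwargs)
```

Columns the scorer does not ask for, such as "question" here, are simply left out of the call.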

Related Pages

Implements Principle

Requires Environment
