Implementation:Snorkel team Snorkel Scorer Score
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
Concrete tool for computing classification metrics (accuracy, F1, precision, recall) against gold labels, provided by the Snorkel library.
Description
The Scorer class provides a flexible multi-metric evaluation interface. It accepts standard metric names (mapped to sklearn implementations) and optional custom metric functions. The score() method computes all configured metrics, while score_slices() computes per-slice metrics.
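The name-to-function mapping described above can be illustrated with a small pure-Python sketch. This is not Snorkel's implementation: `score_all`, `BUILTIN_METRICS`, and `error_count` are hypothetical names, the metrics are restricted to binary labels, and the two-argument metric signature is an assumption made to keep the example short.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumed signature for this sketch: metric(golds, preds) -> float
MetricFn = Callable[[Sequence[int], Sequence[int]], float]


def _counts(golds, preds, positive=1):
    """Return (true positives, false positives, false negatives)."""
    tp = sum(g == positive and p == positive for g, p in zip(golds, preds))
    fp = sum(g != positive and p == positive for g, p in zip(golds, preds))
    fn = sum(g == positive and p != positive for g, p in zip(golds, preds))
    return tp, fp, fn


def accuracy(golds, preds):
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def precision(golds, preds):
    tp, fp, _ = _counts(golds, preds)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(golds, preds):
    tp, _, fn = _counts(golds, preds)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(golds, preds):
    tp, fp, fn = _counts(golds, preds)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


# Hypothetical registry standing in for the library's standard metric names
BUILTIN_METRICS: Dict[str, MetricFn] = {
    "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
}


def score_all(
    golds: Sequence[int],
    preds: Sequence[int],
    metric_names: List[str],
    custom_metric_funcs: Optional[Dict[str, MetricFn]] = None,
) -> Dict[str, float]:
    """Compute every configured metric and return a name -> score dict."""
    metrics: Dict[str, MetricFn] = {}
    for name in metric_names:
        if name not in BUILTIN_METRICS:
            raise ValueError(f"Unrecognized metric: {name}")
        metrics[name] = BUILTIN_METRICS[name]
    metrics.update(custom_metric_funcs or {})  # custom metrics join the same dict
    return {name: fn(golds, preds) for name, fn in metrics.items()}


def error_count(golds, preds):
    """Custom metric for illustration: number of misclassified examples."""
    return float(sum(g != p for g, p in zip(golds, preds)))


golds = [1, 0, 1, 1, 0]
preds = [1, 0, 0, 1, 1]
results = score_all(
    golds, preds, ["accuracy", "f1"], custom_metric_funcs={"n_errors": error_count}
)
print(results)
```

Keeping built-in and custom metrics in one dictionary means the scoring loop does not need to distinguish between them.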
Additionally, MajorityLabelVoter provides a simple majority-vote baseline, and probs_to_preds converts probability arrays to discrete predictions with configurable tie-breaking policies.
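As a rough illustration of what a majority-vote baseline and tie-breaking involve, the following pure-Python sketch counts non-abstain votes per class and then converts the resulting probabilities to hard labels. It is a simplified stand-in, not the library code: `majority_vote_probs` and `to_preds` are hypothetical names, and the "random" policy here uses a seeded generator purely so the sketch is reproducible.

```python
import random
from typing import List, Sequence

ABSTAIN = -1  # assumed abstain label


def majority_vote_probs(L: Sequence[Sequence[int]], cardinality: int) -> List[List[float]]:
    """Spread probability evenly over the classes tied for the most votes."""
    probs = []
    for row in L:
        counts = [sum(1 for vote in row if vote == k) for k in range(cardinality)]
        top = max(counts)
        winners = [k for k, c in enumerate(counts) if c == top]
        probs.append([1.0 / len(winners) if k in winners else 0.0
                      for k in range(cardinality)])
    return probs


def to_preds(probs, tie_break_policy: str = "abstain", tol: float = 1e-5, seed: int = 0):
    """Pick the most probable class; classes within tol of the max count as tied."""
    rng = random.Random(seed)
    preds = []
    for row in probs:
        top = max(row)
        tied = [k for k, p in enumerate(row) if abs(p - top) <= tol]
        if len(tied) == 1:
            preds.append(tied[0])
        elif tie_break_policy == "abstain":
            preds.append(ABSTAIN)
        elif tie_break_policy == "random":
            preds.append(rng.choice(tied))
        else:
            raise ValueError(f"Unknown tie_break_policy: {tie_break_policy}")
    return preds


# Rows are examples, columns are labeling functions, -1 means abstain
L = [[0, 0, -1], [-1, 0, 1], [1, -1, 0]]
mv_probs = majority_vote_probs(L, cardinality=2)
mv_preds = to_preds(mv_probs, tie_break_policy="abstain")
print(mv_probs)  # [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
print(mv_preds)  # [0, -1, -1]
```

With the "abstain" policy, tied rows are left unlabeled instead of being assigned an arbitrary class, which makes disagreement among labeling functions visible in the predictions.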
Usage
Use this class to evaluate the quality of a label model's output against a gold-labeled development set. It can also be used to evaluate downstream classifier performance.
Code Reference
Source Location
- Repository: snorkel
- Files: snorkel/analysis/scorer.py (Scorer, L10-164); snorkel/labeling/model/baselines.py (MajorityLabelVoter, L97-134); snorkel/utils/core.py (probs_to_preds, L13-72)
Signature
class Scorer:
    def __init__(
        self,
        metrics: Optional[List[str]] = None,
        custom_metric_funcs: Optional[Mapping[str, Callable[..., float]]] = None,
        abstain_label: Optional[int] = -1,
    ) -> None:
        """
        Args:
            metrics: List of metric names ("accuracy", "f1", "precision",
                "recall", "f1_micro", "f1_macro", "fbeta",
                "matthews_corrcoef", "roc_auc", "coverage").
            custom_metric_funcs: Dict mapping metric names to functions.
            abstain_label: Label value for abstentions (default -1).
        """

    def score(
        self,
        golds: np.ndarray,
        preds: Optional[np.ndarray] = None,
        probs: Optional[np.ndarray] = None,
    ) -> Dict[str, float]:
        """
        Calculate scores for configured metrics.

        Args:
            golds: [n] gold labels.
            preds: [n] predicted labels (optional).
            probs: [n, k] probability predictions (optional).

        Returns:
            Dict mapping metric names to float scores.
        """


class MajorityLabelVoter(BaseLabeler):
    def predict_proba(self, L: np.ndarray) -> np.ndarray:
        """Majority vote across LFs. Returns [n, k] probabilities."""


def probs_to_preds(
    probs: np.ndarray,
    tie_break_policy: str = "random",
    tol: float = 1e-5,
) -> np.ndarray:
    """Convert [n, k] probabilities to [n] hard predictions."""
Import
from snorkel.analysis import Scorer
from snorkel.labeling.model.baselines import MajorityLabelVoter
from snorkel.utils import probs_to_preds
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| metrics | List[str] | No | Metric names to compute |
| golds | np.ndarray | Yes | [n] gold label array |
| preds | Optional[np.ndarray] | No | [n] predicted labels |
| probs | Optional[np.ndarray] | No | [n, k] probability matrix |
Outputs
| Name | Type | Description |
|---|---|---|
| score() result | Dict[str, float] | Mapping of metric names to scores |
| MajorityLabelVoter.predict_proba() | np.ndarray | [n, k] majority vote probabilities |
| probs_to_preds() | np.ndarray | [n] hard integer predictions |
Usage Examples
Evaluate Label Model
import numpy as np
from snorkel.analysis import Scorer
from snorkel.labeling.model import LabelModel
from snorkel.labeling.model.baselines import MajorityLabelVoter
from snorkel.utils import probs_to_preds
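# Toy dev-set label matrix (rows: examples, columns: LFs, -1: abstain) and gold labels
L_dev = np.array([[0, 0, -1], [-1, 0, 1], [1, -1, 0]])
Y_dev = np.array([0, 1, 0])
# Train a label model so the example is self-contained; in practice,
# fit on the (much larger) training label matrix rather than on L_dev
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_dev, n_epochs=100, seed=123)
# Label model predictions
probs = label_model.predict_proba(L_dev)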
preds = probs_to_preds(probs, tie_break_policy="abstain")
# Score
scorer = Scorer(metrics=["accuracy", "f1", "precision", "recall"])
results = scorer.score(golds=Y_dev, preds=preds, probs=probs)
print(results)
# Prints a dict with one entry per configured metric (keys: 'accuracy', 'f1',
# 'precision', 'recall'); the values depend on the trained label model
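Because `tie_break_policy="abstain"` can produce predictions equal to the abstain label (-1), it helps to read accuracy together with coverage. The sketch below is a pure-Python illustration under the assumption that abstained predictions are excluded before accuracy is computed while coverage is computed over all examples; `coverage` and `accuracy_on_covered` are hypothetical helper names, not library functions.

```python
ABSTAIN = -1  # assumed abstain label, matching the documented default


def coverage(preds, abstain_label=ABSTAIN):
    """Fraction of examples that received a non-abstain prediction."""
    return sum(p != abstain_label for p in preds) / len(preds)


def accuracy_on_covered(golds, preds, abstain_label=ABSTAIN):
    """Accuracy over the examples where the model did not abstain."""
    kept = [(g, p) for g, p in zip(golds, preds) if p != abstain_label]
    if not kept:
        raise ValueError("All predictions are abstains; nothing to score")
    return sum(g == p for g, p in kept) / len(kept)


toy_golds = [0, 1, 0, 1]
toy_preds = [0, -1, 0, 0]  # the second example was abstained on

print(coverage(toy_preds))                        # 0.75
print(accuracy_on_covered(toy_golds, toy_preds))  # 2 of 3 covered examples correct
```

A model that abstains often can show high accuracy on the few examples it labels, so reporting both numbers gives a fairer comparison.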
Compare with Baseline
# Majority vote baseline
majority_voter = MajorityLabelVoter(cardinality=2)
majority_probs = majority_voter.predict_proba(L_dev)
majority_preds = probs_to_preds(majority_probs)
baseline_results = scorer.score(golds=Y_dev, preds=majority_preds)
print(f"Label Model accuracy: {results['accuracy']:.2f}")
print(f"Majority Vote accuracy: {baseline_results['accuracy']:.2f}")
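Per-slice Metrics (sketch)

The description mentions score_slices() for per-slice metrics. The idea can be sketched in plain Python as below, under the assumption that each slice is given as a 0/1 membership mask over the examples. `score_by_slice` and the slice names are hypothetical and only illustrate the concept; consult the library source for the actual input format.

```python
from typing import Dict, List, Sequence


def accuracy(golds: Sequence[int], preds: Sequence[int]) -> float:
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def score_by_slice(
    golds: Sequence[int],
    preds: Sequence[int],
    slice_masks: Dict[str, List[int]],
) -> Dict[str, float]:
    """Compute accuracy overall and separately for each slice mask."""
    results = {"overall": accuracy(golds, preds)}
    for name, mask in slice_masks.items():
        idx = [i for i, member in enumerate(mask) if member]
        if not idx:
            continue  # skip empty slices rather than divide by zero
        results[name] = accuracy([golds[i] for i in idx], [preds[i] for i in idx])
    return results


slice_golds = [0, 1, 0, 1, 1, 0]
slice_preds = [0, 1, 1, 1, 0, 0]
masks = {
    "short_text": [1, 1, 1, 0, 0, 0],  # hypothetical slice
    "has_url": [0, 0, 1, 1, 1, 0],     # hypothetical slice
}
slice_results = score_by_slice(slice_golds, slice_preds, masks)
print(slice_results)
```

Slice-level scores can expose subsets where a model underperforms even when the overall score looks acceptable.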