
Principle: Snorkel Slice Performance Evaluation

From Leeroopedia
Knowledge Sources
Domains Evaluation, Data_Slicing, Robustness
Last Updated 2026-02-14 20:00 GMT

Overview

An evaluation methodology that measures model performance separately on each critical data slice to ensure robust behavior across all important subpopulations.

Description

Slice Performance Evaluation goes beyond aggregate metrics to provide per-slice performance breakdowns. This is critical because a model can have high overall accuracy while severely underperforming on important minority slices.

The evaluation uses the base task's prediction head (not slice-specific heads) to evaluate on slice subsets, ensuring the final predictions reflect the model's actual output behavior. Indicator task labels are excluded from evaluation since they are auxiliary training signals.
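This restriction can be expressed as a small helper: score only the base head's predictions on the indices that a slice membership mask selects. A minimal NumPy sketch; the function name and arrays are illustrative, not Snorkel's API:

```python
import numpy as np

def evaluate_slice(y_true, y_pred_base, slice_mask):
    """Accuracy of the base task head restricted to one slice.

    y_pred_base holds predictions from the base task head only;
    indicator-task outputs are auxiliary and are never scored here.
    """
    idx = slice_mask.astype(bool)
    return float((y_pred_base[idx] == y_true[idx]).mean())

# Illustrative data: the slice covers the first two points.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
mask = np.array([1, 1, 0, 0])
print(evaluate_slice(y_true, y_pred, mask))   # 1.0 on the slice
print(evaluate_slice(y_true, y_pred, np.ones(4)))  # 0.75 overall
```

The same predictions score differently on the slice than overall, which is exactly the gap this methodology surfaces.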

Usage

Use this principle after training a slice-aware model to verify performance on each defined slice. Compare with overall metrics to identify slices where the model may need improvement.

Theoretical Basis

For each slice $s_j$, compute metrics only on data points in the slice:

$\text{metric}_j = f(\hat{Y}_{S_j}, Y_{S_j})$

where $S_j = \{i : s_j(x_i) = 1\}$ and $f$ is a metric function (accuracy, F1, etc.).

By comparing $\text{metric}_j$ across slices and with the overall metric, practitioners can identify problematic subpopulations.
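The comparison above can be instantiated directly: treat each slice as a boolean membership mask $S_j$ and compute the metric on the masked subset. The labels and slice definition below are hypothetical, chosen only to show how a strong aggregate score can hide a failing subpopulation:

```python
import numpy as np

# Hypothetical labels and base-head predictions.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 0])

# Boolean masks S_j = {i : s_j(x_i) = 1}; "overall" is the full set.
slices = {
    "overall": np.ones_like(y_true, dtype=bool),
    "minority": np.array([0] * 7 + [1] * 3, dtype=bool),
}

for name, S_j in slices.items():
    metric_j = (y_pred[S_j] == y_true[S_j]).mean()  # f = accuracy
    print(f"{name}: {metric_j:.2f}")
# overall: 0.70
# minority: 0.00
```

Aggregate accuracy of 0.70 looks serviceable, yet the minority slice fails completely; reporting $\text{metric}_j$ per slice makes that visible.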

Related Pages

Implemented By
