# Principle: Snorkel Slice Performance Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Slicing, Robustness |
| Last Updated | 2026-02-14 20:00 GMT |
## Overview
An evaluation methodology that measures model performance separately on each critical data slice to ensure robust behavior across all important subpopulations.
## Description
Slice Performance Evaluation goes beyond aggregate metrics to provide per-slice performance breakdowns. This is critical because a model can have high overall accuracy while severely underperforming on important minority slices.
The evaluation uses the base task's prediction head (not the slice-specific heads) to score each slice subset, ensuring the reported metrics reflect the model's actual output behavior. Indicator task labels are excluded from evaluation, since they are auxiliary training signals rather than predictions of interest.
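As a concrete illustration, selecting the base head and scoring it on each slice subset might look like the following sketch. The head names (`"task"`, `"task_slice:short"`) and the function itself are hypothetical, not Snorkel's API; the point is that the same base-head predictions are reused for every slice, and slice/indicator heads are ignored at evaluation time.

```python
import numpy as np

def evaluate_per_slice(head_outputs, y_true, slice_masks):
    """Score the base task head on each slice subset.

    head_outputs: dict of head name -> predicted labels (np.ndarray);
                  only the base "task" head is evaluated, slice heads
                  and indicator heads are auxiliary training signals.
    y_true:       gold labels for the base task.
    slice_masks:  dict of slice name -> boolean membership mask.
    """
    preds = head_outputs["task"]  # base head only; ignore "task_slice:*" heads
    scores = {}
    for name, mask in slice_masks.items():
        if mask.any():  # skip empty slices rather than divide by zero
            scores[name] = float((preds[mask] == y_true[mask]).mean())
    scores["overall"] = float((preds == y_true).mean())
    return scores
```

For example, with predictions `[1, 0, 1, 1]` against gold labels `[1, 0, 0, 1]` and a slice covering the first two points, the slice accuracy is 1.0 while the overall accuracy is 0.75, illustrating how per-slice scores can diverge from the aggregate.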
## Usage
Use this principle after training a slice-aware model to verify performance on each defined slice. Compare each slice's metric with the overall metric to identify slices where the model may need improvement.
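The comparison step can be automated with a small helper that flags slices trailing the overall metric. The 5-point default `margin` below is an illustrative threshold, not a Snorkel convention:

```python
def flag_underperforming(slice_scores, overall_score, margin=0.05):
    """Return the slice names whose score trails the overall score by
    more than `margin` (an illustrative threshold, chosen per task)."""
    return sorted(
        name for name, score in slice_scores.items()
        if score < overall_score - margin
    )
```

Running this over per-slice scores such as `{"short": 0.61, "has_link": 0.90}` with an overall score of 0.88 would flag only `"short"`, directing attention to the subpopulation that needs work.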
## Theoretical Basis
For each slice $s_i$, compute metrics only on the data points that belong to the slice:

$$m_i = \mu\big(\{(y_j, \hat{y}_j) : x_j \in s_i\}\big)$$

where $\hat{y}_j$ is the base task head's prediction for $x_j$ and $\mu$ is a metric function (accuracy, F1, etc.).
By comparing across slices and with the overall metric, practitioners can identify problematic subpopulations.
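The per-slice restriction described above can be sketched directly, with the metric function left pluggable so the same helper serves accuracy, F1, or any other score. Both function names here are illustrative, not part of any library:

```python
import numpy as np

def slice_metric(y_true, y_pred, in_slice, metric):
    """Apply `metric` only to the points inside the slice
    (hypothetical helper mirroring the formula above)."""
    idx = np.asarray(in_slice, dtype=bool)
    return metric(y_true[idx], y_pred[idx])

def accuracy(y_true, y_pred):
    """One possible metric function: fraction of correct predictions."""
    return float((y_true == y_pred).mean())
```

Evaluating `slice_metric` with different boolean masks but the same predictions makes the comparison across slices explicit: any slice whose score falls well below the unrestricted metric marks a problematic subpopulation.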