# Principle: Snorkel Slice Performance Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Slicing, Robustness |
| Last Updated | 2026-02-14 20:00 GMT |
## Overview
An evaluation methodology that measures model performance separately on each critical data slice to ensure robust behavior across all important subpopulations.
## Description
Slice Performance Evaluation goes beyond aggregate metrics to provide per-slice performance breakdowns. This is critical because a model can have high overall accuracy while severely underperforming on important minority slices.
The evaluation uses the base task's prediction head (not the slice-specific heads) to score each slice subset, ensuring the reported metrics reflect the model's actual output behavior. Indicator task labels are excluded from evaluation, since they are auxiliary training signals rather than predictions of interest.
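As a concrete illustration, selecting the base head and scoring it on each slice subset might look like the following sketch. The head names (`"task"`, `"task_slice:short"`) and the function itself are hypothetical, not Snorkel's API; the point is that the same base-head predictions are reused for every slice, and slice/indicator heads are ignored at evaluation time.

```python
import numpy as np

def evaluate_per_slice(head_outputs, y_true, slice_masks):
    """Score the base task head on each slice subset.

    head_outputs: dict of head name -> predicted labels (np.ndarray);
                  only the base "task" head is evaluated, slice heads
                  and indicator heads are auxiliary training signals.
    y_true:       gold labels for the base task.
    slice_masks:  dict of slice name -> boolean membership mask.
    """
    preds = head_outputs["task"]  # base head only; ignore "task_slice:*" heads
    scores = {}
    for name, mask in slice_masks.items():
        if mask.any():  # skip empty slices rather than divide by zero
            scores[name] = float((preds[mask] == y_true[mask]).mean())
    scores["overall"] = float((preds == y_true).mean())
    return scores
```

For example, with predictions `[1, 0, 1, 1]` against gold labels `[1, 0, 0, 1]` and a slice covering the first two points, the slice accuracy is 1.0 while the overall accuracy is 0.75, illustrating how per-slice scores can diverge from the aggregate.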
## Usage
Use this principle after training a slice-aware model to verify performance on each defined slice. Compare each slice's metric with the overall metric to identify slices where the model may need improvement.
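The comparison step can be automated with a small helper that flags slices trailing the overall metric. The 5-point default `margin` below is an illustrative threshold, not a Snorkel convention:

```python
def flag_underperforming(slice_scores, overall_score, margin=0.05):
    """Return the slice names whose score trails the overall score by
    more than `margin` (an illustrative threshold, chosen per task)."""
    return sorted(
        name for name, score in slice_scores.items()
        if score < overall_score - margin
    )
```

Running this over per-slice scores such as `{"short": 0.61, "has_link": 0.90}` with an overall score of 0.88 would flag only `"short"`, directing attention to the subpopulation that needs work.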
## Theoretical Basis
For each slice $s_i$, compute metrics only on the data points that belong to the slice:

$$m_i = \mu\big(\{(y_j, \hat{y}_j) : x_j \in s_i\}\big)$$

where $\hat{y}_j$ is the base task head's prediction for $x_j$ and $\mu$ is a metric function (accuracy, F1, etc.).
By comparing across slices and with the overall metric, practitioners can identify problematic subpopulations.
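The per-slice restriction described above can be sketched directly, with the metric function left pluggable so the same helper serves accuracy, F1, or any other score. Both function names here are illustrative, not part of any library:

```python
import numpy as np

def slice_metric(y_true, y_pred, in_slice, metric):
    """Apply `metric` only to the points inside the slice
    (hypothetical helper mirroring the formula above)."""
    idx = np.asarray(in_slice, dtype=bool)
    return metric(y_true[idx], y_pred[idx])

def accuracy(y_true, y_pred):
    """One possible metric function: fraction of correct predictions."""
    return float((y_true == y_pred).mean())
```

Evaluating `slice_metric` with different boolean masks but the same predictions makes the comparison across slices explicit: any slice whose score falls well below the unrestricted metric marks a problematic subpopulation.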