Principle:Snorkel team Snorkel Slice Aware Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Slicing, Data_Preparation, Multi_Task_Learning |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A data preparation strategy that augments standard datasets with slice-specific indicator and prediction labels required for slice-aware multi-task training.
Description
Slice-Aware Data Preparation converts a standard dataset into one suitable for slice-aware training. For each slice, two additional label sets are created:
- Indicator labels: Binary labels (0/1) indicating slice membership (from the slice matrix S)
- Prediction labels: Classification labels masked to -1 for data points not in the slice
This labeling scheme allows the multi-task model to simultaneously learn slice membership detection and slice-specific classification, with the prediction labels only applying within each slice.
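The labeling scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Snorkel's own implementation: the helper `make_slice_labels`, the `MASK` constant, the key naming pattern, and the toy slice names are all hypothetical, and the slice matrix is assumed to be a list of rows with one 0/1 column per slice.

```python
from typing import Dict, List, Sequence

MASK = -1  # sentinel for "not in this slice", as described above


def make_slice_labels(
    labels: Sequence[int],
    slice_matrix: Sequence[Sequence[int]],
    slice_names: Sequence[str],
    task_name: str = "task",
) -> Dict[str, List[int]]:
    """Build indicator and prediction label sets for every slice (hypothetical helper)."""
    if len(labels) != len(slice_matrix):
        raise ValueError("labels and slice_matrix must have the same number of rows")
    # Keep the base task labels alongside the new slice label sets.
    label_sets: Dict[str, List[int]] = {task_name: list(labels)}
    for k, name in enumerate(slice_names):
        membership = [int(row[k]) for row in slice_matrix]
        # Indicator labels: 1 if the data point is in the slice, else 0.
        label_sets[f"{task_name}_slice:{name}_ind"] = membership
        # Prediction labels: the base label inside the slice, MASK outside it.
        label_sets[f"{task_name}_slice:{name}_pred"] = [
            y if m == 1 else MASK for y, m in zip(labels, membership)
        ]
    return label_sets


labels = [1, 0, 1, 0]
# Rows are data points, columns are slices (toy data).
slice_matrix = [[1, 0], [0, 0], [1, 1], [0, 1]]
prepared = make_slice_labels(labels, slice_matrix, ["short", "question"])
print(prepared["task_slice:short_ind"])      # [1, 0, 1, 0]
print(prepared["task_slice:short_pred"])     # [1, -1, 1, -1]
print(prepared["task_slice:question_pred"])  # [-1, -1, 1, 0]
```

Each slice contributes two label sets, so a dataset with one base task and two slices ends up with five label sets in total.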
Usage
Use this principle after obtaining a slice matrix by applying slicing functions (SFs) to the data and before training a slice-aware model. The prepared dataloaders then carry every label set the multi-task training loop needs.
Theoretical Basis
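For a base task with labels $Y$ and a slice $k$ with indicator matrix $S$ (where $S_{ik} = 1$ if data point $i$ belongs to slice $k$, and $0$ otherwise):
Indicator labels: $Y^{\text{ind}}_k[i] = S_{ik}$
Prediction labels: $Y^{\text{pred}}_k[i] = Y[i]$ if $S_{ik} = 1$, otherwise $-1$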
The -1 entries are masked during loss computation so the predictor head only trains on in-slice examples.
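To make the masking step concrete, here is a minimal sketch of a loss that skips the -1 entries. It assumes the predictor head outputs per-class probabilities and that the loss is an average cross-entropy over in-slice examples. The function `masked_cross_entropy` and the toy numbers are hypothetical, and an actual training loop may implement the mask differently.

```python
import math
from typing import Sequence

MASK = -1  # label value marking examples outside the slice


def masked_cross_entropy(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean negative log-likelihood over examples whose label is not MASK."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != MASK]
    # With no in-slice examples there is nothing to train on; report zero loss.
    return sum(losses) / len(losses) if losses else 0.0


# Toy predicted probabilities for a binary task (assumed values).
probs = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
pred_labels = [1, -1, 1, -1]  # examples 1 and 3 are outside the slice
loss = masked_cross_entropy(probs, pred_labels)

# Changing predictions for masked examples leaves the loss unchanged.
altered = [[0.2, 0.8], [0.1, 0.9], [0.5, 0.5], [0.01, 0.99]]
assert masked_cross_entropy(altered, pred_labels) == loss
```

Because out-of-slice examples contribute nothing to the loss, they also contribute no gradient to the predictor head, which is what restricts that head to in-slice data.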