Principle:Snorkel team Snorkel Slice Aware Data Preparation

Knowledge Sources	Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices
Domains	Data_Slicing, Data_Preparation, Multi_Task_Learning
Last Updated	2026-02-14 20:00 GMT

Overview

A data preparation strategy that augments standard datasets with slice-specific indicator and prediction labels required for slice-aware multi-task training.

Description

Slice-Aware Data Preparation converts a standard dataset into one suitable for slice-aware training. For each slice, two additional label sets are created:

Indicator labels: Binary labels (0/1) indicating slice membership (from the slice matrix S)
Prediction labels: Classification labels masked to -1 for data points not in the slice

This labeling scheme allows the multi-task model to simultaneously learn slice membership detection and slice-specific classification, with the prediction labels only applying within each slice.
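The two label sets described above can be sketched with NumPy as follows. This is an illustration only, not Snorkel's implementation: the helper `make_slice_labels`, the slice names, and the `_ind` / `_pred` key suffixes are hypothetical and chosen for readability.

```python
import numpy as np


def make_slice_labels(Y, S, slice_names):
    """Build indicator and masked prediction labels for each slice.

    Y: (n,) base-task labels.
    S: (n, k) binary slice matrix, S[i, j] == 1 if point i is in slice j.
    slice_names: k names, one per column of S (hypothetical naming).
    """
    Y = np.asarray(Y)
    S = np.asarray(S)
    if S.shape != (Y.shape[0], len(slice_names)):
        raise ValueError("S must have one row per label and one column per slice")

    labels = {}
    for j, name in enumerate(slice_names):
        in_slice = S[:, j] == 1
        # Indicator labels: 0/1 slice membership, copied from column j of S.
        labels[f"{name}_ind"] = in_slice.astype(int)
        # Prediction labels: the base label inside the slice, -1 outside it.
        labels[f"{name}_pred"] = np.where(in_slice, Y, -1)
    return labels


Y = np.array([1, 0, 1, 1])
S = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])
labels = make_slice_labels(Y, S, ["short", "question"])
```

In this toy example, the second data point belongs to no slice, so it receives -1 in every set of prediction labels and contributes only to the indicator tasks.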

Usage

Use this principle after obtaining a slice matrix by applying slicing functions (SFs) to the data, and before training a slice-aware model. The prepared dataloaders contain all labels required by the multi-task training loop.
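To make the input to this step concrete, the sketch below builds a small slice matrix by applying slicing functions to raw text. It deliberately uses plain Python functions rather than Snorkel's own applier classes; `is_short`, `has_question_mark`, and `apply_slicing_functions` are hypothetical names, and the resulting plain integer array is an assumption about layout, not a statement of what Snorkel returns.

```python
import numpy as np


def is_short(text):
    """Hypothetical slicing function: texts with fewer than five words."""
    return len(text.split()) < 5


def has_question_mark(text):
    """Hypothetical slicing function: texts containing a question mark."""
    return "?" in text


def apply_slicing_functions(data, sfs):
    """Return an (n, k) 0/1 matrix with one row per data point and one column per SF."""
    return np.array([[int(sf(x)) for sf in sfs] for x in data], dtype=int)


texts = [
    "Is this spam?",
    "Please review the attached quarterly report today.",
    "Call me",
    "Why did the nightly build fail again this morning?",
]
S = apply_slicing_functions(texts, [is_short, has_question_mark])
```

A matrix of this shape, together with the base-task labels, is everything the preparation step needs; the order is slice matrix first, label augmentation second, training last.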

Theoretical Basis

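For a base task with labels $Y$ over $n$ data points and a slice matrix $S \in \{0, 1\}^{n \times k}$, where $S_{i, j} = 1$ if data point $i$ belongs to slice $s_{j}$, the labels for each slice $s_{j}$ are: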

Indicator labels: $Y_{j}^{ind} = S_{:, j} \in \{0, 1\}^{n}$

Prediction labels: $Y_{j, i}^{pred} = \begin{cases} Y_{i} & \text{if } S_{i, j} = 1 \\ -1 & \text{if } S_{i, j} = 0 \end{cases}$

The -1 entries are masked during loss computation so the predictor head only trains on in-slice examples.
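The masking step can be illustrated with a small NumPy cross-entropy that skips every row labeled -1. This is a sketch under assumptions, not the loss Snorkel uses internally: the function name `masked_cross_entropy` is hypothetical, and returning 0.0 when no example is in the slice is a choice made here for simplicity.

```python
import numpy as np


def masked_cross_entropy(logits, labels, ignore_label=-1):
    """Mean cross-entropy over rows whose label is not ignore_label."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)

    keep = labels != ignore_label
    if not keep.any():
        # Assumption: an empty slice contributes no loss.
        return 0.0

    kept_logits = logits[keep]
    kept_labels = labels[keep]

    # Numerically stable log-softmax over the class dimension.
    shifted = kept_logits - kept_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the true class, averaged over in-slice rows.
    picked = log_probs[np.arange(kept_labels.size), kept_labels]
    return float(-picked.mean())


logits = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
y_pred_labels = np.array([0, -1, 1])  # the second example is outside the slice
loss = masked_cross_entropy(logits, y_pred_labels)
```

Because the second row is masked, changing its logits leaves the loss unchanged, which is exactly why the predictor head receives no training signal from out-of-slice examples.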

Related Pages

Implemented By

Implementation:Snorkel_team_Snorkel_SliceAwareClassifier_Make_Slice_Dataloader
