Principle:Snorkel team Snorkel Slicing Function Definition
| Knowledge Sources | |
|---|---|
| Domains | Data_Slicing, Robustness, Model_Evaluation |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A mechanism for programmatically defining critical data subsets (slices) that require special attention during model training and evaluation.
Description
Slicing Function Definition extends the labeling function paradigm to identify critical data subsets rather than assign labels. A slicing function (SF) takes a data point and returns a binary indicator: 1 if the data point belongs to the slice, 0 if not.
Slices represent subsets of data where model performance is particularly important, such as:
- Short text messages in a spam classifier
- Rare categories in a product classifier
- Edge cases identified by domain experts
The SlicingFunction class inherits directly from LabelingFunction, reusing its preprocessing and resource infrastructure but with binary output semantics.
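As a minimal sketch of the binary-indicator contract described above, the slicing functions below are written as plain Python predicates rather than with Snorkel's own classes. The `Message` record, its fields, the 5-word threshold, and the `RARE_CATEGORIES` set are illustrative assumptions, not part of the library.

```python
from dataclasses import dataclass


# Hypothetical data point type, used only for illustration.
@dataclass
class Message:
    text: str
    category: str


# Assumed set of underrepresented categories.
RARE_CATEGORIES = {"antiques", "instruments"}


def sf_short_text(x: Message) -> int:
    """Return 1 if the message has fewer than 5 words, else 0."""
    return int(len(x.text.split()) < 5)


def sf_rare_category(x: Message) -> int:
    """Return 1 if the message belongs to a rare category, else 0."""
    return int(x.category in RARE_CATEGORIES)


msg = Message(text="win cash now", category="antiques")
print(sf_short_text(msg), sf_rare_category(msg))
```

In Snorkel itself, a predicate like these would typically be wrapped with the `slicing_function` decorator from `snorkel.slicing`, which is how it gains the preprocessing and resource infrastructure inherited from LabelingFunction.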
Usage
Use this principle when you need to identify and monitor model performance on critical data subsets. Define slicing functions to capture subpopulations where failures are costly, where data is underrepresented, or where domain-specific patterns require special handling.
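One way to monitor performance on a slice is to compute a metric over only the members of that slice. The sketch below does this for accuracy in plain Python. The texts, gold labels, predictions, and the `sf_short_text` slicing function are made-up illustrations, and `slice_accuracy` is a hypothetical helper rather than a Snorkel API.

```python
from typing import Callable, Optional, Sequence


def sf_short_text(text: str) -> int:
    """Illustrative slicing function: 1 for texts under 5 words, else 0."""
    return int(len(text.split()) < 5)


def slice_accuracy(
    sf: Callable[[str], int],
    texts: Sequence[str],
    gold: Sequence[int],
    preds: Sequence[int],
) -> Optional[float]:
    """Accuracy over data points where sf returns 1 (None if the slice is empty)."""
    members = [i for i, t in enumerate(texts) if sf(t) == 1]
    if not members:
        return None
    correct = sum(gold[i] == preds[i] for i in members)
    return correct / len(members)


# Made-up evaluation data: 1 = spam, 0 = not spam.
texts = [
    "free prize",
    "call me",
    "are we still meeting for lunch tomorrow at noon",
    "claim your exclusive reward before the offer expires",
]
gold = [1, 0, 0, 1]
preds = [0, 0, 0, 1]

overall = sum(g == p for g, p in zip(gold, preds)) / len(gold)
short = slice_accuracy(sf_short_text, texts, gold, preds)
print(f"overall accuracy: {overall:.2f}, short-text slice accuracy: {short:.2f}")
```

With this toy data the overall accuracy is higher than the accuracy on the short-text slice, which is the kind of gap slice-level monitoring is meant to surface.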
Theoretical Basis
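A slicing function defines a binary partition:
$$s : \mathcal{X} \to \{0, 1\}$$
where $s(x) = 1$ indicates membership in the slice. The collection of slicing functions creates a slice matrix $S \in \{0, 1\}^{n \times k}$, where $n$ is the number of data points and $k$ is the number of slices.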
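The slice matrix can be sketched by applying every slicing function to every data point. The example below is a plain-Python illustration under assumed inputs: the texts and both slicing functions are made up, and `build_slice_matrix` is a hypothetical helper. Each row corresponds to a data point and each column to a slice.

```python
from typing import Callable, List, Sequence


def sf_short_text(text: str) -> int:
    """Illustrative slice: texts with fewer than 5 words."""
    return int(len(text.split()) < 5)


def sf_mentions_money(text: str) -> int:
    """Illustrative slice: texts containing a money-related keyword."""
    return int(any(w in text.lower() for w in ("cash", "prize", "$")))


def build_slice_matrix(
    sfs: Sequence[Callable[[str], int]], data: Sequence[str]
) -> List[List[int]]:
    """Return a matrix with one row per data point and one column per slicing function."""
    return [[sf(x) for sf in sfs] for x in data]


data = [
    "win cash now",
    "see you at the meeting tomorrow",
    "your prize is waiting, reply to claim it",
]
S = build_slice_matrix([sf_short_text, sf_mentions_money], data)
for row in S:
    print(row)
```

In Snorkel, the applier classes in `snorkel.slicing` (for example `PandasSFApplier`) fill this role. The helper here only mirrors the idea.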
Unlike labeling functions, slicing functions:
- Do not abstain (always return 0 or 1)
- Can define overlapping subsets (a data point can belong to multiple slices)
- Are used for model conditioning, not label generation
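The first two differences can be seen in a small sketch. It uses plain Python functions. The `ABSTAIN = -1` value follows Snorkel's labeling function convention, while the keyword rule and both slicing functions are illustrative assumptions.

```python
ABSTAIN = -1  # a labeling function may decline to vote
SPAM = 1


def lf_contains_free(text: str) -> int:
    """Labeling function: votes SPAM when 'free' appears, otherwise abstains."""
    return SPAM if "free" in text.lower() else ABSTAIN


def sf_short_text(text: str) -> int:
    """Slicing function: always returns 0 or 1, never abstains."""
    return int(len(text.split()) < 5)


def sf_has_exclamation(text: str) -> int:
    """Slicing function: 1 if the text contains an exclamation mark, else 0."""
    return int("!" in text)


# The labeling function abstains on a text that its rule does not cover.
label_vote = lf_contains_free("hello there")

# A single data point can belong to several slices at once.
text = "Free entry!"
memberships = [sf(text) for sf in (sf_short_text, sf_has_exclamation)]
print(label_vote, memberships)
```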