Principle:Cleanlab Cleanlab Segmentation Label Issue Filtering
| Knowledge Sources | |
|---|---|
| Domains | Data Quality, Machine Learning, Computer Vision, Semantic Segmentation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Segmentation label issue filtering applies Confident Learning at the pixel level to identify mislabeled pixels in semantic segmentation datasets, using batched processing and optional spatial downsampling to handle the scale of dense prediction tasks.
Description
Semantic segmentation datasets present a unique challenge for label quality assessment: each image contains thousands to millions of individually labeled pixels, making the total number of "examples" orders of magnitude larger than in standard classification. Segmentation label issue filtering addresses this by treating every pixel as an independent classification example and applying Confident Learning techniques to identify which pixels are likely mislabeled.
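As a minimal sketch of the "every pixel is an example" view, the snippet below flattens a batch of images into per-pixel rows. The array layout, labels as `(N, H, W)` integers and predicted probabilities as `(N, K, H, W)`, is an assumption made for illustration, and `flatten_pixels` is a hypothetical helper rather than a library function.

```python
import numpy as np

def flatten_pixels(labels, pred_probs):
    """Turn (N, H, W) labels and (N, K, H, W) probabilities into per-pixel examples."""
    n, k, h, w = pred_probs.shape
    flat_labels = labels.reshape(-1)  # shape (N*H*W,)
    # Move the class axis last so that each row is one pixel's class distribution.
    flat_probs = np.moveaxis(pred_probs, 1, -1).reshape(-1, k)  # shape (N*H*W, K)
    return flat_labels, flat_probs

# Toy data (assumed shapes): 2 images, 3 classes, 4 x 5 pixels each.
rng = np.random.default_rng(0)
pred_probs = np.moveaxis(rng.dirichlet(np.ones(3), size=(2, 4, 5)), -1, 1)  # (2, 3, 4, 5)
labels = rng.integers(0, 3, size=(2, 4, 5))

flat_labels, flat_probs = flatten_pixels(labels, pred_probs)
print(flat_labels.shape, flat_probs.shape)
```

Even this toy case turns 2 images into 40 examples, which is why the full-size problem needs the scaling techniques described below.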
The core insight is that a pixel's label can be evaluated by comparing its assigned class against the model's predicted probability distribution for that pixel. A single fixed probability cutoff applied to every pixel would be too crude, because it ignores class imbalance and the fact that a model is more confident on some classes than on others. Instead, the approach first estimates a confident threshold for each class across the entire dataset, then uses these thresholds to identify pixels whose labels conflict with the model's confident predictions.
Two additional techniques make this scalable:
- Batched processing: images are divided into mini-batches, and pixel data is streamed through the threshold estimation and scoring stages so that the entire flattened dataset never has to fit in memory at once.
- Spatial downsampling: the spatial resolution of both labels and predictions is optionally reduced before processing, and the resulting issue mask is upsampled back to the original size. This greatly reduces computation while preserving the ability to detect large mislabeled regions. After upsampling, a cross-check against the original-resolution data removes false flags: a pixel stays marked as an issue only if the model's top prediction disagrees with the given label at that exact pixel location.
Usage
Segmentation label issue filtering is the right approach when:
- You need a binary (yes/no) determination of which pixels are mislabeled.
- You are working with dense semantic segmentation annotations (not instance segmentation or bounding boxes).
- You want the number of detected issues to be automatically determined via Confident Learning, rather than manually setting a threshold.
- Your dataset is too large for all-at-once processing and requires batched computation.
For continuous quality scores (rather than binary issue detection), use segmentation label quality scoring instead.
Theoretical Basis
Confident Learning at Pixel Level
Confident Learning identifies label errors by estimating the joint distribution of noisy (given) labels and true (latent) labels. For segmentation, this is applied by treating each pixel as an independent example:
- Threshold estimation: For each class k, a confident threshold t_k is computed as the average predicted probability P(class=k|x) over all pixels whose given label is k. A pixel with P(class=k|x) > t_k is "confidently" predicted as class k.
- Issue identification: A pixel is flagged as a potential issue if it is "confidently" predicted as a different class than its given label. Specifically, if a pixel is labeled as class j but is confidently predicted as class k (where k != j), it is considered a label error.
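The two steps above can be sketched on flattened pixel arrays as follows. This is a simplified illustration of the description in this section, not the library's implementation. The function names are hypothetical, and two choices are assumptions: the most probable confident class is used when a pixel exceeds several thresholds, and the threshold is set to infinity for a class with no labeled pixels.

```python
import numpy as np

def confident_thresholds(flat_labels, flat_probs):
    """t_k = mean of P(class=k | x) over pixels whose given label is k."""
    num_classes = flat_probs.shape[1]
    # Assumption: a class with no labeled pixels gets threshold inf (never confident).
    thresholds = np.full(num_classes, np.inf)
    for k in range(num_classes):
        mask = flat_labels == k
        if mask.any():
            thresholds[k] = flat_probs[mask, k].mean()
    return thresholds

def flag_issues(flat_labels, flat_probs, thresholds):
    """Flag pixels confidently predicted as a class other than their given label."""
    confident = flat_probs > thresholds  # (P, K) boolean, broadcast over classes
    # Assumption: among confident classes, take the most probable one.
    best = np.where(confident, flat_probs, -np.inf).argmax(axis=1)
    return confident.any(axis=1) & (best != flat_labels)

# Six pixels, two classes; the third pixel is labeled 0 but looks like class 1.
flat_labels = np.array([0, 0, 0, 1, 1, 1])
flat_probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
])

thresholds = confident_thresholds(flat_labels, flat_probs)
issues = flag_issues(flat_labels, flat_probs, thresholds)
print(thresholds, issues)
```

Note that the last pixel is not flagged even though its probability for class 1 is only 0.70: it falls below the class 1 threshold, so it is not confidently predicted as any class, and an unconfident pixel is never treated as a label error.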
Downsampling and Upsampling Strategy
For large images, spatial downsampling reduces the H x W dimensions by a factor d:
- Downsampling: Labels are averaged and rounded over d x d blocks. Predicted probabilities are averaged over d x d blocks and renormalized to sum to 1 across classes.
- Processing: Confident Learning is applied at the reduced resolution.
- Upsampling: The issue mask is expanded by repeating each pixel d times in each spatial dimension.
- Cross-check: At the original resolution, each flagged pixel is verified by checking whether argmax(pred_probs) disagrees with the given label. Flagged pixels where the model agrees with the label are unflagged.
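This cross-check step is critical because the upsampled mask flags whole d x d blocks at once, including pixels whose labels the model actually agrees with, and block averaging can blur class boundaries. The final decision should therefore be grounded in the original-resolution data.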
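The downsample, upsample, and cross-check steps can be sketched with NumPy reshapes. The function names below are hypothetical, the `(N, H, W)` / `(N, K, H, W)` layout is an assumption, and the low-resolution issue mask is written by hand to stand in for the Confident Learning step so that the example stays focused on the resolution handling.

```python
import numpy as np

def downsample(labels, pred_probs, d):
    """Block-average (N, H, W) labels and (N, K, H, W) probabilities by factor d."""
    n, k, h, w = pred_probs.shape
    if h % d or w % d:
        raise ValueError("H and W must be divisible by the downsampling factor")
    # Average each d x d block of label ids, then round back to an integer id.
    small_labels = np.round(
        labels.reshape(n, h // d, d, w // d, d).mean(axis=(2, 4))
    ).astype(int)
    small_probs = pred_probs.reshape(n, k, h // d, d, w // d, d).mean(axis=(3, 5))
    # Renormalize so the probabilities sum to 1 across classes.
    small_probs = small_probs / small_probs.sum(axis=1, keepdims=True)
    return small_labels, small_probs

def upsample_and_cross_check(small_issues, labels, pred_probs, d):
    """Repeat each low-res flag over its d x d block, then keep only real disagreements."""
    issues = small_issues.repeat(d, axis=1).repeat(d, axis=2)
    disagrees = pred_probs.argmax(axis=1) != labels
    return issues & disagrees

# One 4 x 4 image, two classes, every pixel labeled 0.
labels = np.zeros((1, 4, 4), dtype=int)
p1 = np.full((1, 4, 4), 0.1)
p1[0, :2, :2] = 0.9   # the model sees class 1 in the top-left block...
p1[0, 0, 0] = 0.1     # ...except at one pixel, where it agrees with the label
pred_probs = np.stack([1 - p1, p1], axis=1)  # shape (1, 2, 4, 4)

small_labels, small_probs = downsample(labels, pred_probs, d=2)
# Hand-written stand-in for the low-resolution Confident Learning result.
small_issues = np.array([[[True, False], [False, False]]])
final = upsample_and_cross_check(small_issues, labels, pred_probs, d=2)
print(final[0].astype(int))
```

In this toy case the upsampled mask covers all four pixels of the top-left block, and the cross-check removes the one pixel where the model's top prediction matches the given label.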
Batch Processing
The pipeline operates in two sequential passes:
- Pass 1 (threshold estimation): All images are streamed through in batches, with each batch's pixels flattened and used to update running estimates of per-class confident thresholds.
- Pass 2 (scoring): All images are streamed through again, with each batch's pixels scored against the finalized thresholds to identify label issues.
This two-pass design ensures that thresholds are estimated from the entire dataset before any scoring occurs, so every image is scored against the same thresholds regardless of which batch it falls in.
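A minimal sketch of the two passes is shown below, assuming the thresholds are class-wise means that can be accumulated as running sums and counts. The helper names are hypothetical, and in-memory arrays stand in for data that would normally be streamed from disk (for example, memory-mapped files).

```python
import numpy as np

def iter_batches(labels, pred_probs, batch_size):
    """Yield flattened per-pixel (labels, probs) for consecutive groups of images."""
    k = pred_probs.shape[1]
    for start in range(0, len(labels), batch_size):
        lab = labels[start:start + batch_size].reshape(-1)
        prob = np.moveaxis(pred_probs[start:start + batch_size], 1, -1).reshape(-1, k)
        yield lab, prob

def two_pass_issue_mask(labels, pred_probs, batch_size):
    k = pred_probs.shape[1]

    # Pass 1: accumulate per-class sums and counts so thresholds reflect the whole dataset.
    prob_sum = np.zeros(k)
    count = np.zeros(k)
    for lab, prob in iter_batches(labels, pred_probs, batch_size):
        self_conf = prob[np.arange(len(lab)), lab]  # P(given label | x) per pixel
        prob_sum += np.bincount(lab, weights=self_conf, minlength=k)
        count += np.bincount(lab, minlength=k)
    # Assumption: classes with no labeled pixels get threshold inf.
    thresholds = np.where(count > 0, prob_sum / np.maximum(count, 1), np.inf)

    # Pass 2: score each batch against the finalized thresholds.
    masks = []
    for lab, prob in iter_batches(labels, pred_probs, batch_size):
        confident = prob > thresholds
        best = np.where(confident, prob, -np.inf).argmax(axis=1)
        masks.append(confident.any(axis=1) & (best != lab))
    return np.concatenate(masks).reshape(labels.shape), thresholds

# Toy data (assumed shapes): 5 images, 3 classes, 4 x 4 pixels each.
rng = np.random.default_rng(0)
pred_probs = np.moveaxis(rng.dirichlet(np.ones(3), size=(5, 4, 4)), -1, 1)
labels = rng.integers(0, 3, size=(5, 4, 4))

mask_small, t_small = two_pass_issue_mask(labels, pred_probs, batch_size=2)
mask_all, t_all = two_pass_issue_mask(labels, pred_probs, batch_size=5)
print(mask_small.sum(), np.array_equal(mask_small, mask_all))
```

Because the thresholds are finalized before any pixel is scored, the batch size affects only peak memory use, not which pixels end up flagged.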