Principle: Cleanlab Label Issue Filtering
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Algorithmic approach to identifying mislabeled examples in a dataset using confident learning strategies that compare model predictions against given labels.
Description
Label issue filtering identifies individual examples whose given labels are likely incorrect. It provides multiple filtering strategies of varying sophistication:
- predicted_neq_given -- the simplest approach, flagging examples where the model's predicted class differs from the given label.
- prune_by_noise_rate -- uses the confident joint to estimate per-class noise rates and removes the estimated number of errors per class.
- prune_by_class -- similar to noise rate pruning but uses the raw confident joint counts per class.
- both -- the intersection of prune_by_noise_rate and prune_by_class, yielding the most conservative (highest precision) set of label issues.
- confident_learning -- uses the off-diagonal entries of the confident joint directly to identify mislabeled examples.
- low_normalized_margin -- flags examples with the lowest normalized margin scores.
- low_self_confidence -- flags examples with the lowest self-confidence scores.
The prune_by_noise_rate method is the default and the most faithful to the Confident Learning paper. It uses the noise rate estimates from the confident joint to determine how many examples in each class have incorrect labels, then selects the lowest-quality examples up to that estimated count.
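The simplest strategy, predicted_neq_given, needs nothing beyond an argmax comparison. A minimal numpy sketch (the function name here is illustrative, not cleanlab's API):

```python
import numpy as np

def predicted_neq_given(labels, pred_probs):
    """Flag examples where the argmax predicted class differs from the given label."""
    predicted = np.argmax(pred_probs, axis=1)
    return predicted != labels

labels = np.array([0, 1, 1, 0])
pred_probs = np.array([
    [0.9, 0.1],   # predicted 0, agrees with given label 0
    [0.2, 0.8],   # predicted 1, agrees with given label 1
    [0.7, 0.3],   # predicted 0, disagrees with given label 1 -> flagged
    [0.6, 0.4],   # predicted 0, agrees with given label 0
])
issues = predicted_neq_given(labels, pred_probs)
# issues -> array([False, False, True, False])
```

The more sophisticated strategies below replace this blunt comparison with calibrated per-class error counts.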
Usage
Use when you want to identify which specific examples in your dataset have incorrect labels, so you can review, correct, or remove them. This is the primary entry point for label issue detection in cleanlab and is suitable for any supervised classification task where you have out-of-sample predicted probabilities.
Theoretical Basis
Step 1: Estimate per-class error counts.
From the confident joint C, compute the estimated number of label errors for each given label class k:
n_errors[k] = sum(C[k, :]) - C[k, k]
This is the total count of examples labeled as class k whose confident true label is a different class.
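Step 1 can be computed in one vectorized line. The confident joint values below are hypothetical, chosen only to illustrate the off-diagonal row sums:

```python
import numpy as np

# Hypothetical confident joint for 3 classes: C[k, j] counts examples
# given label k whose confidently-estimated true label is j.
C = np.array([
    [40,  3,  2],
    [ 5, 30,  1],
    [ 0,  4, 25],
])

# Off-diagonal row sums: estimated label errors among examples given label k.
n_errors = C.sum(axis=1) - np.diag(C)
# n_errors -> array([5, 6, 4])
```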
Step 2: Compute label quality scores.
For each example i with given label k, compute a quality score (e.g., self_confidence):
score[i] = pred_probs[i, k]
Lower scores indicate the model is less confident that the given label is correct.
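The self_confidence score is a fancy-index lookup of each example's given-label probability. A minimal sketch:

```python
import numpy as np

def self_confidence(labels, pred_probs):
    """Quality score: predicted probability assigned to each example's given label."""
    return pred_probs[np.arange(len(labels)), labels]

labels = np.array([0, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],   # model gives only 0.3 to the given label -> lowest quality
])
scores = self_confidence(labels, pred_probs)
# scores -> array([0.9, 0.8, 0.3])
```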
Step 3: Select label issues per class.
For each class k, sort examples labeled as k by their quality score in ascending order, and flag the bottom n_errors[k] examples as label issues:
For class k:
candidates = { i : labels[i] == k }
sorted_candidates = sort candidates by score[i] ascending
label_issues[k] = sorted_candidates[:n_errors[k]]
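The per-class selection above can be sketched as follows (the helper name is illustrative; n_errors is assumed to come from Step 1):

```python
import numpy as np

def select_issues_per_class(labels, scores, n_errors, n_classes):
    """Flag, per class k, the n_errors[k] lowest-scoring examples given label k."""
    issues = np.zeros(len(labels), dtype=bool)
    for k in range(n_classes):
        candidates = np.flatnonzero(labels == k)
        ranked = candidates[np.argsort(scores[candidates])]  # ascending quality
        issues[ranked[: n_errors[k]]] = True
    return issues

labels = np.array([0, 0, 0, 1, 1])
scores = np.array([0.9, 0.2, 0.8, 0.4, 0.95])
issues = select_issues_per_class(labels, scores, n_errors=[1, 1], n_classes=2)
# Flags the lowest-scoring example in each class: indices 1 and 3.
```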
Step 4: Apply frac_noise scaling (optional).
The number of issues per class can be scaled by a factor frac_noise (default 1.0) to find more or fewer issues:
n_to_flag[k] = round(n_errors[k] * frac_noise)
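The frac_noise scaling is a straightforward element-wise rescale of the Step 1 counts; values below 1.0 trade recall for precision. A sketch with the hypothetical counts from Step 1:

```python
import numpy as np

n_errors = np.array([5, 6, 4])

# frac_noise scales how many issues are flagged per class (1.0 keeps the
# original estimate; smaller values flag fewer, higher-precision issues).
frac_noise = 0.8
n_to_flag = np.rint(n_errors * frac_noise).astype(int)
# n_to_flag -> array([4, 5, 3])
```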
Different strategies vary in how they count errors (Step 1) and rank examples (Step 2), but all follow this general pattern of estimating error counts and selecting the lowest-quality examples.