Principle:Cleanlab Cleanlab Label Issue Ordering
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Method for sorting detected label issues by severity to prioritize human review of the most egregious mislabeled examples first.
Description
Once label issues have been identified via filtering (producing a boolean mask of suspected mislabeled examples), ordering sorts them by label quality score so the most problematic examples appear first. This lets annotators spend their limited review time on the examples most likely to be wrong, maximizing the improvement in dataset quality per unit of review effort.
The ordering is determined by computing a label quality score for each flagged example and sorting by that score in ascending order. The most likely mislabeled examples (lowest quality scores) appear at the beginning of the returned list. Multiple scoring methods are supported, allowing the ordering to reflect different notions of label quality.
Usage
Use after find_label_issues to get a prioritized list of examples to review, starting with those most likely to be mislabeled. This is particularly valuable when more label issues are detected than can feasibly be reviewed: annotators can work down the list in order of severity and stop when their review budget runs out.
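A minimal sketch of the budgeted-review idea, assuming the ordered indices have already been computed. The helper name `review_batch` and the sample indices are hypothetical and not part of any library API.

```python
def review_batch(ordered_indices, budget):
    """Return the first `budget` indices from an already-ordered list.

    Because the list is sorted from most to least severe, slicing the
    front yields the examples most likely to be mislabeled.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return list(ordered_indices[:budget])


# Hypothetical ordered output: dataset indices, most severe first.
ordered_indices = [2, 5, 0, 3]

# An annotator with time for only two examples reviews these.
to_review = review_batch(ordered_indices, budget=2)
print(to_review)
```

A budget larger than the list simply returns every flagged index, so the same call works whether or not the full list fits in the review budget.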
Theoretical Basis
Given a boolean mask label_issues_mask of shape (N,) indicating which examples are suspected label issues, the ordering procedure works as follows:
Step 1: Compute quality scores for flagged examples.
For each example i where label_issues_mask[i] = True, compute a quality score using one of the available scoring methods (e.g., self_confidence):
score[i] = quality_score(labels[i], pred_probs[i])
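Step 1 can be sketched in plain Python as follows. This assumes the self_confidence score is the predicted probability of the given label. The function name and the toy data are illustrative only.

```python
def self_confidence(label, pred_probs_row):
    """Quality score: the model's predicted probability for the given label."""
    return pred_probs_row[label]


# Toy data (illustrative): 4 examples, 3 classes.
labels = [0, 1, 2, 1]
pred_probs = [
    [0.90, 0.05, 0.05],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
]
label_issues_mask = [False, True, True, False]

# Compute a score only for the flagged examples, keyed by dataset index.
scores = {
    i: self_confidence(labels[i], pred_probs[i])
    for i, flagged in enumerate(label_issues_mask)
    if flagged
}
print(scores)
```

Example 2 receives the lower score because the model assigns little probability to its given label, so it would be reviewed before example 1.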
Step 2: Sort flagged indices by score ascending.
flagged_indices = { i : label_issues_mask[i] == True }
ordered_indices = sort flagged_indices by score[i] ascending
The result is an array of indices where the first element is the example most likely to be mislabeled (lowest quality score) and the last element is the flagged example least likely to be mislabeled (highest quality score among flagged examples).
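The two steps together can be sketched as below, assuming quality scores are already available for every example. The name `order_flagged` and the sample scores are hypothetical, and the sketch uses Python's built-in `sorted` rather than any particular library routine.

```python
def order_flagged(label_issues_mask, scores):
    """Return flagged dataset indices sorted by ascending quality score."""
    # Convert the boolean mask into the indices of flagged examples.
    flagged_indices = [i for i, flagged in enumerate(label_issues_mask) if flagged]
    # Lowest score first, so the most likely mislabeled example leads.
    # Python's sorted() is stable, so tied scores keep their index order here.
    return sorted(flagged_indices, key=lambda i: scores[i])


# Illustrative inputs: one score per example, flagged or not.
label_issues_mask = [True, False, True, True, False, True]
scores = [0.40, 0.95, 0.05, 0.40, 0.80, 0.20]

ordered_indices = order_flagged(label_issues_mask, scores)
print(ordered_indices)
```

The returned values are indices into the original dataset, not positions within the flagged subset, so they can be used directly to look up the examples for review.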
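Whether examples with identical quality scores keep their original relative order depends on the underlying sort: this holds only for a stable sort, which NumPy's default `argsort` kind does not guarantee, so tie order should not be relied upon. The choice of scoring method (self_confidence, normalized_margin, confidence_weighted_entropy) changes the scores and therefore the ordering, and can be selected to match how the user wants severity to be measured.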
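To illustrate how the scoring method can change the ordering, the sketch below compares two simplified scores on the same flagged examples. The formulas are assumptions for illustration: self_confidence as the probability of the given label, and normalized_margin as that probability minus the largest other-class probability, rescaled to [0, 1]. The library's exact definitions may differ.

```python
def self_confidence(label, probs):
    """Assumed definition: predicted probability of the given label."""
    return probs[label]


def normalized_margin(label, probs):
    """Assumed definition: margin over the best other class, rescaled to [0, 1]."""
    best_other = max(p for k, p in enumerate(probs) if k != label)
    return (probs[label] - best_other + 1) / 2


def order_by(score_fn, flagged_indices, labels, pred_probs):
    """Sort flagged indices by ascending score under the chosen method."""
    return sorted(flagged_indices, key=lambda i: score_fn(labels[i], pred_probs[i]))


# Two flagged examples, both labeled class 0 (illustrative data).
labels = [0, 0]
pred_probs = [
    [0.30, 0.35, 0.35],  # low confidence, but no strong competing class
    [0.35, 0.65, 0.00],  # slightly higher confidence, strong competing class
]
flagged_indices = [0, 1]

by_confidence = order_by(self_confidence, flagged_indices, labels, pred_probs)
by_margin = order_by(normalized_margin, flagged_indices, labels, pred_probs)
print(by_confidence, by_margin)
```

Under self_confidence, example 0 ranks as more severe because its label probability is lowest. Under the margin-based score, example 1 ranks first because another class clearly dominates its given label.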