Implementation:Cleanlab Cleanlab Order Label Issues
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for sorting detected label issues by severity to enable prioritized human review, provided by the Cleanlab library.
Description
This function takes a boolean mask of detected label issues along with the original labels and predicted probabilities, and returns the indices of the flagged examples sorted by label quality score in ascending order. The first index in the returned array is the example most likely to be mislabeled (lowest quality score), so reviewers can start with the most severe issues. The rank_by parameter selects the quality scoring method used for ordering, and rank_by_kwargs forwards additional keyword arguments to the underlying scoring function, cleanlab.rank.get_label_quality_scores.
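The ordering described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the library's code: the helper name order_flagged_indices is hypothetical, and it assumes the default "self_confidence" score, taken here to be the predicted probability of each example's given label.

```python
def order_flagged_indices(label_issues_mask, labels, pred_probs):
    """Return flagged indices sorted from lowest to highest self-confidence."""
    # Self-confidence (assumed definition): probability assigned to the given label.
    scores = [row[label] for row, label in zip(pred_probs, labels)]
    # Convert the boolean mask into integer indices of flagged examples.
    flagged = [i for i, is_issue in enumerate(label_issues_mask) if is_issue]
    # Ascending sort puts the lowest-quality (most suspicious) example first.
    return sorted(flagged, key=lambda i: scores[i])


labels = [0, 0, 1, 1]
pred_probs = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
]
mask = [False, True, False, True]

print(order_flagged_indices(mask, labels, pred_probs))  # prints [3, 1]
```

Example 3 comes first because its given label receives probability 0.1, lower than the 0.2 assigned to the given label of example 1.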
Usage
Call this function after find_label_issues has produced a boolean mask. It turns that mask into a prioritized list of indices for human review. It is most useful when the number of detected issues is large and review effort should go first to the examples with the lowest label quality scores.
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/rank.py
- Lines: 398-461
Signature
def order_label_issues(
    label_issues_mask: np.ndarray,
    labels: np.ndarray,
    pred_probs: np.ndarray,
    *,
    rank_by: str = "self_confidence",
    rank_by_kwargs: dict = {},
) -> np.ndarray
Import
from cleanlab.rank import order_label_issues
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| label_issues_mask | np.ndarray | Yes | Boolean array of shape (N,) where True indicates a detected label issue, typically produced by find_label_issues. |
| labels | np.ndarray | Yes | Array of noisy class labels of shape (N,) with integer values 0..K-1. |
| pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1. |
| rank_by | str | No | Quality scoring method used for ordering. One of "self_confidence" (default), "normalized_margin", or "confidence_weighted_entropy". |
| rank_by_kwargs | dict | No | Additional keyword arguments forwarded to the scoring function (get_label_quality_scores), for example adjust_pred_probs. Defaults to an empty dict. |
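To make the rank_by options more concrete, the sketch below computes two of the scores in plain Python. The formulas are an assumption based on how these scores are commonly defined (self-confidence as the probability of the given label, normalized margin as that probability minus the largest other class probability, rescaled to [0, 1]). The helper names are hypothetical, and "confidence_weighted_entropy" is left out because its exact rescaling is not stated here.

```python
def self_confidence(labels, pred_probs):
    """Probability the model assigns to each example's given label."""
    return [row[label] for row, label in zip(pred_probs, labels)]


def normalized_margin(labels, pred_probs):
    """Given-label probability minus the best other class, rescaled to [0, 1]."""
    scores = []
    for row, label in zip(pred_probs, labels):
        # Largest probability among classes other than the given label.
        best_other = max(p for k, p in enumerate(row) if k != label)
        # The raw margin lies in [-1, 1]; shift and halve to map it to [0, 1].
        scores.append((row[label] - best_other + 1) / 2)
    return scores


labels = [0, 0]
pred_probs = [
    [0.9, 0.05, 0.05],  # given label agrees with the model
    [0.2, 0.7, 0.1],    # given label 0, model favors class 1
]

print(self_confidence(labels, pred_probs))
print(normalized_margin(labels, pred_probs))
```

Under both scores the second example scores lower than the first, so it would be reviewed first. The two methods can disagree on the order of examples when a low given-label probability is not accompanied by a strongly favored alternative class.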
Outputs
| Name | Type | Description |
|---|---|---|
| ordered_indices | np.ndarray | Array of integer indices corresponding to the flagged label issues, sorted by quality score in ascending order. The first index is the example most likely to be mislabeled. |
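One way to consume the ordered indices is to turn the top few into a review queue. The sketch below is illustrative only: build_review_queue and its top_k parameter are hypothetical, and the "suggested label" is simply the model's most probable class, which is an assumption about how a reviewer might want the data presented rather than anything the function returns.

```python
def build_review_queue(ordered_indices, labels, pred_probs, top_k=2):
    """Collect the top_k most suspicious examples with given and suggested labels."""
    queue = []
    for idx in list(ordered_indices)[:top_k]:
        row = pred_probs[idx]
        # Suggested label (assumption): the class with the highest predicted probability.
        suggested = max(range(len(row)), key=lambda k: row[k])
        queue.append(
            {"index": idx, "given_label": labels[idx], "suggested_label": suggested}
        )
    return queue


labels = [0, 0, 1, 1]
pred_probs = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
]
ordered_indices = [3, 1]  # e.g. the output of order_label_issues

for item in build_review_queue(ordered_indices, labels, pred_probs):
    print(item)
```

Because the indices are already sorted by severity, truncating the list to top_k keeps the examples most likely to be mislabeled.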
Usage Examples
Basic Usage
import numpy as np
from cleanlab.filter import find_label_issues
from cleanlab.rank import order_label_issues

labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],    # likely mislabeled
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],  # likely mislabeled
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
    [0.85, 0.1, 0.05],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.15, 0.75, 0.1],
])

# Step 1: Find label issues
issue_mask = find_label_issues(labels, pred_probs)

# Step 2: Order by severity
ordered = order_label_issues(issue_mask, labels, pred_probs)
print("Issues ordered by severity:", ordered)
# First index is the most likely mislabeled example
Using Different Ranking Methods
from cleanlab.rank import order_label_issues

# Reuses issue_mask, labels, and pred_probs from the Basic Usage example

# Order by normalized margin instead of self-confidence
ordered_nm = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="normalized_margin",
)

# Order by confidence-weighted entropy
ordered_cwe = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="confidence_weighted_entropy",
)