Implementation:Cleanlab Cleanlab Order Label Issues

Knowledge Sources	Cleanlab Cleanlab Docs
Domains	Machine_Learning, Data_Quality
Last Updated	2026-02-09 19:00 GMT

Overview

Concrete tool for sorting detected label issues by severity to enable prioritized human review, provided by the Cleanlab library.

Description

This function takes a boolean mask of detected label issues along with the original labels and predicted probabilities, and returns the indices of flagged examples sorted by their label quality score in ascending order. The first index in the returned array corresponds to the example most likely to be mislabeled (lowest quality score), enabling reviewers to start with the most severe issues. The rank_by parameter controls which quality scoring method is used for ordering, and additional keyword arguments can be passed through to the scoring function via rank_by_kwargs.
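The ordering described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation. It assumes the default self_confidence score is the predicted probability of each example's given label, and the helper name order_by_self_confidence is hypothetical.

```python
import numpy as np

def order_by_self_confidence(label_issues_mask, labels, pred_probs):
    """Hypothetical helper: indices of flagged examples, lowest score first."""
    label_issues_mask = np.asarray(label_issues_mask, dtype=bool)
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs)
    # Assumed score: probability the model assigns to the given label.
    scores = pred_probs[np.arange(len(labels)), labels]
    # Keep only the indices that the mask flags as issues.
    flagged_idx = np.flatnonzero(label_issues_mask)
    # Ascending sort, so the lowest-quality example comes first.
    return flagged_idx[np.argsort(scores[flagged_idx], kind="stable")]

labels = np.array([0, 0, 1, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])
mask = np.array([False, True, False, True])
ordered = order_by_self_confidence(mask, labels, pred_probs)
print(ordered)  # [3 1]
```

Example 3 comes first because its given label has probability 0.1, lower than the 0.2 assigned to example 1's given label.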

Usage

Call this function after find_label_issues has produced a boolean mask. It converts that mask into an array of indices prioritized for human review. This is most useful when many issues are flagged and reviewers can only check a subset, since the lowest-scoring examples appear first.

Code Reference

Source Location

Repository: cleanlab
File: cleanlab/rank.py
Lines: 398-461

Signature

def order_label_issues(
    label_issues_mask: np.ndarray,
    labels: np.ndarray,
    pred_probs: np.ndarray,
    *,
    rank_by: str = "self_confidence",
    rank_by_kwargs: dict = {},
) -> np.ndarray

Import

from cleanlab.rank import order_label_issues

I/O Contract

Inputs

Name	Type	Required	Description
label_issues_mask	np.ndarray	Yes	Boolean array of shape (N,) where True indicates a detected label issue, typically produced by find_label_issues.
labels	np.ndarray	Yes	Array of noisy class labels of shape (N,) with integer values 0..K-1.
pred_probs	np.ndarray	Yes	Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1.
rank_by	str	No	Quality scoring method used for ordering. One of "self_confidence" (default), "normalized_margin", or "confidence_weighted_entropy".
rank_by_kwargs	dict	No	Additional keyword arguments passed to the scoring function. Defaults to empty dict.
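The input contract in the table can be checked before calling the function. The sketch below is a hypothetical pre-flight helper (check_inputs and VALID_RANK_BY are not part of Cleanlab) that enforces only the shapes, value ranges, and rank_by options listed above. The library performs its own validation, which may differ.

```python
import numpy as np

# Options taken from the rank_by row of the inputs table.
VALID_RANK_BY = {"self_confidence", "normalized_margin", "confidence_weighted_entropy"}

def check_inputs(label_issues_mask, labels, pred_probs, rank_by="self_confidence"):
    """Hypothetical check: return (N, K) or raise ValueError on a violation."""
    mask = np.asarray(label_issues_mask)
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs)
    if pred_probs.ndim != 2:
        raise ValueError("pred_probs must have shape (N, K)")
    n, k = pred_probs.shape
    if mask.dtype != bool or mask.shape != (n,):
        raise ValueError("label_issues_mask must be a boolean array of shape (N,)")
    if labels.shape != (n,) or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be an integer array of shape (N,)")
    if labels.size and (labels.min() < 0 or labels.max() >= k):
        raise ValueError("labels must take values in 0..K-1")
    if not np.allclose(pred_probs.sum(axis=1), 1.0):
        raise ValueError("each row of pred_probs must sum to 1")
    if rank_by not in VALID_RANK_BY:
        raise ValueError(f"rank_by must be one of {sorted(VALID_RANK_BY)}")
    return n, k

shape = check_inputs(
    np.array([True, False]),
    np.array([0, 1]),
    np.array([[0.6, 0.4], [0.3, 0.7]]),
)
print(shape)  # (2, 2)
```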

Outputs

Name	Type	Description
ordered_indices	np.ndarray	Array of integer indices corresponding to the flagged label issues, sorted by quality score in ascending order. The first index is the example most likely to be mislabeled.
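One way to consume the returned indices is to take the first few and build a small review batch. The sketch below assumes an ordered_indices array like the one this function returns. The helper review_batch and its field names are hypothetical and serve only as an illustration.

```python
import numpy as np

def review_batch(ordered_indices, labels, pred_probs, top_k=5):
    """Hypothetical helper: summarize the top_k most severe flagged examples."""
    rows = []
    for idx in np.asarray(ordered_indices)[:top_k]:
        rows.append({
            "index": int(idx),
            "given_label": int(labels[idx]),
            # Class the model considers most likely for this example.
            "predicted_label": int(np.argmax(pred_probs[idx])),
            # Probability the model assigns to the given (possibly wrong) label.
            "given_label_prob": float(pred_probs[idx, labels[idx]]),
        })
    return rows

labels = np.array([0, 0, 1, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])
ordered_indices = np.array([3, 1])  # stand-in for the function's output

batch = review_batch(ordered_indices, labels, pred_probs, top_k=1)
print(batch[0]["index"], batch[0]["given_label"], batch[0]["predicted_label"])  # 3 1 2
```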

Usage Examples

Basic Usage

import numpy as np
from cleanlab.filter import find_label_issues
from cleanlab.rank import order_label_issues

labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],   # likely mislabeled
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],  # likely mislabeled
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
    [0.85, 0.1, 0.05],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.15, 0.75, 0.1],
])

# Step 1: Find label issues
issue_mask = find_label_issues(labels, pred_probs)

# Step 2: Order by severity
ordered = order_label_issues(issue_mask, labels, pred_probs)
print("Issues ordered by severity:", ordered)
# First index is the most likely mislabeled example

Using Different Ranking Methods

from cleanlab.rank import order_label_issues

# Assumes issue_mask, labels, and pred_probs are defined as in Basic Usage.

# Order by normalized margin instead of self-confidence
ordered_nm = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="normalized_margin",
)

# Order by confidence-weighted entropy
ordered_cwe = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="confidence_weighted_entropy",
)
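Different ranking methods can order the same flagged examples differently. The NumPy sketch below illustrates this under two stated assumptions: self-confidence is the probability of the given label, and the margin is that probability minus the largest probability among the other classes. The library may rescale these scores, which would not change the resulting order. The sketch does not call Cleanlab.

```python
import numpy as np

labels = np.array([0, 0])
pred_probs = np.array([
    [0.30, 0.35, 0.35],  # example 0: lowest confidence, no dominant rival class
    [0.35, 0.65, 0.00],  # example 1: slightly higher confidence, one strong rival
])
rows = np.arange(len(labels))

# Assumed self-confidence: probability of the given label.
self_conf = pred_probs[rows, labels]

# Assumed margin: given-label probability minus the best competing class.
rival = pred_probs.copy()
rival[rows, labels] = -np.inf  # exclude the given label from the max
margin = self_conf - rival.max(axis=1)

# Ascending sort: most suspicious example first under each score.
order_sc = np.argsort(self_conf, kind="stable")
order_margin = np.argsort(margin, kind="stable")
print(order_sc.tolist(), order_margin.tolist())  # [0, 1] [1, 0]
```

Self-confidence ranks example 0 first because 0.30 is below 0.35, while the margin ranks example 1 first because its rival class leads by 0.30 rather than 0.05.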

Related Pages

Implements Principle
