Implementation:Cleanlab Cleanlab Order Label Issues

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Quality
Last Updated 2026-02-09 19:00 GMT

Overview

order_label_issues is a Cleanlab function that sorts detected label issues by severity so that human review can start with the examples most likely to be mislabeled.

Description

This function takes a boolean mask of detected label issues along with the original labels and predicted probabilities, and returns the indices of flagged examples sorted by their label quality score in ascending order. The first index in the returned array corresponds to the example most likely to be mislabeled (lowest quality score), enabling reviewers to start with the most severe issues. The rank_by parameter controls which quality scoring method is used for ordering, and additional keyword arguments can be passed through to the scoring function via rank_by_kwargs.
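The ordering step described above can be sketched with NumPy alone. This is a simplified illustration, not Cleanlab's source: the function name order_by_self_confidence is hypothetical, and the sketch assumes the self-confidence score is the predicted probability of each example's given label.

```python
import numpy as np

def order_by_self_confidence(label_issues_mask, labels, pred_probs):
    """Hypothetical stand-in for the ordering step, using self-confidence only."""
    label_issues_mask = np.asarray(label_issues_mask, dtype=bool)
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)

    # Assumed score: the predicted probability of each example's given label.
    scores = pred_probs[np.arange(len(labels)), labels]

    # Convert the boolean mask into the integer indices of flagged examples.
    flagged_idx = np.flatnonzero(label_issues_mask)

    # Sort flagged indices by ascending score (lowest quality first).
    return flagged_idx[np.argsort(scores[flagged_idx], kind="stable")]

labels = np.array([0, 0, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.3, 0.7],  # given label 0, score 0.3
    [0.2, 0.8],
    [0.9, 0.1],  # given label 1, score 0.1
])
mask = np.array([False, True, False, True])

ordered = order_by_self_confidence(mask, labels, pred_probs)
print(ordered)  # [3 1]
```

Example 3 comes first because its given label receives the lowest predicted probability among the flagged examples.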

Usage

Import and use this function after calling find_label_issues to obtain a boolean mask. This function takes that mask and produces a prioritized list of indices for human review. It is especially useful when the number of detected issues is large and you want to focus review effort on the most impactful corrections.
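One way to act on the prioritized list is to review it in small batches. The sketch below assumes an ordered index array like the one order_label_issues returns. The helper build_review_batch and the suggested_label field (the model's top prediction) are illustrative assumptions, not part of Cleanlab.

```python
import numpy as np

def build_review_batch(ordered_indices, labels, pred_probs, k=5):
    """Hypothetical helper: summarize the k highest-priority flagged examples."""
    top = np.asarray(ordered_indices)[:k]
    batch = []
    for i in top:
        batch.append({
            "index": int(i),
            "given_label": int(labels[i]),
            # Assumption: show the model's top prediction as a candidate label.
            "suggested_label": int(np.argmax(pred_probs[i])),
            "prob_of_given_label": float(pred_probs[i, labels[i]]),
        })
    return batch

labels = np.array([0, 0, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.9, 0.1],
])
ordered = np.array([3, 1])  # assumed output of order_label_issues for this data

batch = build_review_batch(ordered, labels, pred_probs, k=1)
for row in batch:
    print(row)
```

Capping the batch at k keeps each review session short while still covering the most severe issues first.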

Code Reference

Source Location

  • Repository: cleanlab
  • File: cleanlab/rank.py
  • Lines: 398-461

Signature

def order_label_issues(
    label_issues_mask: np.ndarray,
    labels: np.ndarray,
    pred_probs: np.ndarray,
    *,
    rank_by: str = "self_confidence",
    rank_by_kwargs: dict = {},
) -> np.ndarray

Import

from cleanlab.rank import order_label_issues

I/O Contract

Inputs

Name | Type | Required | Description
label_issues_mask | np.ndarray | Yes | Boolean array of shape (N,) where True indicates a detected label issue, typically produced by find_label_issues.
labels | np.ndarray | Yes | Array of noisy class labels of shape (N,) with integer values 0..K-1.
pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1.
rank_by | str | No | Quality scoring method used for ordering. One of "self_confidence" (default), "normalized_margin", or "confidence_weighted_entropy".
rank_by_kwargs | dict | No | Additional keyword arguments passed to the scoring function. Defaults to empty dict.
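The input requirements above can be checked before calling the function. The following is a minimal sketch of such a pre-flight check: check_inputs is a hypothetical helper and does not reproduce Cleanlab's own validation, which may differ.

```python
import numpy as np

def check_inputs(label_issues_mask, labels, pred_probs, atol=1e-6):
    """Hypothetical pre-flight check; returns a list of problems found."""
    mask = np.asarray(label_issues_mask)
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)

    if pred_probs.ndim != 2:
        return ["pred_probs must be 2-D with shape (N, K)"]
    n, k = pred_probs.shape

    problems = []
    if mask.dtype != bool or mask.shape != (n,):
        problems.append("label_issues_mask must be a boolean array of shape (N,)")
    if labels.shape != (n,) or not np.issubdtype(labels.dtype, np.integer):
        problems.append("labels must be an integer array of shape (N,)")
    elif labels.size and (labels.min() < 0 or labels.max() >= k):
        problems.append("labels must take values in 0..K-1")
    if not np.allclose(pred_probs.sum(axis=1), 1.0, atol=atol):
        problems.append("each row of pred_probs must sum to 1")
    return problems

good = check_inputs(
    np.array([True, False]),
    np.array([0, 1]),
    np.array([[0.6, 0.4], [0.1, 0.9]]),
)
bad = check_inputs(
    np.array([1, 0]),                    # integer mask, not boolean
    np.array([0, 2]),                    # label 2 is out of range for K = 2
    np.array([[0.6, 0.4], [0.5, 0.9]]),  # second row sums to 1.4
)
print(good)
print(len(bad))
```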

Outputs

Name | Type | Description
ordered_indices | np.ndarray | Array of integer indices corresponding to the flagged label issues, sorted by quality score in ascending order. The first index is the example most likely to be mislabeled.

Usage Examples

Basic Usage

import numpy as np
from cleanlab.filter import find_label_issues
from cleanlab.rank import order_label_issues

labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],   # likely mislabeled
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],  # likely mislabeled
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
    [0.85, 0.1, 0.05],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.15, 0.75, 0.1],
])

# Step 1: Find label issues
issue_mask = find_label_issues(labels, pred_probs)

# Step 2: Order by severity
ordered = order_label_issues(issue_mask, labels, pred_probs)
print("Issues ordered by severity:", ordered)
# First index is the most likely mislabeled example

Using Different Ranking Methods

from cleanlab.rank import order_label_issues

# Reuses issue_mask, labels, and pred_probs from the Basic Usage example.
# Order by normalized margin instead of self-confidence
ordered_nm = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="normalized_margin",
)

# Order by confidence-weighted entropy
ordered_cwe = order_label_issues(
    issue_mask, labels, pred_probs,
    rank_by="confidence_weighted_entropy",
)
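The choice of rank_by can change the order, not just the scores. The NumPy sketch below illustrates this with simplified versions of two scores, under stated assumptions: self-confidence is taken to be the probability of the given label, and the margin is that probability minus the largest probability among the other classes. Cleanlab's own scores may be rescaled, which would not affect the resulting order. Both function names here are illustrative, and confidence-weighted entropy is omitted.

```python
import numpy as np

def self_confidence(labels, pred_probs):
    # Assumed definition: probability assigned to the given label.
    return pred_probs[np.arange(len(labels)), labels]

def margin(labels, pred_probs):
    # Assumed definition: given-label probability minus the best other class.
    given = self_confidence(labels, pred_probs)
    others = pred_probs.copy()
    others[np.arange(len(labels)), labels] = -np.inf
    return given - others.max(axis=1)

labels = np.array([0, 0])
pred_probs = np.array([
    [0.30, 0.35, 0.35],  # low confidence, but no strong competing class
    [0.35, 0.65, 0.00],  # slightly higher confidence, strong competing class
])
mask = np.array([True, True])
flagged = np.flatnonzero(mask)

by_sc = flagged[np.argsort(self_confidence(labels, pred_probs)[flagged])]
by_nm = flagged[np.argsort(margin(labels, pred_probs)[flagged])]
print(by_sc)  # [0 1]
print(by_nm)  # [1 0]
```

Self-confidence ranks example 0 as most severe because its given label has the lowest probability, while the margin ranks example 1 first because another class clearly dominates the given label.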

Related Pages

Implements Principle
