Implementation:Cleanlab Cleanlab Multilabel Find Label Issues

Knowledge Sources	Cleanlab
Domains	Machine Learning, Data Quality, Multi-Label Classification
Last Updated	2026-02-09 00:00 GMT

Overview

The find_label_issues and find_multilabel_issues_per_class functions identify potentially mislabeled examples in multi-label classification datasets by decomposing the problem into independent binary classification subproblems per class.

Description

The cleanlab.multilabel_classification.filter module provides two public functions:

find_label_issues: Identifies examples where any class appears to be incorrectly annotated. It delegates to the internal _find_label_issues_multilabel function from cleanlab.filter. Returns either a boolean mask (where True indicates a label issue) or an array of indices sorted by likelihood of mislabeling. Supports a low_memory mode that uses batched label issue detection for large datasets.

find_multilabel_issues_per_class: Provides finer-grained analysis by determining which specific classes are incorrectly annotated for each example. For each of the K classes, it constructs a binary (one-vs-rest) label vector and a two-column predicted-probability matrix holding that class's probability and its complement, then runs the standard binary cleanlab.filter.find_label_issues on each subproblem. It returns either an (N, K) boolean mask or, when ranking is requested, per-class ranked index lists together with the per-class binary labels and predicted probabilities.
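The one-vs-rest decomposition described above can be sketched with NumPy alone. This is an illustrative sketch, not cleanlab's internal code: the helper names to_multi_hot and binary_subproblems are hypothetical, and the column order (absent, present) is an assumption chosen so that column 1 lines up with binary label 1.

```python
import numpy as np

def to_multi_hot(labels, num_classes):
    """Convert lists of class indices into an (N, K) 0/1 matrix."""
    y = np.zeros((len(labels), num_classes), dtype=int)
    for i, class_ids in enumerate(labels):
        y[i, class_ids] = 1  # an empty list leaves the row all zeros
    return y

def binary_subproblems(labels, pred_probs):
    """Yield (binary_labels, binary_pred_probs) for each class k."""
    num_classes = pred_probs.shape[1]
    y = to_multi_hot(labels, num_classes)
    for k in range(num_classes):
        p = pred_probs[:, k]
        # Assumed layout: column 0 = P(class k absent), column 1 = P(class k present).
        yield y[:, k], np.column_stack([1.0 - p, p])

labels = [[0, 1], [1], [0, 2]]
pred_probs = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.7],
])
subproblems = list(binary_subproblems(labels, pred_probs))
```

Each of the K pairs is an ordinary binary classification problem, so any binary label-issue detector can be applied to it independently.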

Both functions accept a confident_joint parameter in the (K, 2, 2) one-vs-rest format, which captures the estimated joint distribution of noisy and true labels for each class independently.

Usage

Import find_label_issues when you need a single boolean mask or ranked list identifying which examples have any mislabeled class. Import find_multilabel_issues_per_class when you need to know exactly which classes are mislabeled for each example, which is useful for targeted re-annotation or for feeding into dataset summary functions.
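To illustrate how the two result shapes relate, the sketch below starts from a hand-written (N, K) mask of the kind find_multilabel_issues_per_class returns by default. The mask values are made up for illustration, and reducing with any() is a conceptual view of the "any class" result rather than a statement about the library's internals.

```python
import numpy as np

# Hypothetical (N, K) per-class mask; True at (i, k) means class k looks
# mislabeled for example i.
per_class_issues = np.array([
    [False, False, False],
    [True,  False, False],
    [False, False, True],
    [False, False, False],
])

# An example is flagged if any of its classes is flagged.
any_issue = per_class_issues.any(axis=1)

# (example index, class index) pairs to queue for targeted re-annotation.
pairs = [(int(i), int(k)) for i, k in np.argwhere(per_class_issues)]
```

The pairs list tells an annotator exactly which class to re-check on which example, which the single per-example mask cannot do.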

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/multilabel_classification/filter.py
Lines: 1-303

Signature

def find_label_issues(
    labels: list,
    pred_probs: np.ndarray,
    return_indices_ranked_by: Optional[str] = None,
    rank_by_kwargs={},
    filter_by: str = "prune_by_noise_rate",
    frac_noise: float = 1.0,
    num_to_remove_per_class: Optional[List[int]] = None,
    min_examples_per_class=1,
    confident_joint: Optional[np.ndarray] = None,
    n_jobs: Optional[int] = None,
    verbose: bool = False,
    low_memory: bool = False,
) -> np.ndarray

def find_multilabel_issues_per_class(
    labels: list,
    pred_probs: np.ndarray,
    return_indices_ranked_by: Optional[str] = None,
    rank_by_kwargs={},
    filter_by: str = "prune_by_noise_rate",
    frac_noise: float = 1.0,
    num_to_remove_per_class: Optional[List[int]] = None,
    min_examples_per_class=1,
    confident_joint: Optional[np.ndarray] = None,
    n_jobs: Optional[int] = None,
    verbose: bool = False,
    low_memory: bool = False,
) -> Union[np.ndarray, Tuple[List[np.ndarray], List[Any], List[np.ndarray]]]

Import

from cleanlab.multilabel_classification.filter import find_label_issues
from cleanlab.multilabel_classification.filter import find_multilabel_issues_per_class

I/O Contract

Inputs (find_label_issues)

Name	Type	Required	Description
labels	List[List[int]]	Yes	List of noisy labels where each element is a list of class indices the example belongs to (e.g. `[[1,2],[1],[0],...]`).
pred_probs	np.ndarray (N, K)	Yes	Model-predicted class probabilities. Rows need not sum to 1, because classes are not mutually exclusive. Should ideally be out-of-sample predictions from cross-validation.
return_indices_ranked_by	str or None	No	If None, returns a boolean mask. Otherwise one of: 'self_confidence', 'normalized_margin', 'confidence_weighted_entropy'. Returns sorted indices.
rank_by_kwargs	dict	No	Extra keyword arguments for the ranking scoring function.
filter_by	str	No	Confident learning method for filtering: 'prune_by_noise_rate' (default), 'prune_by_class', 'both', 'confident_learning', 'predicted_neq_given', 'low_normalized_margin', 'low_self_confidence'.
frac_noise	float	No	Fraction of estimated label errors to return (default 1.0 = all).
num_to_remove_per_class	List[int]	No	Number of mislabeled examples to return per class.
min_examples_per_class	int	No	Minimum number of examples per class that are kept unflagged, which protects rare classes from over-pruning (default 1).
confident_joint	np.ndarray (K, 2, 2)	No	One-vs-rest confident joint. Auto-computed if not provided.
n_jobs	int	No	Number of parallel processing threads.
verbose	bool	No	If True, prints multiprocessing info.
low_memory	bool	No	If True, uses batched detection for large datasets.
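Before calling either function, it can help to check that labels and pred_probs agree with the contract in the table above. The helper below is a hypothetical pre-flight check written for this page. It is not part of cleanlab, and the library performs its own validation.

```python
import numpy as np

def check_multilabel_inputs(labels, pred_probs):
    """Return (N, K) if the inputs look consistent, else raise ValueError."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    if pred_probs.ndim != 2:
        raise ValueError("pred_probs must be a 2D array of shape (N, K)")
    n, k = pred_probs.shape
    if len(labels) != n:
        raise ValueError("labels and pred_probs must have the same length N")
    for i, class_ids in enumerate(labels):
        if any(c < 0 or c >= k for c in class_ids):
            raise ValueError(f"example {i} has a class index outside [0, {k})")
    if pred_probs.min() < 0.0 or pred_probs.max() > 1.0:
        raise ValueError("pred_probs entries must lie in [0, 1]")
    # Row sums are deliberately not checked: in multi-label data they need not be 1.
    return n, k

shape = check_multilabel_inputs(
    labels=[[0, 1], [2]],
    pred_probs=[[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]],
)
```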

Outputs (find_label_issues)

Name	Type	Description
label_issues	np.ndarray	If return_indices_ranked_by is None: boolean mask of shape (N,) where True indicates a label issue. Otherwise: array of indices sorted by likelihood of mislabeling.
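The two output forms carry related information: a mask says which examples are flagged, and a ranked array orders the flagged ones. The sketch below shows that relationship with made-up values. The scores array is a hypothetical stand-in for a per-example label quality score, under the assumption that lower scores mean a label is more likely wrong.

```python
import numpy as np

# Hypothetical boolean mask over N = 5 examples.
issue_mask = np.array([False, True, False, True, True])

# Hypothetical per-example scores; lower = more likely mislabeled (assumption).
scores = np.array([0.95, 0.40, 0.88, 0.10, 0.25])

# Indices of flagged examples, in dataset order.
issue_indices = np.flatnonzero(issue_mask)

# Same indices, reordered so the most suspicious example comes first.
ranked = issue_indices[np.argsort(scores[issue_indices])]
```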

Inputs (find_multilabel_issues_per_class)

Parameters are identical to find_label_issues above.

Outputs (find_multilabel_issues_per_class)

Name	Type	Description
per_class_label_issues	np.ndarray or Tuple	If return_indices_ranked_by is None: boolean array of shape (N, K) where True at position (i, k) means class k is mislabeled for example i. If not None: returns a tuple of (label_issues_list, labels_list, pred_probs_list), each a list of length K.
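When ranking is requested, the first element of the returned tuple is a list of K index arrays rather than an (N, K) mask. A minimal sketch of converting between the two forms follows. The helper indices_to_mask and the sample index arrays are hypothetical, and the ranking order within each array is discarded by the conversion.

```python
import numpy as np

def indices_to_mask(label_issues_list, num_examples):
    """Build an (N, K) boolean mask from K arrays of flagged example indices."""
    mask = np.zeros((num_examples, len(label_issues_list)), dtype=bool)
    for k, idx in enumerate(label_issues_list):
        mask[np.asarray(idx, dtype=int), k] = True
    return mask

# Hypothetical ranked indices for K = 3 classes over N = 4 examples.
label_issues_list = [
    np.array([3, 1]),           # class 0: example 3 ranked ahead of example 1
    np.array([], dtype=int),    # class 1: nothing flagged
    np.array([0]),              # class 2: example 0 flagged
]
mask = indices_to_mask(label_issues_list, num_examples=4)
```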

Usage Examples

Basic Usage: Get Boolean Mask

from cleanlab.multilabel_classification.filter import find_label_issues
import numpy as np

labels = [[0, 1], [1], [0, 2], [2], [0, 1, 2]]
pred_probs = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.7],
    [0.1, 0.2, 0.9],
    [0.7, 0.8, 0.6],
])

# Returns boolean mask: True = label issue for any class
issue_mask = find_label_issues(labels=labels, pred_probs=pred_probs)
print("Examples with issues:", np.where(issue_mask)[0])

Per-Class Analysis

from cleanlab.multilabel_classification.filter import find_multilabel_issues_per_class
import numpy as np

# Reuses labels and pred_probs from the basic usage example.
# Returns (N, K) boolean array showing which specific classes have issues
per_class_issues = find_multilabel_issues_per_class(labels=labels, pred_probs=pred_probs)
for k in range(pred_probs.shape[1]):
    print(f"Class {k} issues in examples:", np.where(per_class_issues[:, k])[0])

Ranked Indices

# Reuses find_label_issues, labels, and pred_probs from the basic usage example.
# Get indices ranked by self_confidence (most likely mislabeled first)
ranked_issues = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print("Issue indices (ranked):", ranked_issues)

Related Pages

Principle:Cleanlab_Cleanlab_Multilabel_Label_Issue_Filtering
