

Implementation:Cleanlab Cleanlab Multilabel Find Label Issues

From Leeroopedia
Revision as of 14:36, 16 February 2026 by Admin (auto-imported from implementations/Cleanlab_Cleanlab_Multilabel_Find_Label_Issues.md)


Knowledge Sources
Domains: Machine Learning, Data Quality, Multi-Label Classification
Last Updated: 2026-02-09 00:00 GMT

Overview

The find_label_issues and find_multilabel_issues_per_class functions identify potentially mislabeled examples in multi-label classification datasets by decomposing the problem into independent binary classification subproblems per class.

Description

This module in cleanlab.multilabel_classification.filter provides two public functions:

  • find_label_issues: Identifies examples where any class appears to be incorrectly annotated. It delegates to the internal _find_label_issues_multilabel function from cleanlab.filter and returns either a boolean mask of shape (N,), where True indicates a label issue, or an array of the flagged examples' indices, sorted so that the most likely mislabeled examples come first. A low_memory mode uses batched label issue detection for large datasets.
  • find_multilabel_issues_per_class: Provides finer-grained results by determining which specific classes are incorrectly annotated for each example. For each of the K classes, it builds a binary (one-vs-rest) label vector and a two-column predicted probability matrix (the class's predicted probability stacked with its complement), then runs the standard binary cleanlab.filter.find_label_issues on that subproblem (or batched detection in low_memory mode). It returns either an (N, K) boolean array or, when a ranking method is requested, per-class ranked index lists together with the per-class binary labels and two-column predicted probabilities.
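The one-vs-rest decomposition described above can be illustrated with a small plain-Python sketch. It assumes labels is a list of class-index lists and pred_probs is a list of length-K rows; the helper names to_one_hot and binary_subproblem are hypothetical (not cleanlab functions), and putting the complement in column 0 and the class probability in column 1 is an assumption of this sketch.

```python
from typing import List, Tuple


def to_one_hot(labels: List[List[int]], num_classes: int) -> List[List[int]]:
    """Convert multi-label annotations (lists of class indices) into N x K rows of 0/1."""
    one_hot = []
    for classes in labels:
        row = [0] * num_classes
        for k in classes:
            row[k] = 1
        one_hot.append(row)
    return one_hot


def binary_subproblem(
    labels: List[List[int]], pred_probs: List[List[float]], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Build the one-vs-rest view of class k (hypothetical helper).

    Returns the binary label vector (1 = annotated with class k) and an
    N x 2 matrix whose rows are [1 - p_k, p_k]: the complement in column 0
    (assumed order) and the predicted probability of class k in column 1.
    """
    num_classes = len(pred_probs[0])
    binary_labels = [row[k] for row in to_one_hot(labels, num_classes)]
    binary_probs = [[1.0 - row[k], row[k]] for row in pred_probs]
    return binary_labels, binary_probs


if __name__ == "__main__":
    # Toy data from the usage examples on this page
    labels = [[0, 1], [1], [0, 2], [2], [0, 1, 2]]
    pred_probs = [
        [0.9, 0.8, 0.1],
        [0.2, 0.9, 0.1],
        [0.8, 0.1, 0.7],
        [0.1, 0.2, 0.9],
        [0.7, 0.8, 0.6],
    ]
    for k in range(len(pred_probs[0])):
        y_k, probs_k = binary_subproblem(labels, pred_probs, k)
        print(f"class {k}: binary labels = {y_k}")
```

Each class then becomes an ordinary binary label-issue problem, which is why per-class results can be computed independently.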

Both functions accept a confident_joint parameter in the (K, 2, 2) one-vs-rest format: one 2 x 2 matrix per class that counts, for that class's binary subproblem, how many examples are confidently estimated to fall into each pair of (given label, true label).
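To make the (K, 2, 2) format concrete, the sketch below builds one 2 x 2 count matrix per class using the standard confident-learning counting rule: the threshold for each binary label value is the average predicted probability of that value among examples given it, and each example is counted in cell [given][true] for the above-threshold candidate with the highest probability. This is a simplified sketch, not cleanlab's compute_confident_joint: it omits the calibration step cleanlab applies, treating a label value with no examples as unreachable (infinite threshold) is a choice made here, and the function names are hypothetical.

```python
from typing import List


def binary_confident_joint(y: List[int], p: List[float]) -> List[List[int]]:
    """Simplified, uncalibrated 2x2 confident joint for one class.

    y[i] is the given (possibly noisy) binary label and p[i] the predicted
    probability that example i belongs to the class. Entry [g][t] counts
    examples with given label g confidently assigned true label t.
    """
    probs = [[1.0 - pi, pi] for pi in p]  # column 0: not in class, column 1: in class
    thresholds = []
    for label_value in (0, 1):
        confs = [probs[i][label_value] for i in range(len(y)) if y[i] == label_value]
        # Assumption: a label value nobody was given can never be a confident true label.
        thresholds.append(sum(confs) / len(confs) if confs else float("inf"))

    joint = [[0, 0], [0, 0]]
    for given, row in zip(y, probs):
        candidates = [t for t in (0, 1) if row[t] >= thresholds[t]]
        if not candidates:
            continue  # not confidently assigned to either true label
        true_label = max(candidates, key=lambda t: row[t])
        joint[given][true_label] += 1
    return joint


def multilabel_confident_joint(
    labels: List[List[int]], pred_probs: List[List[float]]
) -> List[List[List[int]]]:
    """Stack one 2x2 matrix per class into a K x 2 x 2 nested list."""
    num_classes = len(pred_probs[0])
    result = []
    for k in range(num_classes):
        y_k = [1 if k in classes else 0 for classes in labels]
        p_k = [row[k] for row in pred_probs]
        result.append(binary_confident_joint(y_k, p_k))
    return result
```

The off-diagonal cells ([0][1] and [1][0]) are the per-class estimates of missing and spurious annotations, respectively, which is the information the filtering methods prune from.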

Usage

Import find_label_issues when you need a single boolean mask or ranked list identifying which examples have any mislabeled class. Import find_multilabel_issues_per_class when you need to know exactly which classes are mislabeled for each example, which is useful for targeted re-annotation or for feeding into dataset summary functions.

Code Reference

Source Location

  • Repository: Cleanlab
  • File: cleanlab/multilabel_classification/filter.py
  • Lines: 1-303

Signature

from typing import Any, List, Optional, Tuple, Union

import numpy as np

def find_label_issues(
    labels: list,
    pred_probs: np.ndarray,
    return_indices_ranked_by: Optional[str] = None,
    rank_by_kwargs={},
    filter_by: str = "prune_by_noise_rate",
    frac_noise: float = 1.0,
    num_to_remove_per_class: Optional[List[int]] = None,
    min_examples_per_class=1,
    confident_joint: Optional[np.ndarray] = None,
    n_jobs: Optional[int] = None,
    verbose: bool = False,
    low_memory: bool = False,
) -> np.ndarray: ...

def find_multilabel_issues_per_class(
    labels: list,
    pred_probs: np.ndarray,
    return_indices_ranked_by: Optional[str] = None,
    rank_by_kwargs={},
    filter_by: str = "prune_by_noise_rate",
    frac_noise: float = 1.0,
    num_to_remove_per_class: Optional[List[int]] = None,
    min_examples_per_class=1,
    confident_joint: Optional[np.ndarray] = None,
    n_jobs: Optional[int] = None,
    verbose: bool = False,
    low_memory: bool = False,
) -> Union[np.ndarray, Tuple[List[np.ndarray], List[Any], List[np.ndarray]]]: ...

Import

from cleanlab.multilabel_classification.filter import find_label_issues
from cleanlab.multilabel_classification.filter import find_multilabel_issues_per_class

I/O Contract

Inputs (find_label_issues)

  • labels (List[List[int]], required): Noisy labels, one list per example containing the indices of the classes it is annotated with (e.g. [[1,2],[1],[0],...]).
  • pred_probs (np.ndarray of shape (N, K), required): Model-predicted class probabilities, where entry (i, k) is the predicted probability that example i belongs to class k. Because an example can belong to several classes, rows need not sum to 1. Ideally these are out-of-sample predictions, e.g. from cross-validation.
  • return_indices_ranked_by (str or None, optional): If None, a boolean mask is returned. Otherwise one of 'self_confidence', 'normalized_margin', or 'confidence_weighted_entropy', and the indices of flagged examples are returned, sorted by that score.
  • rank_by_kwargs (dict, optional): Extra keyword arguments passed to the ranking score function.
  • filter_by (str, optional): Confident learning method used for filtering: 'prune_by_noise_rate' (default), 'prune_by_class', 'both', 'confident_learning', 'predicted_neq_given', 'low_normalized_margin', or 'low_self_confidence'.
  • frac_noise (float, optional): Fraction of the estimated label errors to return (default 1.0, i.e. all of them).
  • num_to_remove_per_class (List[int], optional): Number of mislabeled examples to return for each class.
  • min_examples_per_class (int, optional): Minimum number of examples to leave unflagged in each class, so that rare classes are not pruned away entirely (default 1).
  • confident_joint (np.ndarray of shape (K, 2, 2), optional): One-vs-rest confident joint. Computed automatically if not provided.
  • n_jobs (int, optional): Number of threads used for parallel processing.
  • verbose (bool, optional): If True, prints multiprocessing information.
  • low_memory (bool, optional): If True, uses batched label issue detection, intended for large datasets with limited memory.
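For a single one-vs-rest subproblem, the sketch below shows what two of the ranking scores and the simplest filter look like. It assumes the two-column view [1 - p, p] for one class and a given binary label y, defines self-confidence as the probability assigned to the given label and normalized margin as (P(given) - P(other) + 1) / 2, and flags an example, in the spirit of filter_by='predicted_neq_given', when the more probable binary label differs from the given one. The function names are hypothetical and the code is illustrative, not cleanlab's implementation.

```python
from typing import List


def two_column(p: float) -> List[float]:
    """One-vs-rest row for a class: [P(not in class), P(in class)]."""
    return [1.0 - p, p]


def self_confidence(y: List[int], p: List[float]) -> List[float]:
    """Probability assigned to the given binary label of each example."""
    return [two_column(pi)[yi] for yi, pi in zip(y, p)]


def normalized_margin(y: List[int], p: List[float]) -> List[float]:
    """(P(given label) - P(other label) + 1) / 2 for each example."""
    scores = []
    for yi, pi in zip(y, p):
        row = two_column(pi)
        scores.append((row[yi] - row[1 - yi] + 1.0) / 2.0)
    return scores


def predicted_neq_given(y: List[int], p: List[float]) -> List[bool]:
    """Flag examples whose more probable binary label differs from the given one.

    A tie (p == 0.5) predicts 0, like taking the first index of an argmax.
    """
    return [(1 if pi > 0.5 else 0) != yi for yi, pi in zip(y, p)]


if __name__ == "__main__":
    y = [1, 0, 1]        # given labels for one class
    p = [0.9, 0.7, 0.3]  # predicted probability of that class
    print(predicted_neq_given(y, p))  # -> [False, True, True]
    print(self_confidence(y, p))
```

Because the two columns sum to 1, the normalized margin here simplifies to (2 * P(given)) / 2 = P(given), so under these definitions the two scores coincide within a single binary subproblem.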

Outputs (find_label_issues)

  • label_issues (np.ndarray): If return_indices_ranked_by is None, a boolean mask of shape (N,) where True marks an example with a label issue in at least one class. Otherwise, an array of the flagged examples' indices, sorted so that the most likely mislabeled examples come first.
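One plausible way to produce the ranked output is sketched below: score each example, keep only the flagged indices, and sort them so the lowest-scoring (most likely mislabeled) examples come first. Using the minimum per-class self-confidence as the example score is an assumption of this sketch; cleanlab's own aggregation of per-class scores may differ, and the function names are hypothetical.

```python
from typing import List


def example_scores(labels: List[List[int]], pred_probs: List[List[float]]) -> List[float]:
    """Per-example score: the lowest per-class self-confidence (assumed aggregator).

    For class k, self-confidence is p_k if the example is annotated with k,
    otherwise 1 - p_k.
    """
    scores = []
    for classes, row in zip(labels, pred_probs):
        per_class = [p if k in classes else 1.0 - p for k, p in enumerate(row)]
        scores.append(min(per_class))
    return scores


def rank_flagged(mask: List[bool], scores: List[float]) -> List[int]:
    """Indices where mask is True, lowest score (most likely mislabeled) first."""
    flagged = [i for i, is_issue in enumerate(mask) if is_issue]
    return sorted(flagged, key=lambda i: scores[i])


if __name__ == "__main__":
    labels = [[0, 1], [1], [0, 2], [2], [0, 1, 2]]
    pred_probs = [
        [0.9, 0.8, 0.1],
        [0.2, 0.9, 0.1],
        [0.8, 0.1, 0.7],
        [0.1, 0.2, 0.9],
        [0.7, 0.8, 0.6],
    ]
    scores = example_scores(labels, pred_probs)
    mask = [True, False, True, False, True]  # hypothetical flags, for illustration only
    print(rank_flagged(mask, scores))  # -> [4, 2, 0]
```

Only flagged examples appear in the ranked output, which is why its length can be smaller than N.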

Inputs (find_multilabel_issues_per_class)

Parameters are identical to find_label_issues above.

Outputs (find_multilabel_issues_per_class)

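  • per_class_label_issues (np.ndarray or tuple): If return_indices_ranked_by is None, a boolean array of shape (N, K) where True at position (i, k) means class k appears mislabeled for example i. Otherwise, a tuple (label_issues_list, labels_list, pred_probs_list) of three lists, each of length K: label_issues_list[k] holds the indices of examples where class k appears mislabeled, most likely mislabeled first; labels_list[k] is the binary (one-vs-rest) label vector for class k; and pred_probs_list[k] is the (N, 2) one-vs-rest predicted probability matrix for class k.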
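Given the (N, K) boolean array, an example-level mask and a per-example list of classes to re-check can be derived with a row-wise OR. The sketch below uses nested Python lists to stay dependency-free (on the NumPy array returned by the function, the first reduction is per_class_issues.any(axis=1)). The helper names are hypothetical, and treating an example as flagged when any of its classes is flagged mirrors the "any class" definition given for find_label_issues.

```python
from typing import Dict, List


def any_class_mask(per_class_mask: List[List[bool]]) -> List[bool]:
    """Row-wise OR: an example is flagged if any of its K classes is flagged."""
    return [any(row) for row in per_class_mask]


def classes_to_review(per_class_mask: List[List[bool]]) -> Dict[int, List[int]]:
    """Map each flagged example index to the class indices that look mislabeled."""
    return {
        i: [k for k, flagged in enumerate(row) if flagged]
        for i, row in enumerate(per_class_mask)
        if any(row)
    }


if __name__ == "__main__":
    # Hypothetical (N=3, K=3) per-class mask, for illustration only
    per_class = [
        [False, True, False],
        [False, False, False],
        [True, False, True],
    ]
    print(any_class_mask(per_class))     # -> [True, False, True]
    print(classes_to_review(per_class))  # -> {0: [1], 2: [0, 2]}
```

The second helper is the shape of output that targeted re-annotation needs: annotators only revisit the listed classes for each flagged example.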

Usage Examples

Basic Usage: Get Boolean Mask

from cleanlab.multilabel_classification.filter import find_label_issues
import numpy as np

labels = [[0, 1], [1], [0, 2], [2], [0, 1, 2]]
pred_probs = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.7],
    [0.1, 0.2, 0.9],
    [0.7, 0.8, 0.6],
])

# Returns boolean mask: True = label issue for any class
issue_mask = find_label_issues(labels=labels, pred_probs=pred_probs)
print("Examples with issues:", np.where(issue_mask)[0])

Per-Class Analysis

import numpy as np
from cleanlab.multilabel_classification.filter import find_multilabel_issues_per_class

# Reuses labels and pred_probs from the basic example above

# Returns (N, K) boolean array showing which specific classes have issues
per_class_issues = find_multilabel_issues_per_class(labels=labels, pred_probs=pred_probs)
for k in range(pred_probs.shape[1]):
    print(f"Class {k} issues in examples:", np.where(per_class_issues[:, k])[0])

Ranked Indices

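# Reuses find_label_issues, labels and pred_probs from the basic example above.
# Returns only the flagged examples' indices, ranked by self_confidence (most likely mislabeled first).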
ranked_issues = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print("Issue indices (ranked):", ranked_issues)
