Implementation: Cleanlab find_label_issues
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for identifying mislabeled examples in a dataset using confident learning filtering strategies provided by the Cleanlab library.
Description
This function takes noisy labels and out-of-sample predicted probabilities and returns either a boolean mask or sorted indices indicating which examples are estimated to have label issues. It supports 7 different filtering strategies via the filter_by parameter. Internally, it computes the confident joint (if not provided), estimates per-class noise rates, computes label quality scores, and applies the selected filtering strategy to identify the estimated number of label errors. The function also supports multi-label classification, parallel execution via n_jobs, and fine-grained control over the number of issues to flag per class.
Usage
Import and use this function as the primary entry point for detecting label issues in your dataset. You need out-of-sample predicted probabilities (obtained via cross-validation or a held-out model) and the noisy labels. This is the most commonly used function in cleanlab for identifying specific mislabeled examples.
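The out-of-sample predicted probabilities can be produced with scikit-learn's cross_val_predict. A minimal sketch, where the model choice, fold count, and toy data are illustrative assumptions rather than cleanlab requirements:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy data: 300 examples, 5 features, 3 (possibly noisy) classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
labels = rng.integers(0, 3, size=300)

# 5-fold cross-validation: each row of pred_probs comes from a model
# that never saw that example during training, i.e. it is out-of-sample.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels,
    cv=5, method="predict_proba",
)
# pred_probs has shape (N, K); pass it together with labels to find_label_issues.
```

Any classifier exposing predict_proba works here; the key requirement is that probabilities are out-of-sample, since in-sample probabilities are overconfident and mask label errors.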
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/filter.py
- Lines: 57-451
Signature
def find_label_issues(
    labels,
    pred_probs,
    *,
    return_indices_ranked_by=None,
    rank_by_kwargs=None,
    filter_by="prune_by_noise_rate",
    frac_noise=1.0,
    num_to_remove_per_class=None,
    min_examples_per_class=1,
    confident_joint=None,
    n_jobs=None,
    verbose=False,
    multi_label=False,
) -> np.ndarray
Import
from cleanlab.filter import find_label_issues
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | LabelLike | Yes | Array of noisy class labels of shape (N,) with integer values 0..K-1. |
| pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). |
| return_indices_ranked_by | Optional[str] | No | If set (e.g., "self_confidence", "normalized_margin", "confidence_weighted_entropy"), returns sorted indices instead of a boolean mask. The indices are sorted by the chosen quality score in ascending order (most likely mislabeled first). |
| rank_by_kwargs | Optional[dict] | No | Additional keyword arguments passed to the ranking/scoring method. |
| filter_by | str | No | Filtering strategy. One of "prune_by_noise_rate" (default), "prune_by_class", "both", "confident_learning", "predicted_neq_given", "low_normalized_margin", "low_self_confidence". |
| frac_noise | float | No | Fraction of estimated noise to flag, in (0, 1]. 1.0 (default) flags the full estimated number of issues; smaller values flag only the most confident subset. Applies only to the prune-based strategies. |
| num_to_remove_per_class | Optional[list] | No | Explicit per-class counts of issues to flag, overriding the estimated counts. |
| min_examples_per_class | int | No | Minimum number of examples to retain per class after removing issues. Defaults to 1. |
| confident_joint | Optional[np.ndarray] | No | Pre-computed confident joint of shape (K, K). If None, computed internally. |
| n_jobs | Optional[int] | No | Number of processes used by multiprocessing. Defaults to None, which uses the number of available CPU cores. |
| verbose | bool | No | If True, print progress information. Defaults to False. |
| multi_label | bool | No | If True, handle multi-label classification. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| label_issues | np.ndarray | If return_indices_ranked_by is None: boolean mask of shape (N,) where True indicates a detected label issue. If return_indices_ranked_by is set: array of integer indices sorted by the chosen quality score ascending (most likely mislabeled first). |
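The two return forms are related: a boolean mask can always be converted to (unranked) integer indices with NumPy, whereas severity-ranked indices require setting return_indices_ranked_by. A small sketch using a hypothetical mask in place of the function's output:

```python
import numpy as np

# Hypothetical mask, as returned when return_indices_ranked_by is None.
issue_mask = np.array([False, True, False, True, False, False])

# Integer indices of flagged examples (positional order, not severity order).
issue_indices = np.where(issue_mask)[0]
print(issue_indices)  # [1 3]
```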
Usage Examples
Basic Usage (Boolean Mask)
import numpy as np
from cleanlab.filter import find_label_issues
labels = np.array([0, 0, 1, 1, 2, 2])
pred_probs = np.array([
[0.9, 0.05, 0.05],
[0.2, 0.7, 0.1], # labeled 0 but model predicts 1
[0.1, 0.8, 0.1],
[0.05, 0.1, 0.85], # labeled 1 but model predicts 2
[0.1, 0.1, 0.8],
[0.05, 0.05, 0.9],
])
issue_mask = find_label_issues(labels, pred_probs)
print("Label issues detected at indices:", np.where(issue_mask)[0])
Ranked Indices
from cleanlab.filter import find_label_issues
# labels and pred_probs as defined in the Basic Usage example above
# Get indices sorted by self_confidence (most likely mislabeled first)
ranked_issues = find_label_issues(
labels, pred_probs,
return_indices_ranked_by="self_confidence",
)
print("Issues ranked by severity:", ranked_issues)
Using Different Filtering Strategies
from cleanlab.filter import find_label_issues
# Conservative approach: intersection of two methods
issue_mask = find_label_issues(
labels, pred_probs,
filter_by="both",
)
# Flag only the most confident portion of the estimated issues
issue_mask = find_label_issues(
    labels, pred_probs,
    filter_by="prune_by_noise_rate",
    frac_noise=0.5,
)
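After flagging, a common follow-up is to drop (or route for manual re-review) the flagged examples before retraining. A sketch using plain NumPy boolean indexing, with a hypothetical mask standing in for the function's output:

```python
import numpy as np

X = np.arange(12).reshape(6, 2)        # stand-in feature matrix
labels = np.array([0, 0, 1, 1, 2, 2])

# Hypothetical output of find_label_issues for this 6-example dataset.
issue_mask = np.array([False, True, False, True, False, False])

# Keep only the examples NOT flagged as label issues.
X_clean = X[~issue_mask]
labels_clean = labels[~issue_mask]
print(X_clean.shape, labels_clean.tolist())  # (4, 2) [0, 1, 2, 2]
```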