Implementation:Cleanlab Cleanlab Multilabel Issue Manager

Knowledge Sources	Cleanlab
Domains	Data Quality, Multilabel Classification
Last Updated	2026-02-09 00:00 GMT

Overview

MultilabelIssueManager detects label issues in multilabel classification datasets where each example can have multiple simultaneous labels, flagging examples whose label sets are likely incorrect.

Description

The MultilabelIssueManager class extends IssueManager and sets issue_name = "label". This is the same issue name used by the single-label variant, which it replaces in multilabel contexts. It delegates the core detection logic to cleanlab's specialized multilabel classification utilities:

cleanlab.multilabel_classification.filter.find_label_issues() for boolean per-example issue detection.
cleanlab.multilabel_classification.rank.get_label_quality_scores() for per-example quality scoring.

Predicted labels are derived by thresholding pred_probs at 0.5 (the _PREDICTED_LABEL_THRESH class attribute) and converting the resulting boolean matrix to lists of integer class indices via onehot2int(). Keyword arguments for the two underlying functions are filtered through dedicated static methods that pass through only whitelisted parameters, so each function receives only arguments it accepts.
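
The thresholding step can be illustrated with a minimal pure-Python sketch. The helper name predicted_label_sets is hypothetical, the strict ">" comparison is an assumption, and the real class works on NumPy arrays and calls onehot2int() rather than the list comprehension shown here.

```python
from typing import List

# Mirrors the _PREDICTED_LABEL_THRESH class attribute shown in the signature.
PREDICTED_LABEL_THRESH = 0.5


def predicted_label_sets(
    pred_probs: List[List[float]], thresh: float = PREDICTED_LABEL_THRESH
) -> List[List[int]]:
    """Hypothetical helper: for each example, keep the class indices whose
    probability is above the threshold (strict '>' is an assumption)."""
    return [[k for k, p in enumerate(row) if p > thresh] for row in pred_probs]


pred_probs = [
    [0.9, 0.8, 0.1],  # classes 0 and 1 exceed the threshold
    [0.1, 0.9, 0.2],  # only class 1
    [0.7, 0.6, 0.7],  # all three classes
    [0.2, 0.3, 0.4],  # no class exceeds the threshold, so the set is empty
]
predicted = predicted_label_sets(pred_probs)
print(predicted)
```

Note that an example can end up with an empty predicted label set when no class probability is above the threshold.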

The summary score is the mean of all per-example label quality scores. The info dictionary stores the given labels and predicted labels as lists of integer lists.
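
As a rough sketch of how the summary score and info dictionary relate to the per-example results, the snippet below averages a list of scores and packs the two label lists under the given_label and predicted_label keys. The function name summarize_label_issues and the example values are hypothetical; the actual class builds pandas DataFrames rather than returning a tuple.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple


def summarize_label_issues(
    label_scores: List[float],
    given_labels: List[List[int]],
    predicted_labels: List[List[int]],
) -> Tuple[float, Dict[str, Any]]:
    """Hypothetical helper: return (summary score, info dictionary)."""
    # Summary score: mean of all per-example label quality scores.
    summary_score = mean(label_scores)
    # Info: given and predicted labels as lists of integer lists.
    info = {"given_label": given_labels, "predicted_label": predicted_labels}
    return summary_score, info


# Illustrative values only, not output from a real model.
scores = [0.25, 0.5, 0.75]
given = [[0, 1], [1], [0, 2]]
predicted = [[0], [1], [0, 2]]
summary_score, info = summarize_label_issues(scores, given, predicted)
print(summary_score, info)
```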

Usage

Use MultilabelIssueManager when working with multilabel classification tasks such as image tagging, document categorization, or medical diagnosis, where each example can belong to multiple classes simultaneously. Datalab selects it automatically for the "label" issue type when the Datalab instance is created with task="multilabel".

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/datalab/internal/issue_manager/multilabel/label.py
Lines: 1-135

Signature

class MultilabelIssueManager(IssueManager):
    description: ClassVar[str] = """Examples whose given label(s) are estimated to be potentially incorrect..."""
    issue_name: ClassVar[str] = "label"
    _PREDICTED_LABEL_THRESH = 0.5

    def __init__(self, datalab: Datalab, **_): ...

    @staticmethod
    def _process_find_label_issues_kwargs(**kwargs: Dict[str, Any]) -> Dict[str, Any]: ...

    @staticmethod
    def _process_get_label_quality_scores_kwargs(**kwargs: Dict[str, Any]) -> Dict[str, Any]: ...

    def find_issues(self, pred_probs: npt.NDArray, **kwargs) -> None: ...

    def collect_info(
        self, given_labels: List[List[int]], predicted_labels: List[List[int]]
    ) -> Dict[str, Any]: ...

Import

from cleanlab.datalab.internal.issue_manager.multilabel.label import MultilabelIssueManager

I/O Contract

Inputs

Name	Type	Required	Description
datalab	`Datalab`	Yes	A Datalab instance containing the dataset and its multilabel annotations.
pred_probs	`npt.NDArray`	Yes	Predicted probabilities for each example, shape `(n_samples, n_classes)`. Each entry is the model's estimated probability that the example belongs to that class.
filter_by	`str`	No	Method used to filter label issues (passed to `find_label_issues`).
frac_noise	`float`	No	Fraction of noise to assume in the dataset.
method	`str`	No	Method for computing label quality scores (e.g., `"self_confidence"`).
adjust_pred_probs	`bool`	No	Whether to adjust predicted probabilities before scoring.

Outputs

Name	Type	Description
self.issues	`pd.DataFrame`	DataFrame with `is_label_issue` (boolean) and `label_score` (float between 0 and 1) per example. Lower scores indicate higher likelihood of a label error.
self.summary	`pd.DataFrame`	Summary DataFrame with the mean label quality score across all examples.
self.info	`dict`	Dictionary containing `given_label` (original multilabel annotations) and `predicted_label` (model-predicted label sets).

Keyword Argument Filtering

The class provides two static methods for filtering keyword arguments:

_process_find_label_issues_kwargs

Whitelisted parameters: filter_by, frac_noise, num_to_remove_per_class, min_examples_per_class, confident_joint, n_jobs, verbose, low_memory.

_process_get_label_quality_scores_kwargs

Whitelisted parameters: method, adjust_pred_probs, aggregator_kwargs.
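
The whitelisting idea can be sketched as a simple dictionary filter. The parameter names below are the ones listed above; the helper filter_kwargs and the constant names are hypothetical, and the real static methods may apply additional checks that this sketch omits.

```python
from typing import Any, Dict, Iterable

# Whitelists copied from the two lists above.
FIND_LABEL_ISSUES_KWARGS = (
    "filter_by", "frac_noise", "num_to_remove_per_class",
    "min_examples_per_class", "confident_joint", "n_jobs",
    "verbose", "low_memory",
)
LABEL_QUALITY_SCORES_KWARGS = ("method", "adjust_pred_probs", "aggregator_kwargs")


def filter_kwargs(accepted: Iterable[str], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keyword arguments named in `accepted`."""
    accepted_set = set(accepted)
    return {k: v for k, v in kwargs.items() if k in accepted_set}


# A mixed bag of options, as a caller might pass to find_issues().
user_kwargs = {
    "filter_by": "prune_by_noise_rate",
    "method": "self_confidence",
    "unknown_option": 1,  # in neither whitelist, so dropped by both filters
}
find_kwargs = filter_kwargs(FIND_LABEL_ISSUES_KWARGS, **user_kwargs)
score_kwargs = filter_kwargs(LABEL_QUALITY_SCORES_KWARGS, **user_kwargs)
print(find_kwargs, score_kwargs)
```

Splitting one set of keyword arguments this way lets a caller pass options for both underlying functions in a single call without either function receiving a parameter it does not recognize.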

Usage Examples

Basic Usage

import numpy as np
from cleanlab import Datalab

# Multilabel dataset where each example can have multiple labels
labels = [[0, 1], [1], [0, 2], [2], [0, 1, 2]]
data = {"text": ["a", "b", "c", "d", "e"], "label": labels}

# pred_probs from a trained multilabel classifier, shape (n_samples, n_classes)
pred_probs = np.array([
    [0.9, 0.8, 0.1],
    [0.1, 0.9, 0.2],
    [0.8, 0.1, 0.9],
    [0.2, 0.1, 0.8],
    [0.7, 0.6, 0.7],
])

# Datalab automatically selects MultilabelIssueManager for multilabel tasks
lab = Datalab(data=data, label_name="label", task="multilabel")
lab.find_issues(pred_probs=pred_probs)
lab.report()

Related Pages

Principle:Cleanlab_Cleanlab_Datalab_Multilabel_Label_Issue_Detection
