Implementation:Cleanlab Cleanlab Multilabel Scorer

Knowledge Sources	Cleanlab
Domains	Data Quality, Machine Learning, Multi-label Classification
Last Updated	2026-02-09 00:00 GMT

Overview

Implements the scoring pipeline for detecting label issues in multi-label classification, where each example can belong to multiple classes simultaneously.

Description

The multilabel_scorer module computes label quality scores for multi-label classification with a decompose-then-aggregate pattern. Each class is treated as an independent binary classification task: the predicted probabilities for that class are converted to two-column format via stack_complement and scored with one of three binary scoring methods from cleanlab.rank (self-confidence, normalized margin, confidence-weighted entropy). The ClassLabelScorer enum exposes these methods, wrapping each in a _Wrapper helper class.

The Aggregator class then reduces the per-class scores (shape N x K) to a single score per example (shape N) using a configurable aggregation function. Two built-in aggregators are provided: exponential_moving_average (sorts each example's scores in descending order and applies an EMA with forgetting factor alpha) and softmin (a weighted soft minimum, with weights from a softmax on 1 - scores).

The MultilabelScorer class orchestrates the full pipeline, and get_label_quality_scores provides a simple function interface. Additional utilities include multilabel_py for computing per-class label priors and get_cross_validated_multilabel_pred_probs for obtaining out-of-sample predictions.
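The decompose step can be illustrated with a minimal NumPy sketch. This is not the module's code: stack_complement_sketch and per_class_self_confidence are hypothetical helpers written for illustration, and a plain minimum stands in for the aggregator.

```python
import numpy as np

def stack_complement_sketch(p):
    """Turn a 1D vector of P(class present) into two columns [P(absent), P(present)]."""
    return np.vstack((1 - p, p)).T

def per_class_self_confidence(labels, pred_probs):
    """Score each (example, class) pair by the probability given to its binary label."""
    n, k = labels.shape
    scores = np.zeros((n, k))
    for j in range(k):
        # Treat class j as its own binary problem with probabilities of shape (N, 2)
        binary_probs = stack_complement_sketch(pred_probs[:, j])
        # Pick the column that matches the given label (0 = absent, 1 = present)
        scores[:, j] = binary_probs[np.arange(n), labels[:, j]]
    return scores

labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.8, 0.7, 0.2]])

class_scores = per_class_self_confidence(labels, pred_probs)  # shape (N, K)
example_scores = class_scores.min(axis=1)  # simplest possible aggregation, shape (N,)
print(example_scores)
```

The second example scores lowest here because its first class is labeled present while the model assigns it only 0.4.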

Usage

Import this module when you need to score label quality or detect label issues in a multi-label classification dataset, where each example can have several labels at once. cleanlab's higher-level APIs for multi-label data quality analysis use it internally.

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/internal/multilabel_scorer.py
Lines: 1-653

Signature

class MultilabelScorer:
    def __init__(
        self,
        base_scorer: ClassLabelScorer = ClassLabelScorer.SELF_CONFIDENCE,
        aggregator: Union[Aggregator, Callable] = Aggregator(
            exponential_moving_average, alpha=0.8
        ),
        *,
        strict: bool = True,
    )

def get_label_quality_scores(
    labels,
    pred_probs,
    *,
    method: MultilabelScorer = MultilabelScorer(),
    base_scorer_kwargs: Optional[dict] = None,
    **aggregator_kwargs,
) -> np.ndarray

class ClassLabelScorer(Enum):
    SELF_CONFIDENCE = _Wrapper(get_self_confidence_for_each_label)
    NORMALIZED_MARGIN = _Wrapper(get_normalized_margin_for_each_label)
    CONFIDENCE_WEIGHTED_ENTROPY = _Wrapper(get_confidence_weighted_entropy_for_each_label)

Import

from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
    get_label_quality_scores,
    exponential_moving_average,
    softmin,
    multilabel_py,
    get_cross_validated_multilabel_pred_probs,
)

I/O Contract

Inputs (MultilabelScorer.__call__)

Name	Type	Required	Description
labels	np.ndarray	Yes	2D binary array of shape (N, K) where N is the number of samples and K is the number of classes.
pred_probs	np.ndarray	Yes	2D array of shape (N, K) with predicted probabilities for each class. Values do not need to sum to 1 across classes.
base_scorer_kwargs	dict	No	Keyword arguments passed to the base scoring function (e.g., adjust_pred_probs).
aggregator_kwargs	dict	No	Additional keyword arguments passed to the aggregation function (e.g., alpha for EMA).

Inputs (get_label_quality_scores)

Name	Type	Required	Description
labels	np.ndarray	Yes	2D binary array of shape (N, K).
pred_probs	np.ndarray	Yes	2D array of shape (N, K) with predicted probabilities.
method	MultilabelScorer	No	Scoring and aggregation method. Default uses SELF_CONFIDENCE with EMA(alpha=0.8).
base_scorer_kwargs	dict	No	Keyword arguments for the class-label scorer.
aggregator_kwargs	varies	No	Additional keyword arguments for the aggregator.

Outputs

Name	Type	Description
scores	np.ndarray	1D array of shape (N,) with overall quality scores for each example. Lower scores indicate more likely mislabeled examples.
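Because lower scores indicate more likely label issues, a common follow-up is to rank examples by score. The sketch below uses made-up score values and a hypothetical review threshold of 0.5; neither comes from the library.

```python
import numpy as np

# Illustrative scores for three examples (not real library output)
scores = np.array([0.90, 0.42, 0.71])

# Indices ordered from most to least suspicious (ascending score)
ranked = np.argsort(scores)

# Hypothetical cutoff: flag anything scoring below it for manual review
threshold = 0.5
flagged = np.flatnonzero(scores < threshold)

print(ranked, flagged)
```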

Key Components

ClassLabelScorer Enum

Value	Scoring Method	Description
SELF_CONFIDENCE	get_self_confidence_for_each_label	Probability assigned to the given label by the model.
NORMALIZED_MARGIN	get_normalized_margin_for_each_label	Difference between the probability of the given label and the most likely alternative, normalized.
CONFIDENCE_WEIGHTED_ENTROPY	get_confidence_weighted_entropy_for_each_label	Entropy-based score weighted by model confidence.

Built-in Aggregators

Name	Description
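exponential_moving_average	Sorts each example's per-class scores in descending order and computes an EMA across them, so the lowest score is folded in last and weighs most. With the alpha=0.8 used by the default MultilabelScorer, the worst-scoring class receives weight 0.8.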
softmin	Computes a soft minimum of per-class scores using softmax on (1 - scores) with a temperature parameter.
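The two aggregators can be approximated in a few lines of NumPy. This is a sketch based on the descriptions above, not the library's implementation: ema_sketch and softmin_sketch are hypothetical names, and the temperature default of 0.1 is an assumption.

```python
import numpy as np

def ema_sketch(scores, alpha=0.8):
    """EMA over each row's scores, sorted descending so the lowest is applied last."""
    ordered = np.sort(scores, axis=1)[:, ::-1]
    ema = ordered[:, 0]
    for col in ordered[:, 1:].T:
        ema = alpha * col + (1 - alpha) * ema
    return ema

def softmin_sketch(scores, temperature=0.1):
    """Weighted average of each row's scores, with softmax weights on (1 - scores)."""
    logits = (1 - scores) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (scores * weights).sum(axis=1)

scores = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9]])

ema = ema_sketch(scores)       # second row: 0.8 * 0.4 + 0.2 * 0.9 = 0.5
soft = softmin_sketch(scores)  # second row lands just above the minimum, 0.4
print(ema, soft)
```

Both functions pull the aggregate toward the worst per-class score. A larger alpha or a smaller temperature moves the result closer to a hard minimum.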

Usage Examples

Basic Usage

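import numpy as np
from cleanlab.internal.multilabel_scorer import MultilabelScorer

# Binary indicator labels, shape (N=3, K=3): one column per class
labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
# Per-class predicted probabilities; rows need not sum to 1
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.8, 0.7, 0.2]])

# Defaults: SELF_CONFIDENCE base scorer, EMA aggregator with alpha=0.8
scorer = MultilabelScorer()
scores = scorer(labels, pred_probs)  # shape (3,); lower means more likely mislabeled
print(f"Label quality scores: {scores}")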

Custom Scorer and Aggregator

import numpy as np
from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
)

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

# Use normalized margin scoring with np.min aggregation
scorer = MultilabelScorer(
    base_scorer=ClassLabelScorer.NORMALIZED_MARGIN,
    aggregator=np.min,
)
scores = scorer(labels, pred_probs)
print(f"Scores with min aggregation: {scores}")

Function Interface

import numpy as np
import cleanlab.internal.multilabel_scorer as ml_scorer

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

scores = ml_scorer.get_label_quality_scores(labels, pred_probs)
print(f"Scores: {scores}")

Related Pages

Principle:Cleanlab_Cleanlab_Multilabel_Quality_Scoring
