Implementation:Cleanlab Cleanlab Multilabel Scorer

Knowledge Sources
Domains Data Quality, Machine Learning, Multi-label Classification
Last Updated 2026-02-09 00:00 GMT

Overview

Implements the scoring pipeline for detecting label issues in multi-label classification, where each example can belong to multiple classes simultaneously.

Description

The multilabel_scorer module computes label quality scores for multi-label classification with a decompose-then-aggregate pattern.

Decompose: each of the K classes is treated as an independent binary classification task. For each class, the predicted probabilities p are converted to a two-column array [1 - p, p] via stack_complement, and a base scorer scores the given binary label of every example. The ClassLabelScorer enum provides three base scorers from cleanlab.rank (self-confidence, normalized margin, and confidence-weighted entropy). Each is wrapped in a _Wrapper helper class so that the functions are stored as Enum values rather than treated as methods of the Enum. The result is an N x K matrix of per-class scores.

Aggregate: the Aggregator class reduces the per-class scores (shape N x K) to a single score per example (shape N) using a configurable aggregation function. Two built-in aggregators are provided: exponential_moving_average, which sorts each example's scores in descending order and applies an EMA with forgetting factor alpha, and softmin, a soft minimum that averages the scores with softmax weights computed on (1 - scores).

The MultilabelScorer class orchestrates the full pipeline, and get_label_quality_scores provides a simple function interface. Additional utilities include multilabel_py, which computes per-class label priors, and get_cross_validated_multilabel_pred_probs, which obtains out-of-sample predicted probabilities via cross-validation.
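The decompose-then-aggregate pattern can be sketched in plain NumPy. This is a minimal re-implementation for illustration, not cleanlab's code: stack_complement, self_confidence and multilabel_scores are defined locally, self-confidence serves as the base scorer, and np.min stands in for the default EMA aggregator to keep the example short. The data are the arrays from the Basic Usage example.

```python
import numpy as np


def stack_complement(p: np.ndarray) -> np.ndarray:
    """Expand one class's probabilities p into two columns [1 - p, p]."""
    return np.column_stack((1.0 - p, p))


def self_confidence(labels: np.ndarray, two_col_probs: np.ndarray) -> np.ndarray:
    """Probability assigned to each example's given binary label (0 or 1)."""
    return two_col_probs[np.arange(labels.shape[0]), labels]


def multilabel_scores(labels, pred_probs, base_scorer=self_confidence, aggregator=np.min):
    """Score each class as a binary task, then reduce the (N, K) matrix to (N,)."""
    labels = np.asarray(labels, dtype=int)
    pred_probs = np.asarray(pred_probs, dtype=float)
    per_class = np.zeros(labels.shape, dtype=float)
    for k in range(labels.shape[1]):
        # Column k is an independent binary problem: absent (0) vs. present (1)
        two_col = stack_complement(pred_probs[:, k])
        per_class[:, k] = base_scorer(labels[:, k], two_col)
    # Aggregate across classes (axis=1) to get one score per example
    return aggregator(per_class, axis=1)


labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.8, 0.7, 0.2]])
scores = multilabel_scores(labels, pred_probs)
print(scores)  # [0.9 0.4 0.7]
```

Swapping in a different base scorer or aggregator only changes the two callables passed in, which is the flexibility MultilabelScorer exposes through its base_scorer and aggregator arguments.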

Usage

Import this module when working with multi-label classification datasets, where each example can have multiple labels, and you need to score label quality or detect label issues. It is an internal module: cleanlab's public multi-label APIs (in cleanlab.multilabel_classification) build on it for multi-label data quality analysis.

Code Reference

Source Location

  • Repository: Cleanlab
  • File: cleanlab/internal/multilabel_scorer.py
  • Lines: 1-653

Signature

class MultilabelScorer:
    def __init__(
        self,
        base_scorer: ClassLabelScorer = ClassLabelScorer.SELF_CONFIDENCE,
        aggregator: Union[Aggregator, Callable] = Aggregator(
            exponential_moving_average, alpha=0.8
        ),
        *,
        strict: bool = True,
    ): ...

def get_label_quality_scores(
    labels,
    pred_probs,
    *,
    method: MultilabelScorer = MultilabelScorer(),
    base_scorer_kwargs: Optional[dict] = None,
    **aggregator_kwargs,
) -> np.ndarray: ...

class ClassLabelScorer(Enum):
    SELF_CONFIDENCE = _Wrapper(get_self_confidence_for_each_label)
    NORMALIZED_MARGIN = _Wrapper(get_normalized_margin_for_each_label)
    CONFIDENCE_WEIGHTED_ENTROPY = _Wrapper(get_confidence_weighted_entropy_for_each_label)

Import

from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
    get_label_quality_scores,
    exponential_moving_average,
    softmin,
    multilabel_py,
    get_cross_validated_multilabel_pred_probs,
)
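multilabel_py, listed in the imports above, computes per-class label priors. Since each class is treated as a binary task, a natural reading is the fraction of examples without and with that class. The sketch below assumes a (K, 2) output with columns [P(absent), P(present)], matching the [1 - p, p] layout of stack_complement; per_class_priors is a hypothetical name, so check the actual return format of multilabel_py before relying on this.

```python
import numpy as np


def per_class_priors(labels: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fraction of examples without (col 0) and with (col 1) each class."""
    labels = np.asarray(labels)
    frac_present = labels.mean(axis=0)  # share of examples tagged with each class
    return np.column_stack((1.0 - frac_present, frac_present))  # shape (K, 2)


labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0], [0, 1, 0]])
py = per_class_priors(labels)
print(py)
```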

I/O Contract

Inputs (MultilabelScorer.__call__)

  • labels (np.ndarray, required): 2D binary array of shape (N, K), where N is the number of samples and K is the number of classes.
  • pred_probs (np.ndarray, required): 2D array of shape (N, K) with the predicted probability of each class. Values do not need to sum to 1 across classes.
  • base_scorer_kwargs (dict, optional): keyword arguments passed to the base scoring function (e.g., adjust_pred_probs).
  • aggregator_kwargs (**kwargs, optional): additional keyword arguments passed to the aggregation function (e.g., alpha for EMA).
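The two kwargs inputs are routed to different stages: base_scorer_kwargs reach the per-class scorer, and the remaining keyword arguments reach the aggregator. The sketch below mimics only the aggregator side, under one assumption: keyword arguments given at call time override defaults stored in the aggregator (such as alpha=0.8 in the default EMA aggregator). KwargsAggregator is a hypothetical name, and np.quantile with its q parameter stands in for an aggregation function so the override is easy to see.

```python
from typing import Callable

import numpy as np


class KwargsAggregator:
    """Hypothetical stand-in for Aggregator: stores default kwargs, allows per-call overrides."""

    def __init__(self, method: Callable[..., np.ndarray], **default_kwargs):
        self.method = method
        self.default_kwargs = default_kwargs

    def __call__(self, scores: np.ndarray, **call_kwargs) -> np.ndarray:
        # Call-time kwargs override constructor defaults; always reduce across classes
        merged = {**self.default_kwargs, **call_kwargs, "axis": 1}
        return self.method(scores, **merged)


per_class = np.array([[0.9, 0.2, 0.5], [0.3, 0.8, 0.6]])
agg = KwargsAggregator(np.quantile, q=0.5)  # default: per-example median

print(agg(per_class))         # [0.5 0.6]
print(agg(per_class, q=0.0))  # [0.2 0.3], the q override gives the per-row minimum
```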

Inputs (get_label_quality_scores)

  • labels (np.ndarray, required): 2D binary array of shape (N, K).
  • pred_probs (np.ndarray, required): 2D array of shape (N, K) with predicted probabilities.
  • method (MultilabelScorer, optional): scoring and aggregation method. The default, MultilabelScorer(), uses SELF_CONFIDENCE with EMA (alpha=0.8).
  • base_scorer_kwargs (dict, optional): keyword arguments for the class-label scorer.
  • aggregator_kwargs (**kwargs, optional): additional keyword arguments for the aggregator.

Outputs

  • scores (np.ndarray): 1D array of shape (N,) with an overall quality score for each example. Lower scores indicate examples that are more likely to be mislabeled.
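Because lower scores flag likelier label issues, a common next step is to sort examples by score. A minimal NumPy sketch, using made-up scores and a hypothetical review budget k:

```python
import numpy as np

# Made-up per-example quality scores, standing in for the (N,) output above
scores = np.array([0.91, 0.12, 0.67, 0.35])

# Lower score = more likely mislabeled, so an ascending argsort lists the most suspicious first
ranked_indices = np.argsort(scores)
print(ranked_indices)  # [1 3 2 0]

# Hypothetical review budget: inspect only the k lowest-scoring examples
k = 2
flagged = ranked_indices[:k]
print(flagged)  # [1 3]
```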

Key Components

ClassLabelScorer Enum

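  • SELF_CONFIDENCE (get_self_confidence_for_each_label): probability the model assigns to the given label.
  • NORMALIZED_MARGIN (get_normalized_margin_for_each_label): difference between the probability of the given label and the largest probability among the other labels, rescaled to [0, 1].
  • CONFIDENCE_WEIGHTED_ENTROPY (get_confidence_weighted_entropy_for_each_label): entropy-based score that divides the entropy of the predicted probabilities by the self-confidence of the given label.

In the multi-label setting, each function is applied to a two-column per-class slice, so the given label is always 0 (class absent) or 1 (class present).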
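To see what the base scorers compute on one class, the sketch below applies self-confidence and normalized margin to a two-column slice [1 - p, p]. It is an illustration, not cleanlab.rank's code, and it assumes the normalized margin is (p_label - max p_other + 1) / 2. Under that assumption the two scores coincide on a binary slice: the only alternative probability is 1 - p_label, so the margin becomes (2 * p_label) / 2 = p_label. Confidence-weighted entropy is not reproduced here.

```python
import numpy as np


def two_columns(p: np.ndarray) -> np.ndarray:
    # [P(class absent), P(class present)] for one class
    return np.column_stack((1.0 - p, p))


def self_confidence(labels: np.ndarray, probs: np.ndarray) -> np.ndarray:
    # Probability assigned to each example's given label
    return probs[np.arange(len(labels)), labels]


def normalized_margin(labels: np.ndarray, probs: np.ndarray) -> np.ndarray:
    # Assumed form: (p_label - max p_other + 1) / 2 maps the margin from [-1, 1] to [0, 1]
    p_label = self_confidence(labels, probs)
    others = probs.copy()
    others[np.arange(len(labels)), labels] = -np.inf  # exclude the given label
    return (p_label - others.max(axis=1) + 1) / 2


labels_k = np.array([1, 0, 1])  # given labels for one class
probs_k = two_columns(np.array([0.4, 0.1, 0.9]))

sc = self_confidence(labels_k, probs_k)
nm = normalized_margin(labels_k, probs_k)
print(sc)  # [0.4 0.9 0.9]
print(np.allclose(sc, nm))  # True
```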

Built-in Aggregators

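  • exponential_moving_average: sorts each example's per-class scores in descending order and computes an EMA over them, so the lowest (worst) score is processed last. With alpha=0.8, the default used by MultilabelScorer, the worst-scoring class receives weight 0.8.
  • softmin: computes a soft minimum of the per-class scores as a weighted average whose weights are a softmax over (1 - scores), controlled by a temperature parameter.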
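Both built-in aggregators can be sketched in a few lines of NumPy. These are illustrative re-implementations based on the descriptions above, not cleanlab's code; alpha=0.8 matches the MultilabelScorer default, while temperature=0.1 is an assumed value. The input is the matrix of per-class self-confidence scores for the Basic Usage data.

```python
import numpy as np


def ema_aggregate(s: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """EMA over each row's scores sorted in descending order, so the worst score comes last."""
    s_sorted = -np.sort(-s, axis=1)  # descending within each row
    ema = s_sorted[:, 0]
    for t in range(1, s_sorted.shape[1]):
        # Each newer (smaller) score gets weight alpha; the running EMA keeps 1 - alpha
        ema = alpha * s_sorted[:, t] + (1 - alpha) * ema
    return ema


def softmin_aggregate(s: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Average of scores weighted by a softmax over (1 - s), so low scores dominate."""
    z = (1 - s) / temperature
    z = z - z.max(axis=1, keepdims=True)  # stability shift; softmax is shift-invariant
    w = np.exp(z)
    w = w / w.sum(axis=1, keepdims=True)
    return (w * s).sum(axis=1)


# Per-class self-confidence scores for the Basic Usage data (N=3, K=3)
per_class = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9], [0.8, 0.7, 0.8]])
ema = ema_aggregate(per_class)
soft = softmin_aggregate(per_class)
print(ema)  # approximately 0.9, 0.5, 0.72
print(soft)
```

With alpha=1 the EMA reduces to the per-example minimum, while alpha=0 returns the maximum. Softmin approaches the minimum as the temperature goes to 0 and the plain mean as it grows.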

Usage Examples

Basic Usage

import numpy as np
from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer

labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.8, 0.7, 0.2]])

scorer = MultilabelScorer()
scores = scorer(labels, pred_probs)
print(f"Label quality scores: {scores}")

Custom Scorer and Aggregator

import numpy as np
from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
)

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

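# Use normalized margin scoring with np.min aggregation.
# A plain callable such as np.min is wrapped in an Aggregator, which
# reduces the (N, K) per-class scores along axis=1.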
scorer = MultilabelScorer(
    base_scorer=ClassLabelScorer.NORMALIZED_MARGIN,
    aggregator=np.min,
)
scores = scorer(labels, pred_probs)
print(f"Scores with min aggregation: {scores}")

Function Interface

import numpy as np
import cleanlab.internal.multilabel_scorer as ml_scorer

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

scores = ml_scorer.get_label_quality_scores(labels, pred_probs)
print(f"Scores: {scores}")
