Overview
Implements the scoring pipeline for detecting label issues in multi-label classification, where each example can belong to multiple classes simultaneously.
Description
The multilabel_scorer module computes label quality scores for multi-label classification using a decompose-then-aggregate pattern. The ClassLabelScorer enum wraps three binary scoring methods from cleanlab.rank (self-confidence, normalized margin, and confidence-weighted entropy) in a _Wrapper helper class. Each class is treated as an independent binary classification task: stack_complement converts that class's predicted probabilities to two-column format, and the chosen scorer is applied to the result.
The Aggregator class then reduces the per-class scores (shape N x K) to a single score per example (shape N) using a configurable aggregation function. Two aggregators are built in: exponential_moving_average, which sorts each example's scores in descending order and applies an EMA with forgetting factor alpha, and softmin, a soft minimum that weights the scores by a softmax over their negated values.
The MultilabelScorer class orchestrates the full pipeline, and get_label_quality_scores exposes it as a plain function. The module also provides multilabel_py, which computes per-class label priors, and get_cross_validated_multilabel_pred_probs, which produces out-of-sample predicted probabilities.
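The decomposition step can be illustrated with a minimal NumPy-only sketch. It is not the module's implementation: the helper names stack_complement_sketch and per_class_self_confidence are hypothetical, and the column order [P(absent), P(present)] is an assumption made for this example.

```python
import numpy as np


def stack_complement_sketch(pred_prob_slice: np.ndarray) -> np.ndarray:
    """Turn a length-N vector of P(class present) into an (N, 2) array.

    Assumed column order: [P(absent), P(present)].
    """
    return np.vstack((1 - pred_prob_slice, pred_prob_slice)).T


def per_class_self_confidence(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Return an (N, K) array of scores, treating each class as a binary task."""
    n_examples, n_classes = labels.shape
    scores = np.zeros((n_examples, n_classes))
    rows = np.arange(n_examples)
    for k in range(n_classes):
        binary_probs = stack_complement_sketch(pred_probs[:, k])  # shape (N, 2)
        binary_labels = labels[:, k]  # 0 = class absent, 1 = class present
        # Self-confidence: the probability assigned to the given binary label.
        scores[:, k] = binary_probs[rows, binary_labels]
    return scores


labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
class_scores = per_class_self_confidence(labels, pred_probs)
print(class_scores)  # one row per example, one column per class
```

The resulting (N, K) array is what an aggregation function would then reduce to one score per example.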
Usage
Import this module when working with multi-label classification datasets where each example can have multiple labels, and you need to score label quality or detect label issues. It is used internally by cleanlab's higher-level APIs for multi-label data quality analysis.
Code Reference
Source Location
- Repository: Cleanlab
- File: cleanlab/internal/multilabel_scorer.py
- Lines: 1-653
Signature
class MultilabelScorer:
    def __init__(
        self,
        base_scorer: ClassLabelScorer = ClassLabelScorer.SELF_CONFIDENCE,
        aggregator: Union[Aggregator, Callable] = Aggregator(
            exponential_moving_average, alpha=0.8
        ),
        *,
        strict: bool = True,
    )

def get_label_quality_scores(
    labels,
    pred_probs,
    *,
    method: MultilabelScorer = MultilabelScorer(),
    base_scorer_kwargs: Optional[dict] = None,
    **aggregator_kwargs,
) -> np.ndarray

class ClassLabelScorer(Enum):
    SELF_CONFIDENCE = _Wrapper(get_self_confidence_for_each_label)
    NORMALIZED_MARGIN = _Wrapper(get_normalized_margin_for_each_label)
    CONFIDENCE_WEIGHTED_ENTROPY = _Wrapper(get_confidence_weighted_entropy_for_each_label)
Import
from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
    get_label_quality_scores,
    exponential_moving_average,
    softmin,
    multilabel_py,
    get_cross_validated_multilabel_pred_probs,
)
I/O Contract
Inputs (MultilabelScorer.__call__)
| Name | Type | Required | Description |
| labels | np.ndarray | Yes | 2D binary array of shape (N, K) where N is the number of samples and K is the number of classes. |
| pred_probs | np.ndarray | Yes | 2D array of shape (N, K) with predicted probabilities for each class. Values do not need to sum to 1 across classes. |
| base_scorer_kwargs | dict | No | Keyword arguments passed to the base scoring function (e.g., adjust_pred_probs). |
| aggregator_kwargs | dict | No | Additional keyword arguments passed to the aggregation function (e.g., alpha for EMA). |
Inputs (get_label_quality_scores)
| Name | Type | Required | Description |
| labels | np.ndarray | Yes | 2D binary array of shape (N, K). |
| pred_probs | np.ndarray | Yes | 2D array of shape (N, K) with predicted probabilities. |
| method | MultilabelScorer | No | Scoring and aggregation method. Default uses SELF_CONFIDENCE with EMA(alpha=0.8). |
| base_scorer_kwargs | dict | No | Keyword arguments for the class-label scorer. |
| aggregator_kwargs | varies | No | Additional keyword arguments for the aggregator. |
Outputs
| Name | Type | Description |
| scores | np.ndarray | 1D array of shape (N,) with overall quality scores for each example. Lower scores indicate more likely mislabeled examples. |
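Because lower scores indicate likelier label issues, a common way to consume the output is to rank examples in ascending order of score. The following is a small sketch using made-up scores; the threshold value is a hypothetical cutoff chosen for illustration, not a value recommended by the module.

```python
import numpy as np

# Hypothetical scores for three examples, in the shape (N,) described above.
scores = np.array([0.9, 0.5, 0.72])

# Indices ordered from most to least likely mislabeled (ascending score).
ranked = np.argsort(scores)

# Optionally flag examples below an illustrative, user-chosen cutoff.
threshold = 0.6
flagged = np.flatnonzero(scores < threshold)

print("Review order:", ranked)
print("Flagged examples:", flagged)
```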
Key Components
ClassLabelScorer Enum
| Value | Scoring Method | Description |
| SELF_CONFIDENCE | get_self_confidence_for_each_label | Probability assigned to the given label by the model. |
| NORMALIZED_MARGIN | get_normalized_margin_for_each_label | Difference between the probability of the given label and the most likely alternative, normalized. |
| CONFIDENCE_WEIGHTED_ENTROPY | get_confidence_weighted_entropy_for_each_label | Entropy-based score weighted by model confidence. |
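To make the first two descriptions concrete, here is a NumPy-only sketch of both scores on a single class in two-column form. The function names are hypothetical, and the normalization (margin + 1) / 2, which maps the margin into [0, 1], is an assumption about how "normalized" is meant; the library's functions may differ in detail.

```python
import numpy as np


def binary_self_confidence(binary_labels: np.ndarray, binary_probs: np.ndarray) -> np.ndarray:
    """Probability the model assigns to the given (0 or 1) label."""
    rows = np.arange(len(binary_labels))
    return binary_probs[rows, binary_labels]


def binary_normalized_margin(binary_labels: np.ndarray, binary_probs: np.ndarray) -> np.ndarray:
    """Margin between the given label and the alternative, rescaled to [0, 1].

    Assumption: normalization is (margin + 1) / 2.
    """
    rows = np.arange(len(binary_labels))
    p_given = binary_probs[rows, binary_labels]
    p_other = binary_probs[rows, 1 - binary_labels]  # the only alternative in a binary task
    return (p_given - p_other + 1) / 2


# Columns are [P(absent), P(present)] for one class; both examples are labeled "present".
binary_probs = np.array([[0.6, 0.4], [0.1, 0.9]])
binary_labels = np.array([1, 1])

self_conf = binary_self_confidence(binary_labels, binary_probs)
margin = binary_normalized_margin(binary_labels, binary_probs)
print(self_conf, margin)
```

Under this assumed normalization, and when the two columns sum to 1, the normalized margin reduces to the self-confidence, since (p - (1 - p) + 1) / 2 = p.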
Built-in Aggregators
| Name | Description |
| exponential_moving_average | Sorts per-class scores in descending order and computes an EMA over them. With alpha=0.8, the default used by MultilabelScorer, most of the weight falls on the worst-scoring class. |
| softmin | Computes a soft minimum of per-class scores, weighting them by a softmax over (1 - scores) with a temperature parameter. |
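The two aggregation strategies can be sketched in plain NumPy as follows. This is an illustration under stated assumptions rather than the module's code: the names ema_sketch and softmin_sketch are hypothetical, the EMA recursion is assumed to place weight alpha on each successive (lower) score, and the temperature default of 0.1 is assumed for the example.

```python
import numpy as np


def ema_sketch(class_scores: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """EMA over each row's scores, visited from highest to lowest."""
    sorted_desc = np.sort(class_scores, axis=1)[:, ::-1]
    ema = sorted_desc[:, 0]
    for k in range(1, sorted_desc.shape[1]):
        # Assumed recursion: the newest (lower) score gets weight alpha.
        ema = alpha * sorted_desc[:, k] + (1 - alpha) * ema
    return ema


def softmin_sketch(class_scores: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Weighted average of each row, with softmax weights favoring low scores."""
    logits = (1 - class_scores) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (class_scores * weights).sum(axis=1)


# Per-class scores of shape (N, K); the second example has one weak class.
class_scores = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9]])
ema_scores = ema_sketch(class_scores)
softmin_scores = softmin_sketch(class_scores)
print(ema_scores)
print(softmin_scores)
```

In both sketches a single low per-class score pulls the example's overall score down, and a smaller temperature moves the soft minimum closer to the true minimum.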
Usage Examples
Basic Usage
import numpy as np
from cleanlab.internal.multilabel_scorer import MultilabelScorer

# Binary indicator labels and per-class predicted probabilities, both of shape (N, K)
labels = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.8, 0.7, 0.2]])

# Default configuration: SELF_CONFIDENCE scorer with EMA(alpha=0.8) aggregation
scorer = MultilabelScorer()
scores = scorer(labels, pred_probs)
print(f"Label quality scores: {scores}")
Custom Scorer and Aggregator
import numpy as np
from cleanlab.internal.multilabel_scorer import (
    MultilabelScorer,
    ClassLabelScorer,
    Aggregator,
)

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

# Use normalized margin scoring with np.min aggregation
scorer = MultilabelScorer(
    base_scorer=ClassLabelScorer.NORMALIZED_MARGIN,
    aggregator=np.min,
)
scores = scorer(labels, pred_probs)
print(f"Scores with min aggregation: {scores}")
Function Interface
import numpy as np
import cleanlab.internal.multilabel_scorer as ml_scorer
labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
scores = ml_scorer.get_label_quality_scores(labels, pred_probs)
print(f"Scores: {scores}")
Related Pages