Implementation:Cleanlab Cleanlab Multilabel Get Label Quality Scores

Knowledge Sources	Cleanlab
Domains	Multi-Label Classification, Data Quality, Label Scoring
Last Updated	2026-02-09 00:00 GMT

Overview

Computes a label quality score for each example in a multi-label classification dataset, quantifying how likely each example's set of class annotations is correct.

Description

The get_label_quality_scores function in the multilabel classification rank module returns one quality score in the range 0 to 1 per example; lower scores mark examples whose labels are more likely to contain annotation errors. It handles the multi-label setting, in which each example can belong to zero, one, or several classes at once and the model-predicted probabilities need not sum to 1 across classes.

The function works by first converting multi-label lists to binary one-hot representations, then computing separate quality scores for each class using a configurable scoring method (e.g., self_confidence, normalized_margin, or confidence_weighted_entropy) via a one-vs-rest approach. These per-class scores are then aggregated into a single example-level score using an Aggregator (default: exponential moving average with alpha=0.8). A companion function, get_label_quality_scores_per_class, returns the unaggregated per-class scores.
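
The steps described above can be traced by hand on a small input. The following is a minimal sketch in plain Python, not the library's code: the helper names are hypothetical, only the self_confidence method is covered, and the detail that the moving average visits per-class scores from highest to lowest (so the lowest scores carry the most weight) is an assumption about how the aggregation behaves.

```python
from typing import List


def to_binary(labels: List[List[int]], num_classes: int) -> List[List[int]]:
    """Turn per-example class-index lists into 0/1 rows of length num_classes."""
    rows = []
    for given in labels:
        present = set(given)
        rows.append([1 if k in present else 0 for k in range(num_classes)])
    return rows


def self_confidence_per_class(binary_row: List[int], prob_row: List[float]) -> List[float]:
    """One-vs-rest: probability the model assigns to the annotated state of each class."""
    return [p if y == 1 else 1.0 - p for y, p in zip(binary_row, prob_row)]


def ema_aggregate(class_scores: List[float], alpha: float = 0.8) -> float:
    """Exponential moving average over scores sorted from highest to lowest (assumed order)."""
    ordered = sorted(class_scores, reverse=True)
    ema = ordered[0]
    for score in ordered[1:]:
        ema = alpha * score + (1.0 - alpha) * ema
    return ema


def sketch_label_quality_scores(labels, pred_probs, alpha=0.8):
    """Hypothetical end-to-end helper: convert, score each class, aggregate."""
    binary = to_binary(labels, num_classes=len(pred_probs[0]))
    return [
        ema_aggregate(self_confidence_per_class(y, p), alpha)
        for y, p in zip(binary, pred_probs)
    ]


labels = [[1], [0, 2]]
pred_probs = [[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]
scores = sketch_label_quality_scores(labels, pred_probs)
print([round(s, 3) for s in scores])  # [0.9, 0.5]
```

For the second example the per-class scores are 0.4, 0.9 and 0.9, and the single low score for class 0 pulls the aggregate down to 0.5, which is what makes the example stand out when ranked.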

Usage

Import this function when you have a multi-label classification dataset and want to identify which examples are most likely to have incorrect label annotations. Use it to rank examples by annotation quality for data cleaning or review prioritization. This is the core ranking function exported at the package level via cleanlab.multilabel_classification.
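
Ranking for review amounts to sorting example indices by ascending score. A minimal sketch follows; the helper names, the score values, and the 0.6 threshold are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def rank_by_label_quality(scores: Sequence[float]) -> List[int]:
    """Return example indices ordered from lowest to highest quality score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def flag_for_review(scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices scoring below threshold, worst first."""
    return [i for i in rank_by_label_quality(scores) if scores[i] < threshold]


# Hypothetical scores for four examples (not produced by a real model).
scores = [0.9, 0.5, 0.72, 0.31]
print(rank_by_label_quality(scores))  # [3, 1, 2, 0]
print(flag_for_review(scores, 0.6))   # [3, 1]
```

When the scores are held in a NumPy array, np.argsort(scores) yields the same ascending order of indices.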

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/multilabel_classification/rank.py
Lines: 53-121 (get_label_quality_scores), 124-179 (get_label_quality_scores_per_class)

Signature

def get_label_quality_scores(
    labels: List[List[int]],
    pred_probs: npt.NDArray["np.floating[T]"],
    *,
    method: str = "self_confidence",
    adjust_pred_probs: bool = False,
    aggregator_kwargs: Dict[str, Any] = {"method": "exponential_moving_average", "alpha": 0.8},
) -> npt.NDArray["np.floating[T]"]:

def get_label_quality_scores_per_class(
    labels: List[List[int]],
    pred_probs: npt.NDArray["np.floating[T]"],
    *,
    method: str = "self_confidence",
    adjust_pred_probs: bool = False,
) -> np.ndarray:

Import

from cleanlab.multilabel_classification.rank import get_label_quality_scores
from cleanlab.multilabel_classification.rank import get_label_quality_scores_per_class

I/O Contract

Inputs (get_label_quality_scores)

Name	Type	Required	Description
labels	List[List[int]]	Yes	List of noisy multi-label annotations. Each inner list contains the class indices that apply to that example (e.g., [[1], [0, 2]] means example 0 has class 1 and example 1 has classes 0 and 2).
pred_probs	np.ndarray	Yes	Array of shape (N, K) with model-predicted class probabilities, where N is the number of examples and K is the number of classes. Probabilities need not sum to 1 per row.
method	str	No (default: "self_confidence")	Scoring method for per-class annotation scores. Options: "self_confidence", "normalized_margin", "confidence_weighted_entropy".
adjust_pred_probs	bool	No (default: False)	Whether to adjust predicted probabilities to account for class imbalance.
aggregator_kwargs	Dict[str, Any]	No (default: {"method": "exponential_moving_average", "alpha": 0.8})	Hyperparameters for aggregating per-class scores. Options for "method": "exponential_moving_average", "softmin", or a custom callable.

Outputs (get_label_quality_scores)

Name	Type	Description
label_quality_scores	np.ndarray	1D array of shape (N,) with quality scores between 0 and 1. Lower scores indicate examples more likely to contain annotation errors.

Inputs (get_label_quality_scores_per_class)

Name	Type	Required	Description
labels	List[List[int]]	Yes	Multi-label annotations (same format as above)
pred_probs	np.ndarray	Yes	Model predictions of shape (N, K) (same format as above)
method	str	No (default: "self_confidence")	Scoring method for per-class annotation scores
adjust_pred_probs	bool	No (default: False)	Whether to adjust for class imbalance

Outputs (get_label_quality_scores_per_class)

Name	Type	Description
label_quality_scores	list(np.ndarray)	K arrays, one per class, each of shape (N,). label_quality_scores[k][i] is the quality score for class k's annotation on example i. (The signature above annotates the return type as np.ndarray.)
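
Per-class scores make it possible to see which class annotation is responsible for a low example-level score. The sketch below assumes the documented layout label_quality_scores[k][i]; the helper name is hypothetical, and the input values were worked out by hand for the self_confidence method on the two-example input used elsewhere on this page rather than produced by running the library.

```python
from typing import List, Sequence, Tuple


def weakest_class_per_example(per_class_scores: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each example i, return (class index with the lowest score, that score).

    Expects per_class_scores[k][i]: one sequence per class, one entry per example.
    Ties resolve to the lowest class index.
    """
    num_examples = len(per_class_scores[0])
    result = []
    for i in range(num_examples):
        column = [class_scores[i] for class_scores in per_class_scores]
        k = min(range(len(column)), key=column.__getitem__)
        result.append((k, column[k]))
    return result


# Hand-computed self-confidence scores: 3 classes, 2 examples.
per_class = [[0.9, 0.4], [0.9, 0.9], [0.9, 0.9]]
print(weakest_class_per_example(per_class))  # [(0, 0.9), (0, 0.4)]
```

Here the second example's weakest annotation is class 0, where the model assigns only 0.4 to a class the annotator marked as present.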

Internal Pipeline

The scoring pipeline consists of three stages:

Validation and Conversion: Input labels are validated via assert_valid_inputs and converted from list-of-lists format to binary one-hot representation using int2onehot.
Per-Class Scoring: The factory function _create_multilabel_scorer builds a MultilabelScorer, which pairs a ClassLabelScorer (selecting the scoring method) with an Aggregator (selecting how per-class scores are combined). The scorer then computes one binary, one-vs-rest quality score per class for every example.
Aggregation: Per-class binary classification quality scores are combined into a single example-level score using the configured aggregation method.
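
The composition behind these three stages can be illustrated with a small stand-in. Everything below is hypothetical: SketchMultilabelScorer and validate_and_convert are not cleanlab names, the checks are simpler than whatever assert_valid_inputs performs, and the built-in min is used as the aggregator only to keep the arithmetic obvious.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SketchMultilabelScorer:
    """Hypothetical stand-in pairing a per-class scorer with an aggregator."""

    base_scorer: Callable[[int, float], float]
    aggregator: Callable[[Sequence[float]], float]

    def per_class(self, onehot_row: Sequence[int], prob_row: Sequence[float]) -> List[float]:
        # Stage 2: score each class independently (one-vs-rest).
        return [self.base_scorer(y, p) for y, p in zip(onehot_row, prob_row)]

    def __call__(self, onehot, pred_probs) -> List[float]:
        # Stage 3: collapse each example's per-class scores into one number.
        return [self.aggregator(self.per_class(y, p)) for y, p in zip(onehot, pred_probs)]


def validate_and_convert(labels, pred_probs) -> List[List[int]]:
    """Stage 1: basic checks, then class-index lists to 0/1 rows."""
    if len(labels) != len(pred_probs):
        raise ValueError("labels and pred_probs must cover the same number of examples")
    num_classes = len(pred_probs[0])
    onehot = []
    for i, given in enumerate(labels):
        row = [0] * num_classes
        for k in given:
            if not 0 <= k < num_classes:
                raise ValueError(f"example {i}: class index {k} is outside 0..{num_classes - 1}")
            row[k] = 1
        onehot.append(row)
    return onehot


def self_confidence(y: int, p: float) -> float:
    """Probability assigned to the annotated state (present or absent) of one class."""
    return p if y == 1 else 1.0 - p


labels = [[1], [0, 2]]
pred_probs = [[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]

onehot = validate_and_convert(labels, pred_probs)
scorer = SketchMultilabelScorer(base_scorer=self_confidence, aggregator=min)
print(onehot)                      # [[0, 1, 0], [1, 0, 1]]
print(scorer(onehot, pred_probs))  # [0.9, 0.4]
```

Keeping the per-class scorer and the aggregator as separate, swappable parts is what lets the scoring method and the aggregation method be configured independently.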

Usage Examples

Basic Usage

from cleanlab.multilabel_classification.rank import get_label_quality_scores
import numpy as np

# Example: 2 examples, 3 classes
labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

scores = get_label_quality_scores(labels, pred_probs)
print(scores)  # [0.9 0.5]

Per-Class Scores

from cleanlab.multilabel_classification.rank import get_label_quality_scores_per_class
import numpy as np

labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

per_class_scores = get_label_quality_scores_per_class(labels, pred_probs)
# Returns list of 3 arrays (one per class), each of length 2 (one per example)

Custom Aggregation

from cleanlab.multilabel_classification.rank import get_label_quality_scores
import numpy as np

labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

# Use softmin aggregation instead of default EMA
scores = get_label_quality_scores(
    labels, pred_probs,
    method="normalized_margin",
    aggregator_kwargs={"method": "softmin"},
)
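
To give a feel for how a soft-minimum differs from a plain minimum or mean, here is one common formulation: a weighted average of the per-class scores in which lower scores receive exponentially larger weights. Whether the library's "softmin" option uses exactly this weighting is an assumption, and the temperature value of 0.1 is hypothetical.

```python
import math
from typing import Sequence


def softmin_aggregate(class_scores: Sequence[float], temperature: float = 0.1) -> float:
    """Weighted average of scores with softmax weights over (1 - score) / temperature."""
    logits = [(1.0 - s) / temperature for s in class_scores]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return sum(s * w for s, w in zip(class_scores, weights)) / total


row = [0.4, 0.9, 0.9]  # hand-computed per-class scores for one example
value = softmin_aggregate(row)
# The result lies between the hard minimum and the plain mean.
print(min(row) <= value <= sum(row) / len(row))  # True
```

A small temperature pushes the result toward the lowest per-class score, while a large temperature pushes it toward the mean.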

Related Pages

Principle:Cleanlab_Cleanlab_Multilabel_Label_Quality_Scoring
