

Implementation:Cleanlab Cleanlab Multilabel Get Label Quality Scores

From Leeroopedia
Revision as of 14:36, 16 February 2026 by Admin (auto-imported from implementations/Cleanlab_Cleanlab_Multilabel_Get_Label_Quality_Scores.md)


Knowledge Sources
Domains: Multi-Label Classification, Data Quality, Label Scoring
Last Updated: 2026-02-09 00:00 GMT

Overview

Computes a label quality score for each example in a multi-label classification dataset, quantifying how likely each example's set of class annotations is correct.

Description

The get_label_quality_scores function in the cleanlab.multilabel_classification.rank module computes one quality score per example, between 0 and 1. Lower scores indicate examples whose labels are more likely to contain annotation errors. The function targets the multi-label setting, where each example can belong to zero, one, or several classes at once. For this reason, the model-predicted probabilities need not sum to 1 across classes.

The function first converts the list-of-lists labels into a binary one-hot matrix. It then scores each class separately in a one-vs-rest manner. Column k of pred_probs is treated as a binary prediction of whether class k applies, and a standard binary label quality score is computed with the configured method ("self_confidence", "normalized_margin", or "confidence_weighted_entropy"). Finally, an Aggregator combines these per-class scores into a single example-level score. The default aggregator is an exponential moving average with alpha=0.8. A companion function, get_label_quality_scores_per_class, returns the per-class scores without aggregation.
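The one-vs-rest step can be illustrated with plain NumPy. The sketch below is a minimal re-implementation that assumes the "self_confidence" method. Under that method, each class's binary score is the predicted probability of the observed binary label: p if the class is annotated, and 1 - p otherwise. The helper name per_class_self_confidence is hypothetical and is not part of cleanlab.

```python
import numpy as np


def per_class_self_confidence(labels, pred_probs):
    """Hypothetical helper: one-vs-rest self-confidence for every (example, class) pair.

    Returns an (N, K) array where entry [i, k] is pred_probs[i, k] when class k
    is annotated for example i, and 1 - pred_probs[i, k] when it is not.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    n, k = pred_probs.shape
    # Binary one-hot matrix of the given annotations.
    onehot = np.zeros((n, k), dtype=int)
    for i, classes in enumerate(labels):
        for c in classes:
            onehot[i, c] = 1
    # Probability the model assigns to the annotated binary label of each class.
    return np.where(onehot == 1, pred_probs, 1.0 - pred_probs)


labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
scores = per_class_self_confidence(labels, pred_probs)
print(scores)
```

For these inputs the second example's class-0 annotation receives the lowest score (0.4). Each row is then collapsed to a single score by the aggregator. With the default EMA, this gives roughly [0.9, 0.5], which matches the Basic Usage example.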

Usage

Use this function when you have a multi-label classification dataset and want to find the examples most likely to have incorrect label annotations. For example, you can rank examples for data cleaning or prioritize them for manual review. This is the core ranking function of the module. It is also exported at the package level, so it can be imported directly from cleanlab.multilabel_classification.
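To prioritize review, sort the examples by score in ascending order. The minimal NumPy sketch below illustrates this. The score values are made up for illustration, standing in for the output of get_label_quality_scores, and the review budget k is an arbitrary assumption.

```python
import numpy as np

# Hypothetical example-level scores, e.g. as returned by get_label_quality_scores.
scores = np.array([0.9, 0.5, 0.75, 0.2])

# Example indices ordered from most to least suspicious (lowest score first).
# A stable sort keeps tied examples in their original order.
review_order = np.argsort(scores, kind="stable")

# Keep only the k most suspicious examples for manual review.
k = 2
to_review = review_order[:k]
print(review_order, to_review)
```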

Code Reference

Source Location

  • Repository: Cleanlab
  • File: cleanlab/multilabel_classification/rank.py
  • Lines: 53-121 (get_label_quality_scores), 124-179 (get_label_quality_scores_per_class)

Signature

def get_label_quality_scores(
    labels: List[List[int]],
    pred_probs: npt.NDArray["np.floating[T]"],
    *,
    method: str = "self_confidence",
    adjust_pred_probs: bool = False,
    aggregator_kwargs: Dict[str, Any] = {"method": "exponential_moving_average", "alpha": 0.8},
) -> npt.NDArray["np.floating[T]"]:
    ...

def get_label_quality_scores_per_class(
    labels: List[List[int]],
    pred_probs: npt.NDArray["np.floating[T]"],
    *,
    method: str = "self_confidence",
    adjust_pred_probs: bool = False,
) -> np.ndarray:
    ...

Import

from cleanlab.multilabel_classification.rank import get_label_quality_scores
from cleanlab.multilabel_classification.rank import get_label_quality_scores_per_class

I/O Contract

Inputs (get_label_quality_scores)

Name | Type | Required | Description
labels | List[List[int]] | Yes | List of noisy multi-label annotations. Each inner list contains the class indices that apply to that example (e.g., [[1], [0, 2]] means example 0 has class 1 and example 1 has classes 0 and 2).
pred_probs | np.ndarray | Yes | Array of shape (N, K) of model-predicted class probabilities, where N is the number of examples and K is the number of classes. Rows need not sum to 1.
method | str | No (default: "self_confidence") | Scoring method for the per-class annotation scores: "self_confidence", "normalized_margin", or "confidence_weighted_entropy".
adjust_pred_probs | bool | No (default: False) | Whether to adjust the predicted probabilities to account for class imbalance.
aggregator_kwargs | Dict[str, Any] | No (default: {"method": "exponential_moving_average", "alpha": 0.8}) | Hyperparameters for aggregating per-class scores into one score per example. "method" may be "exponential_moving_average", "softmin", or a custom callable.
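The labels format can be made concrete by converting it to the (N, K) binary matrix that lines up column-for-column with pred_probs. The sketch below uses a hypothetical helper, labels_to_onehot, as a stand-in for the one-hot conversion step; it is not cleanlab's int2onehot. The third example, which has no classes, is an added illustrative row.

```python
import numpy as np


def labels_to_onehot(labels, num_classes):
    """Hypothetical helper: list-of-lists labels -> (N, K) 0/1 matrix."""
    onehot = np.zeros((len(labels), num_classes), dtype=int)
    for i, classes in enumerate(labels):
        for c in classes:
            if not 0 <= c < num_classes:
                raise ValueError(f"class index {c} out of range for K={num_classes}")
            onehot[i, c] = 1
    return onehot


labels = [[1], [0, 2], []]  # the third example has no classes
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9], [0.2, 0.3, 0.1]])

onehot = labels_to_onehot(labels, num_classes=pred_probs.shape[1])
assert onehot.shape == pred_probs.shape  # both are (N, K)
print(onehot)
print(pred_probs.sum(axis=1))  # multi-label rows need not sum to 1
```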

Outputs (get_label_quality_scores)

Name | Type | Description
label_quality_scores | np.ndarray | 1D array of shape (N,) with quality scores between 0 and 1. Lower scores indicate examples more likely to contain annotation errors.

Inputs (get_label_quality_scores_per_class)

Name | Type | Required | Description
labels | List[List[int]] | Yes | Multi-label annotations (same format as above).
pred_probs | np.ndarray | Yes | Model predictions of shape (N, K) (same format as above).
method | str | No (default: "self_confidence") | Scoring method for the per-class annotation scores.
adjust_pred_probs | bool | No (default: False) | Whether to adjust for class imbalance.

Outputs (get_label_quality_scores_per_class)

Name | Type | Description
label_quality_scores | list(np.ndarray) | List of K arrays, each of shape (N,). label_quality_scores[k][i] is the quality score for class k's annotation on example i.

Internal Pipeline

The scoring pipeline consists of three stages:

  1. Validation and Conversion: Input labels are validated with assert_valid_inputs and converted from list-of-lists format to a binary one-hot matrix with int2onehot.
  2. Per-Class Scoring: The internal factory function _create_multilabel_scorer builds a MultilabelScorer. This object wraps a ClassLabelScorer, which specifies the scoring method, and an Aggregator, which specifies how per-class scores are combined. The scorer computes a binary, one-vs-rest quality score for each class column of pred_probs.
  3. Aggregation: The Aggregator combines each example's per-class scores into a single example-level score, using the configured aggregation method.
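The aggregation stage can be sketched for the default exponential moving average. This sketch assumes that each row of per-class scores is sorted in descending order, that the EMA starts from the largest score, and that it updates as EMA_t = alpha * s_t + (1 - alpha) * EMA_{t-1}. Under those assumptions, it reproduces the 0.5 shown in Basic Usage. It is an illustration, not cleanlab's Aggregator code, and the function name ema_aggregate is hypothetical.

```python
import numpy as np


def ema_aggregate(class_scores, alpha=0.8):
    """Sketch of exponential-moving-average aggregation over per-class scores.

    Each row is sorted in descending order, so the lowest (most suspicious)
    per-class score comes last and receives the largest weight, alpha.
    """
    s = np.sort(np.asarray(class_scores, dtype=float), axis=1)[:, ::-1]
    ema = s[:, 0]  # start from the largest score in each row
    for t in range(1, s.shape[1]):
        ema = alpha * s[:, t] + (1 - alpha) * ema
    return ema


# Per-class self-confidence scores for the two examples in Basic Usage.
class_scores = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9]])
agg = ema_aggregate(class_scores)
print(agg)  # approximately [0.9 0.5]
```

Sorting before the moving average makes the result depend mostly on the worst class annotation. A single likely error therefore pulls the example's score down even when every other class looks fine.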

Usage Examples

Basic Usage

from cleanlab.multilabel_classification.rank import get_label_quality_scores
import numpy as np

# Example: 2 examples, 3 classes
labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

scores = get_label_quality_scores(labels, pred_probs)
print(scores)  # [0.9 0.5]

Per-Class Scores

from cleanlab.multilabel_classification.rank import get_label_quality_scores_per_class
import numpy as np

labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

per_class_scores = get_label_quality_scores_per_class(labels, pred_probs)
# Returns list of 3 arrays (one per class), each of length 2 (one per example)

Custom Aggregation

from cleanlab.multilabel_classification.rank import get_label_quality_scores
import numpy as np

labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

# Score each class with normalized_margin and aggregate with softmin instead of the default EMA
scores = get_label_quality_scores(
    labels, pred_probs,
    method="normalized_margin",
    aggregator_kwargs={"method": "softmin"},
)
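The sketch below assumes that softmin is a weighted average of the per-class scores, with weights softmax((1 - s) / temperature) and a temperature of 0.1. These are assumptions about cleanlab's Aggregator, not a copy of its code. The sketch also shows a hypothetical custom callable, min_aggregate, of the kind that aggregator_kwargs accepts for "method". The calling convention cleanlab uses for custom callables (assumed here: an (N, K) score array plus keyword arguments, returning an array of shape (N,)) should be checked against the Aggregator documentation.

```python
import numpy as np


def softmin_aggregate(class_scores, temperature=0.1):
    """Sketch of a softmin-style aggregation (assumed form).

    Weights each per-class score by softmax((1 - s) / temperature), so the
    lowest scores dominate the weighted average as temperature shrinks.
    """
    s = np.asarray(class_scores, dtype=float)
    z = (1.0 - s) / temperature
    z -= z.max(axis=1, keepdims=True)  # shift for numerical stability
    weights = np.exp(z)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(weights * s, axis=1)


def min_aggregate(class_scores, **_):
    """Hypothetical custom aggregator: the single worst per-class score."""
    return np.min(class_scores, axis=1)


class_scores = np.array([[0.9, 0.9, 0.9], [0.4, 0.9, 0.9]])
soft = softmin_aggregate(class_scores)
hard = min_aggregate(class_scores)
print(soft, hard)
```

In this toy case, softmin places the second example close to its worst per-class score (about 0.41). The default EMA gives 0.5 for the same inputs, and a plain minimum gives exactly 0.4.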
