
Implementation:Cleanlab Get Label Quality Scores

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Quality
Last Updated 2026-02-09 19:00 GMT

Overview

Concrete tool for computing per-example label quality scores that quantify the likelihood each given label is correct, provided by the Cleanlab library.

Description

This function takes noisy labels and out-of-sample predicted probabilities and returns a numeric quality score for each example. The score is between 0 and 1, where lower values indicate labels that are more likely to be incorrect. Three scoring methods are available via the method parameter: self_confidence (predicted probability of the given label), normalized_margin (gap between given label probability and the next best class), and confidence_weighted_entropy (uncertainty-weighted confidence). An optional adjust_pred_probs parameter can be used to modify the predicted probabilities to account for class imbalance before scoring.
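As a rough illustration of what the first two methods compute, here is a plain-NumPy sketch. This is a simplification, not Cleanlab's exact implementation (which adds rescaling and edge-case handling), but it captures the core arithmetic:

```python
import numpy as np

labels = np.array([0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
])

# self_confidence: the predicted probability assigned to the given label
self_confidence = pred_probs[np.arange(len(labels)), labels]

# normalized_margin (simplified): P(given label) minus the highest
# probability among the *other* classes, rescaled from [-1, 1] into [0, 1]
masked = pred_probs.copy()
masked[np.arange(len(labels)), labels] = -np.inf
margin = self_confidence - masked.max(axis=1)
normalized_margin = (margin + 1) / 2

print(self_confidence)    # [0.9 0.7 0.4]
print(normalized_margin)  # [0.925 0.75 0.55]
```

Note how the third example, where the model is nearly split across classes, scores lower under both methods even though its given label matches the argmax prediction.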

Usage

Import and use this function when you need continuous quality scores for all examples in your dataset. This is useful for ranking examples by label quality, setting custom thresholds for flagging issues, or providing scores to downstream functions like order_label_issues. This function is commonly used after or alongside find_label_issues to provide complementary information.
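A common pattern built on these scores is to flag everything below a cutoff and to rank examples worst-first for review. The snippet below uses hypothetical score values and an arbitrary 0.5 threshold purely for illustration; in practice the threshold should be tuned to your review budget and tolerance for false positives:

```python
import numpy as np

# Suppose these came from get_label_quality_scores (hypothetical values)
scores = np.array([0.92, 0.15, 0.78, 0.05, 0.81, 0.88])

threshold = 0.5  # arbitrary cutoff, chosen for illustration
flagged = np.flatnonzero(scores < threshold)
print("Flagged indices:", flagged)            # examples worth re-reviewing
print("Ranked worst-first:", np.argsort(scores))
```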

Code Reference

Source Location

  • Repository: cleanlab
  • File: cleanlab/rank.py
  • Lines: 33-117

Signature

def get_label_quality_scores(
    labels: np.ndarray,
    pred_probs: np.ndarray,
    *,
    method: str = "self_confidence",
    adjust_pred_probs: bool = False,
) -> np.ndarray

Import

from cleanlab.rank import get_label_quality_scores

I/O Contract

Inputs

Name Type Required Description
labels np.ndarray Yes Array of noisy class labels of shape (N,) with integer values 0..K-1.
pred_probs np.ndarray Yes Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1.
method str No Scoring method to use. One of "self_confidence" (default), "normalized_margin", or "confidence_weighted_entropy".
adjust_pred_probs bool No If True, adjust predicted probabilities to account for class imbalance before computing scores. Defaults to False.

Outputs

Name Type Description
label_quality_scores np.ndarray Array of shape (N,) with quality scores between 0 and 1 for each example. Lower scores indicate labels more likely to be incorrect.
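The requirement that pred_probs be out-of-sample matters: probabilities from a model scored on its own training data are overconfident on the given labels and wash out the quality signal. One common way to obtain out-of-sample probabilities is scikit-learn's cross_val_predict; the synthetic dataset and logistic regression below are illustrative choices, not part of the Cleanlab API:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Build a toy 3-class dataset (stand-in for your real features and labels)
X, labels = make_classification(
    n_samples=200, n_classes=3, n_informative=5, random_state=0
)

# Each row of pred_probs comes from a fold whose model never saw
# that example during training, so the probabilities are out-of-sample.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels,
    cv=5, method="predict_proba",
)
print(pred_probs.shape)  # (200, 3), rows sum to 1
```

The resulting labels and pred_probs arrays can be passed directly to get_label_quality_scores.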

Usage Examples

Basic Usage

import numpy as np
from cleanlab.rank import get_label_quality_scores

labels = np.array([0, 0, 1, 1, 2, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],   # labeled 0 but model thinks 1
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],  # labeled 1 but model thinks 2
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
])

# Default: self_confidence
scores = get_label_quality_scores(labels, pred_probs)
print("Quality scores:", scores)
# Example output: [0.9, 0.2, 0.8, 0.1, 0.8, 0.9]
# Lower scores for examples 1 and 3 (likely mislabeled)

Comparing Scoring Methods

from cleanlab.rank import get_label_quality_scores

# labels and pred_probs as defined in the Basic Usage example above
# Self-confidence: P(given_label | x)
scores_sc = get_label_quality_scores(labels, pred_probs, method="self_confidence")

# Normalized margin: P(given label) - max P(other classes), rescaled to [0, 1]
scores_nm = get_label_quality_scores(labels, pred_probs, method="normalized_margin")

# Confidence-weighted entropy
scores_cwe = get_label_quality_scores(
    labels, pred_probs, method="confidence_weighted_entropy"
)

# All methods rank examples similarly, but with different score distributions
for i in range(len(labels)):
    print(f"Example {i}: SC={scores_sc[i]:.3f}, NM={scores_nm[i]:.3f}, CWE={scores_cwe[i]:.3f}")

Identifying Worst Labels

import numpy as np
from cleanlab.rank import get_label_quality_scores

# labels and pred_probs as defined in the Basic Usage example above
scores = get_label_quality_scores(labels, pred_probs)

# Get the 3 worst-scoring examples
worst_indices = np.argsort(scores)[:3]
print("Worst labels at indices:", worst_indices)
print("Their scores:", scores[worst_indices])

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
