Implementation: Cleanlab get_label_quality_scores
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
A concrete tool from the Cleanlab library for computing per-example label quality scores that quantify how likely each given label is to be correct.
Description
This function takes noisy labels and out-of-sample predicted probabilities and returns a numeric quality score for each example. The score is between 0 and 1, where lower values indicate labels that are more likely to be incorrect. Three scoring methods are available via the method parameter: self_confidence (predicted probability of the given label), normalized_margin (gap between given label probability and the next best class), and confidence_weighted_entropy (uncertainty-weighted confidence). An optional adjust_pred_probs parameter can be used to modify the predicted probabilities to account for class imbalance before scoring.
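The intuition behind the first two methods can be sketched with plain NumPy. This is an illustrative toy example (the data and the margin formula here are assumptions, not the library's exact implementation; in particular, cleanlab rescales its margin into [0, 1]):

```python
import numpy as np

# Toy data (illustrative only): 2 examples, 2 classes
labels = np.array([0, 1])
pred_probs = np.array([
    [0.9, 0.1],  # given label 0; model agrees -> high quality score
    [0.7, 0.3],  # given label 1; model prefers class 0 -> low quality score
])

# self_confidence: the predicted probability assigned to the given label
self_confidence = pred_probs[np.arange(len(labels)), labels]
print(self_confidence)  # [0.9 0.3]

# Raw margin between the given label and the best competing class;
# cleanlab's "normalized_margin" rescales a quantity like this into [0, 1]
masked = pred_probs.copy()
masked[np.arange(len(labels)), labels] = -np.inf
margin = self_confidence - masked.max(axis=1)
print(margin)  # [ 0.8 -0.4]
```

Both quantities drop sharply for the second example, where the model disagrees with the given label.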
Usage
Import and use this function when you need continuous quality scores for all examples in your dataset. This is useful for ranking examples by label quality, setting custom thresholds for flagging issues, or providing scores to downstream functions like order_label_issues. This function is commonly used after or alongside find_label_issues to provide complementary information.
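Because the scores are continuous, setting a custom threshold is a one-liner. A minimal sketch, using a hypothetical scores array standing in for the function's output:

```python
import numpy as np

# Hypothetical scores, as would be returned by get_label_quality_scores
scores = np.array([0.9, 0.2, 0.8, 0.1, 0.8, 0.9])

# Flag every example whose quality score falls below a chosen threshold
threshold = 0.5
flagged = np.where(scores < threshold)[0]
print(flagged)  # indices of examples to review
```

The threshold value is a project-specific choice; lower thresholds flag fewer, more suspicious examples.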
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/rank.py
- Lines: 33-117
Signature
def get_label_quality_scores(
labels: np.ndarray,
pred_probs: np.ndarray,
*,
method: str = "self_confidence",
adjust_pred_probs: bool = False,
) -> np.ndarray
Import
from cleanlab.rank import get_label_quality_scores
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | np.ndarray | Yes | Array of noisy class labels of shape (N,) with integer values 0..K-1. |
| pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1. |
| method | str | No | Scoring method to use. One of "self_confidence" (default), "normalized_margin", or "confidence_weighted_entropy". |
| adjust_pred_probs | bool | No | If True, adjust predicted probabilities to account for class imbalance before computing scores. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| label_quality_scores | np.ndarray | Array of shape (N,) with quality scores between 0 and 1 for each example. Lower scores indicate labels more likely to be incorrect. |
Usage Examples
Basic Usage
import numpy as np
from cleanlab.rank import get_label_quality_scores
labels = np.array([0, 0, 1, 1, 2, 2])
pred_probs = np.array([
[0.9, 0.05, 0.05],
[0.2, 0.7, 0.1], # labeled 0 but model thinks 1
[0.1, 0.8, 0.1],
[0.05, 0.1, 0.85], # labeled 1 but model thinks 2
[0.1, 0.1, 0.8],
[0.05, 0.05, 0.9],
])
# Default: self_confidence
scores = get_label_quality_scores(labels, pred_probs)
print("Quality scores:", scores)
# Example output: [0.9, 0.2, 0.8, 0.1, 0.8, 0.9]
# Lower scores for examples 1 and 3 (likely mislabeled)
Comparing Scoring Methods
from cleanlab.rank import get_label_quality_scores
# labels and pred_probs as defined in the Basic Usage example above
# Self-confidence: P(given_label | x)
scores_sc = get_label_quality_scores(labels, pred_probs, method="self_confidence")
# Normalized margin: P(given_label) - P(next best class)
scores_nm = get_label_quality_scores(labels, pred_probs, method="normalized_margin")
# Confidence-weighted entropy
scores_cwe = get_label_quality_scores(
labels, pred_probs, method="confidence_weighted_entropy"
)
# All methods rank examples similarly, but with different score distributions
for i in range(len(labels)):
print(f"Example {i}: SC={scores_sc[i]:.3f}, NM={scores_nm[i]:.3f}, CWE={scores_cwe[i]:.3f}")
Identifying Worst Labels
import numpy as np
from cleanlab.rank import get_label_quality_scores
# labels and pred_probs as defined in the Basic Usage example above
scores = get_label_quality_scores(labels, pred_probs)
# Get the 3 worst-scoring examples
worst_indices = np.argsort(scores)[:3]
print("Worst labels at indices:", worst_indices)
print("Their scores:", scores[worst_indices])
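The scores also combine naturally with a boolean issue mask, such as one produced by find_label_issues. The sketch below uses a hypothetical mask and scores and ranks the flagged examples worst-first, which is conceptually what downstream helpers like order_label_issues provide:

```python
import numpy as np

# Hypothetical inputs: a boolean issue mask (e.g. from find_label_issues)
# and quality scores (e.g. from get_label_quality_scores)
issue_mask = np.array([False, True, False, True, False, False])
scores = np.array([0.9, 0.2, 0.8, 0.1, 0.8, 0.9])

# Restrict to flagged examples, then sort them from worst to best score
issue_indices = np.flatnonzero(issue_mask)
ranked = issue_indices[np.argsort(scores[issue_indices])]
print(ranked)  # flagged examples, worst label quality first
```

Reviewing flagged examples in this order front-loads the labels most likely to be wrong.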