Implementation:Cleanlab Cleanlab Segmentation Get Label Quality Scores
| Knowledge Sources | |
|---|---|
| Domains | Data Quality, Machine Learning, Computer Vision, Semantic Segmentation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Computes per-image and per-pixel label quality scores for semantic segmentation datasets, and provides a utility to convert scores into binary issue masks.
Description
The cleanlab/segmentation/rank.py module contains two public functions:
- get_label_quality_scores computes continuous quality scores at both the image level and the pixel level. It supports two scoring methods. The default "softmin" method takes each pixel's predicted probability for its given class as the pixel score and aggregates these into an image-level score with a temperature-controlled softmin function. The "num_pixel_issues" method delegates to find_label_issues to flag pixel-level issues and scores each image by its fraction of non-flagged pixels.
- issues_from_scores converts the scores produced by get_label_quality_scores into issues by applying a user-specified threshold. When pixel scores are provided, pixels with quality scores below the threshold are marked as issues in a boolean mask. When only image-level scores are provided, it returns the indices of images whose scores fall below the threshold, sorted by score.
Usage
Import get_label_quality_scores when you need continuous quality scores for ranking and prioritizing which images or pixels to review. Import issues_from_scores when you want to convert those scores into a binary issue mask using a custom threshold, especially for use with visualization utilities like display_issues.
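As a minimal sketch of the ranking use case, the image-level scores can be sorted with NumPy to pick the images to review first. The score values below are made up for illustration and simply stand in for the first array returned by get_label_quality_scores; the variable names are assumptions, not part of the library.

```python
import numpy as np

# Made-up image-level scores standing in for the first return value of
# get_label_quality_scores (lower means more likely to contain label issues).
image_scores = np.array([0.91, 0.12, 0.55, 0.08, 0.73])

# Indices of images ordered from lowest (most suspicious) to highest score.
review_order = np.argsort(image_scores)

# Keep only the k lowest-scoring images for manual review.
k = 2
worst_k = review_order[:k]
print(worst_k)  # [3 1]
```

The same idea applies per pixel: calling np.argsort on a flattened pixel_scores array orders pixels from most to least suspicious.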
Code Reference
Source Location
- Repository: Cleanlab
- File: cleanlab/segmentation/rank.py
- Lines: 14-131 (get_label_quality_scores), 133-186 (issues_from_scores)
Signature (get_label_quality_scores)
def get_label_quality_scores(
    labels: np.ndarray,
    pred_probs: np.ndarray,
    *,
    method: str = "softmin",
    batch_size: Optional[int] = None,
    n_jobs: Optional[int] = None,
    verbose: bool = True,
    **kwargs,
) -> Tuple[np.ndarray, np.ndarray]:
Signature (issues_from_scores)
def issues_from_scores(
    image_scores: np.ndarray,
    pixel_scores: Optional[np.ndarray] = None,
    threshold: float = 0.1,
) -> np.ndarray:
Import
from cleanlab.segmentation.rank import get_label_quality_scores, issues_from_scores
I/O Contract
Inputs (get_label_quality_scores)
| Name | Type | Required | Description |
|---|---|---|---|
| labels | np.ndarray | Yes | Discrete array of shape (N, H, W) of integer class labels for each pixel, with values in 0, 1, ..., K-1. |
| pred_probs | np.ndarray | Yes | Array of shape (N, K, H, W) of model-predicted class probabilities for each pixel. |
| method | str | No | Scoring method: "softmin" (default) extracts per-pixel predicted probabilities and aggregates with softmin; "num_pixel_issues" counts detected issues per image via find_label_issues. |
| batch_size | Optional[int] | No | Mini-batch size for the "num_pixel_issues" method. No effect on "softmin". |
| n_jobs | Optional[int] | No | Number of processes for multiprocessing (Linux only). Only used with "num_pixel_issues". |
| verbose | bool | No | If True (default), displays progress bars. |
| temperature | float (via kwargs) | No | Temperature parameter for the softmin aggregation. Default is 0.1. Lower values emphasize the worst pixel; higher values approach the mean. |
| downsample | int (via kwargs) | No | Downsampling factor for "num_pixel_issues" method. Default is 1. |
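The shape requirements in the table above can be checked before scoring. The sketch below uses a hypothetical helper, check_segmentation_inputs, which is not part of cleanlab; it only encodes the documented contract: labels of shape (N, H, W) with values in 0..K-1, and pred_probs of shape (N, K, H, W).

```python
import numpy as np

def check_segmentation_inputs(labels, pred_probs):
    """Hypothetical helper (not a cleanlab function): verify the documented shapes."""
    if labels.ndim != 3:
        raise ValueError(f"labels must have shape (N, H, W), got {labels.shape}")
    if pred_probs.ndim != 4:
        raise ValueError(f"pred_probs must have shape (N, K, H, W), got {pred_probs.shape}")
    n, k, h, w = pred_probs.shape
    if labels.shape != (n, h, w):
        raise ValueError(
            f"labels shape {labels.shape} does not match pred_probs shape {pred_probs.shape}"
        )
    if labels.min() < 0 or labels.max() >= k:
        raise ValueError(f"label values must lie in 0..{k - 1}")
    return n, k, h, w

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(2, 4, 4))
# Dirichlet samples come out channels-last, shape (N, H, W, K).
channels_last = rng.dirichlet([1, 1, 1], size=(2, 4, 4))
# Move the class axis to position 1 to obtain (N, K, H, W).
pred_probs = np.transpose(channels_last, (0, 3, 1, 2))
print(check_segmentation_inputs(labels, pred_probs))  # (2, 3, 4, 4)
```

A channels-last array passed by mistake fails the shape comparison here, which is the most common layout error when model outputs come from a framework that puts the class axis last.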
Outputs (get_label_quality_scores)
| Name | Type | Description |
|---|---|---|
| image_scores | np.ndarray | Array of shape (N,) with scores between 0 and 1, one per image. Lower scores indicate images more likely to contain label issues. |
| pixel_scores | np.ndarray | Array of shape (N, H, W) with scores between 0 and 1, one per pixel. Lower scores indicate pixels more likely to be mislabeled. |
Inputs (issues_from_scores)
| Name | Type | Required | Description |
|---|---|---|---|
| image_scores | np.ndarray | Yes | Array of shape (N,) of per-image quality scores. |
| pixel_scores | Optional[np.ndarray] | No | Array of shape (N, H, W) of per-pixel quality scores. If provided, returns a boolean mask; otherwise, returns sorted image indices. |
| threshold | float | No | Quality score cutoff (default 0.1). Pixels or images with scores below this value are marked as issues. |
Outputs (issues_from_scores)
| Name | Type | Description |
|---|---|---|
| issues | np.ndarray | If pixel_scores is provided: boolean mask of shape (N, H, W) where True indicates an issue. If pixel_scores is None: array of integer indices of images with scores below the threshold, sorted by score. |
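The two return modes described above can be illustrated with a small NumPy stand-in. The function issues_from_scores_sketch is hypothetical and is not the cleanlab source; it assumes a strict "below threshold" comparison and ascending order by score, and the library's handling of ties or scores exactly equal to the threshold may differ.

```python
import numpy as np

def issues_from_scores_sketch(image_scores, pixel_scores=None, threshold=0.1):
    """Illustrative stand-in for the documented behavior, not the cleanlab source."""
    if pixel_scores is not None:
        # Pixel mode: boolean mask with the same shape as pixel_scores.
        return pixel_scores < threshold
    # Image mode: indices ordered by ascending score, kept while below the threshold.
    order = np.argsort(image_scores)
    return order[image_scores[order] < threshold]

# Made-up scores for N=3 images of size 1x2.
image_scores = np.array([0.15, 0.6, 0.04])
pixel_scores = np.array([[[0.05, 0.9]], [[0.7, 0.8]], [[0.02, 0.09]]])

mask = issues_from_scores_sketch(image_scores, pixel_scores, threshold=0.1)
print(mask.shape, int(mask.sum()))  # (3, 1, 2) 3

flagged = issues_from_scores_sketch(image_scores, threshold=0.2)
print(flagged)  # [2 0]
```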
Scoring Methods Detail
Softmin Method (Default)
For each image, the per-pixel score is the model's predicted probability for the given label class at that pixel: pixel_score[i, h, w] = pred_probs[i, labels[i, h, w], h, w]. The image-level score is computed by applying a softmin aggregation (inner product of pixel scores with softmax(1 - pixel_scores)) controlled by a temperature parameter. Lower temperature values cause the image score to be dominated by the worst pixel, while higher temperatures yield scores closer to the average.
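The following NumPy sketch illustrates both steps on a single tiny image. It is not the cleanlab source: the function name softmin_image_score is hypothetical, and it assumes the temperature divides the softmax inputs, which matches the described behavior (low temperature approaches the minimum, high temperature approaches the mean).

```python
import numpy as np

def softmin_image_score(pixel_scores, temperature=0.1):
    """Sketch of softmin aggregation for one image; not the cleanlab source."""
    flat = pixel_scores.ravel()
    # Assumption: temperature scales the softmax inputs.
    logits = (1.0 - flat) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Inner product of the pixel scores with their softmax weights.
    return float(np.dot(flat, weights))

# One 2x2 image with K=2 classes; pred_probs has shape (N, K, H, W).
p0 = np.array([[0.9, 0.2], [0.05, 0.1]])
pred_probs = np.stack([p0, 1.0 - p0], axis=0)[np.newaxis]
labels = np.array([[[0, 1], [1, 0]]])

# Per-pixel score: predicted probability of the given label at each pixel.
pixel_scores = np.take_along_axis(pred_probs, labels[:, np.newaxis], axis=1)[:, 0]

low = softmin_image_score(pixel_scores[0], temperature=0.01)   # close to the minimum
mid = softmin_image_score(pixel_scores[0], temperature=0.1)
high = softmin_image_score(pixel_scores[0], temperature=100.0)  # close to the mean
print(low, mid, high)
```

In this example the worst pixel has score 0.1 and the mean is 0.6875, so the three results move from roughly the former toward the latter as the temperature grows.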
Num Pixel Issues Method
Per-pixel scores are computed by masking pred_probs to extract the probability for each pixel's given class. The image-level score is 1 - mean(issue_mask), where issue_mask is the boolean mask returned by find_label_issues. Images with more flagged pixels receive lower image-level scores.
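A minimal sketch of this method's arithmetic follows. The issue mask is filled in by hand as a stand-in for the output of find_label_issues, and the random inputs and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, H, W = 3, 3, 4, 4
labels = rng.integers(0, K, size=(N, H, W))
pred_probs = np.transpose(rng.dirichlet(np.ones(K), size=(N, H, W)), (0, 3, 1, 2))

# Per-pixel scores via masking: keep only the probability of each pixel's given class.
class_mask = np.arange(K)[np.newaxis, :, np.newaxis, np.newaxis] == labels[:, np.newaxis]
pixel_scores = np.where(class_mask, pred_probs, 0.0).sum(axis=1)  # shape (N, H, W)

# Hand-made boolean mask standing in for the output of find_label_issues.
issue_mask = np.zeros((N, H, W), dtype=bool)
issue_mask[0, :2, :] = True  # image 0: 8 of 16 pixels flagged
issue_mask[2, 0, 0] = True   # image 2: 1 of 16 pixels flagged

# Image score is the fraction of pixels that were not flagged.
image_scores = 1.0 - issue_mask.mean(axis=(1, 2))
print(image_scores.tolist())  # [0.5, 1.0, 0.9375]
```

Note that the image score here depends only on how many pixels are flagged, not on how low their individual scores are.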
Usage Examples
Basic Usage
import numpy as np
from cleanlab.segmentation.rank import get_label_quality_scores
# Synthetic data: N=5 images, K=3 classes, H=32, W=32
labels = np.random.randint(0, 3, size=(5, 32, 32))
# Dirichlet samples have shape (N, H, W, K); move the class axis to get (N, K, H, W)
pred_probs = np.random.dirichlet([1, 1, 1], size=(5, 32, 32))
pred_probs = np.transpose(pred_probs, (0, 3, 1, 2))
image_scores, pixel_scores = get_label_quality_scores(
    labels, pred_probs, verbose=False
)
print(f"Image scores shape: {image_scores.shape}")  # (5,)
print(f"Pixel scores shape: {pixel_scores.shape}")  # (5, 32, 32)
Converting Scores to Issues
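import numpy as np
from cleanlab.segmentation.rank import get_label_quality_scores, issues_from_scores
# Same synthetic data as in Basic Usage: N=5 images, K=3 classes, H=32, W=32
labels = np.random.randint(0, 3, size=(5, 32, 32))
pred_probs = np.transpose(np.random.dirichlet([1, 1, 1], size=(5, 32, 32)), (0, 3, 1, 2))
image_scores, pixel_scores = get_label_quality_scores(
    labels, pred_probs, verbose=False
)
# Boolean mask of shape (N, H, W); True marks pixels scoring below 0.1
issue_mask = issues_from_scores(image_scores, pixel_scores, threshold=0.1)
print(f"Total pixel issues: {issue_mask.sum()}")
# Without pixel scores, get the indices of images scoring below the threshold
problem_images = issues_from_scores(image_scores, threshold=0.2)
print(f"Problem image indices: {problem_images}")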