Implementation: Cleanlab OD Get Label Quality Scores

| | |
|---|---|
| API | object_detection.rank.get_label_quality_scores |
| Source | cleanlab/object_detection/rank.py:L50-57 |
| Domains | Machine_Learning, Data_Quality, Object_Detection |
| Last Updated | 2026-02-09 |
Overview
Implementation of per-image label quality scoring for object detection datasets. Computes a quality score between 0 and 1 for each image by evaluating overlooked, swapped, and badly located bounding box errors.
Description
This function takes ground truth bounding box annotations and model predictions for each image and returns a per-image quality score. It internally:
- Validates the input labels and predictions for correct format.
- Optionally checks for overlapping labels within the same image.
- Matches predicted boxes to ground truth boxes using IoU thresholds.
- Classifies unmatched or poorly matched boxes into three error categories: overlooked, swap, and badloc.
- Computes sub-scores for each error type using softmin aggregation.
- Combines sub-scores using configurable aggregation weights into a single per-image score.
Lower scores indicate images that are more likely to contain annotation errors.
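The aggregation steps above can be sketched as follows. This is an illustrative reimplementation, not cleanlab's internals: the `softmin` temperature, the exact formula, and the sample sub-score values are all assumptions.

```python
import numpy as np

def softmin(box_scores: np.ndarray, temperature: float = 0.1) -> float:
    # Weight each per-box score by a softmax of its *negated* value,
    # so the worst (lowest) scores dominate the aggregate.
    weights = np.exp(-box_scores / temperature)
    weights /= weights.sum()
    return float(np.dot(box_scores, weights))

# Per-box sub-scores for one image: one box looks problematic (0.2),
# so the softmin aggregate is pulled toward it
overlooked_subscore = softmin(np.array([0.9, 0.8, 0.2]))

# Combine the three error-type sub-scores with (here, equal) weights
weights = {"overlooked": 1 / 3, "swap": 1 / 3, "badloc": 1 / 3}
subscores = {"overlooked": overlooked_subscore, "swap": 0.85, "badloc": 0.7}
image_score = sum(weights[k] * subscores[k] for k in weights)
print(round(image_score, 3))
```

Softmin (rather than a plain mean) ensures that a single badly annotated box noticeably lowers the image's score even when the other boxes look fine.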
Usage
This function is the primary entry point for scoring object detection label quality. It is typically called after training an object detection model and generating predictions on the training set. The resulting scores can be used to rank images for review or as input to the issue filtering function.
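For example, to queue the lowest-scoring images for manual review (the score values below are made up for illustration):

```python
import numpy as np

# Hypothetical per-image scores as returned by get_label_quality_scores
scores = np.array([0.92, 0.31, 0.77, 0.55])

# Order image indices from most to least likely to contain label errors
review_order = np.argsort(scores)
print(review_order.tolist())  # [1, 3, 2, 0]
```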
Code Reference
Source Location
cleanlab/object_detection/rank.py, lines 50-57.
Signature
```python
def get_label_quality_scores(
    labels: List[Dict[str, Any]],
    predictions: List[np.ndarray],
    *,
    aggregation_weights: Optional[Dict[str, float]] = None,
    overlapping_label_check: Optional[bool] = True,
    verbose: bool = True,
) -> np.ndarray
```
Import
```python
from cleanlab.object_detection.rank import get_label_quality_scores
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `labels` | `List[Dict[str, Any]]` | List of N dictionaries, each with `"bboxes"` (`np.ndarray` of shape `(M, 4)` in xyxy format) and `"labels"` (`np.ndarray` of shape `(M,)` with integer class labels). |
| `predictions` | `List[np.ndarray]` | List of N arrays, each of shape `(P, K+5)`, where P is the number of predicted boxes. Each row contains `[x1, y1, x2, y2, confidence, class_0_prob, ..., class_K-1_prob]`. |
| `aggregation_weights` | `Optional[Dict[str, float]]` | Dictionary with keys `"overlooked"`, `"swap"`, and `"badloc"` specifying weights for combining sub-scores. Defaults to equal weights if `None`. |
| `overlapping_label_check` | `Optional[bool]` | If `True`, checks for overlapping bounding boxes in the ground truth labels. Defaults to `True`. |
| `verbose` | `bool` | If `True`, prints progress information. Defaults to `True`. |
Outputs
| Type | Description |
|---|---|
| `np.ndarray` | Array of shape `(N,)` containing per-image quality scores between 0 and 1. Lower scores indicate images more likely to have label issues. |
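Since the scores are continuous rather than binary verdicts, a downstream workflow typically applies a cutoff to decide which images to inspect. The values and threshold below are arbitrary illustrations, not cleanlab defaults:

```python
import numpy as np

scores = np.array([0.92, 0.31, 0.77, 0.55])  # hypothetical output values
cutoff = 0.5  # arbitrary review threshold, chosen per dataset
flagged = np.where(scores < cutoff)[0]
print(flagged.tolist())  # [1]
```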
Usage Examples
```python
import numpy as np
from cleanlab.object_detection.rank import get_label_quality_scores

# Ground truth labels for 2 images
labels = [
    {
        "bboxes": np.array([[10, 20, 50, 60], [100, 110, 200, 210]]),
        "labels": np.array([0, 1]),
    },
    {
        "bboxes": np.array([[30, 40, 70, 80]]),
        "labels": np.array([2]),
    },
]

# Model predictions for 2 images (K=3 classes, so each row has 3+5=8 columns)
predictions = [
    np.array([
        [10, 20, 50, 60, 0.9, 0.85, 0.10, 0.05],
        [100, 110, 200, 210, 0.8, 0.05, 0.90, 0.05],
    ]),
    np.array([
        [30, 40, 70, 80, 0.95, 0.05, 0.10, 0.85],
    ]),
]

# Compute per-image quality scores
scores = get_label_quality_scores(labels, predictions)
# scores is a np.ndarray of shape (2,) with values between 0 and 1

# Use custom aggregation weights emphasizing overlooked errors
scores_custom = get_label_quality_scores(
    labels,
    predictions,
    aggregation_weights={"overlooked": 0.5, "swap": 0.25, "badloc": 0.25},
)
```