Implementation: Cleanlab OD Get Label Quality Scores

| | |
|---|---|
| API | object_detection.rank.get_label_quality_scores |
| Source | cleanlab/object_detection/rank.py:L50-57 |
| Domains | Machine_Learning, Data_Quality, Object_Detection |
| Last Updated | 2026-02-09 |
Overview
Implementation of per-image label quality scoring for object detection datasets. Computes a quality score between 0 and 1 for each image by evaluating overlooked, swapped, and badly located bounding box errors.
Description
This function takes ground truth bounding box annotations and model predictions for each image and returns a per-image quality score. It internally:
- Validates the input labels and predictions for correct format.
- Optionally checks for overlapping labels within the same image.
- Matches predicted boxes to ground truth boxes using IoU thresholds.
- Classifies unmatched or poorly matched boxes into three error categories: overlooked, swap, and badloc.
- Computes sub-scores for each error type using softmin aggregation.
- Combines sub-scores using configurable aggregation weights into a single per-image score.
Lower scores indicate images that are more likely to contain annotation errors.
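The aggregation steps above can be sketched as follows. This is an illustrative reimplementation, not cleanlab's internals: the `softmin` temperature, the exact formula, and the sample sub-score values are all assumptions.

```python
import numpy as np

def softmin(box_scores: np.ndarray, temperature: float = 0.1) -> float:
    # Weight each per-box score by a softmax of its *negated* value,
    # so the worst (lowest) scores dominate the aggregate.
    weights = np.exp(-box_scores / temperature)
    weights /= weights.sum()
    return float(np.dot(box_scores, weights))

# Per-box sub-scores for one image: one box looks problematic (0.2),
# so the softmin aggregate is pulled toward it
overlooked_subscore = softmin(np.array([0.9, 0.8, 0.2]))

# Combine the three error-type sub-scores with (here, equal) weights
weights = {"overlooked": 1 / 3, "swap": 1 / 3, "badloc": 1 / 3}
subscores = {"overlooked": overlooked_subscore, "swap": 0.85, "badloc": 0.7}
image_score = sum(weights[k] * subscores[k] for k in weights)
print(round(image_score, 3))
```

Softmin (rather than a plain mean) ensures that a single badly annotated box noticeably lowers the image's score even when the other boxes look fine.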
Usage
This function is the primary entry point for scoring object detection label quality. It is typically called after training an object detection model and generating predictions on the training set. The resulting scores can be used to rank images for review or as input to the issue filtering function.
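For example, to queue the lowest-scoring images for manual review (the score values below are made up for illustration):

```python
import numpy as np

# Hypothetical per-image scores as returned by get_label_quality_scores
scores = np.array([0.92, 0.31, 0.77, 0.55])

# Order image indices from most to least likely to contain label errors
review_order = np.argsort(scores)
print(review_order.tolist())  # [1, 3, 2, 0]
```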
Code Reference
Source Location
cleanlab/object_detection/rank.py, lines 50-57.
Signature
```python
def get_label_quality_scores(
    labels: List[Dict[str, Any]],
    predictions: List[np.ndarray],
    *,
    aggregation_weights: Optional[Dict[str, float]] = None,
    overlapping_label_check: Optional[bool] = True,
    verbose: bool = True,
) -> np.ndarray
```
Import
```python
from cleanlab.object_detection.rank import get_label_quality_scores
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `labels` | `List[Dict[str, Any]]` | List of N dictionaries, each with `"bboxes"` (`np.ndarray` of shape `(M, 4)` in xyxy format) and `"labels"` (`np.ndarray` of shape `(M,)` with integer class labels). |
| `predictions` | `List[np.ndarray]` | List of N arrays, each of shape `(P, K+5)`, where P is the number of predicted boxes. Each row contains `[x1, y1, x2, y2, confidence, class_0_prob, ..., class_K-1_prob]`. |
| `aggregation_weights` | `Optional[Dict[str, float]]` | Dictionary with keys `"overlooked"`, `"swap"`, and `"badloc"` specifying weights for combining sub-scores. Defaults to equal weights if `None`. |
| `overlapping_label_check` | `Optional[bool]` | If `True`, checks for overlapping bounding boxes in the ground truth labels. Defaults to `True`. |
| `verbose` | `bool` | If `True`, prints progress information. Defaults to `True`. |
Outputs
| Type | Description |
|---|---|
| `np.ndarray` | Array of shape `(N,)` containing per-image quality scores between 0 and 1. Lower scores indicate images more likely to have label issues. |
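Since the scores are continuous rather than binary verdicts, a downstream workflow typically applies a cutoff to decide which images to inspect. The values and threshold below are arbitrary illustrations, not cleanlab defaults:

```python
import numpy as np

scores = np.array([0.92, 0.31, 0.77, 0.55])  # hypothetical output values
cutoff = 0.5  # arbitrary review threshold, chosen per dataset
flagged = np.where(scores < cutoff)[0]
print(flagged.tolist())  # [1]
```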
Usage Examples
```python
import numpy as np
from cleanlab.object_detection.rank import get_label_quality_scores

# Ground truth labels for 2 images
labels = [
    {
        "bboxes": np.array([[10, 20, 50, 60], [100, 110, 200, 210]]),
        "labels": np.array([0, 1]),
    },
    {
        "bboxes": np.array([[30, 40, 70, 80]]),
        "labels": np.array([2]),
    },
]

# Model predictions for 2 images (K=3 classes, so each row has 3+5=8 columns)
predictions = [
    np.array([
        [10, 20, 50, 60, 0.9, 0.85, 0.10, 0.05],
        [100, 110, 200, 210, 0.8, 0.05, 0.90, 0.05],
    ]),
    np.array([
        [30, 40, 70, 80, 0.95, 0.05, 0.10, 0.85],
    ]),
]

# Compute per-image quality scores
scores = get_label_quality_scores(labels, predictions)
# scores is a np.ndarray of shape (2,) with values between 0 and 1

# Use custom aggregation weights emphasizing overlooked errors
scores_custom = get_label_quality_scores(
    labels,
    predictions,
    aggregation_weights={"overlooked": 0.5, "swap": 0.25, "badloc": 0.25},
)
```