
Principle:Cleanlab Object Detection Quality Scoring

From Leeroopedia


Knowledge Sources ObjectLab, Cleanlab
Domains Machine_Learning, Data_Quality, Object_Detection
Last Updated 2026-02-09

Overview

A method for computing per-image label quality scores in object detection datasets by evaluating three types of annotation error: overlooked objects, swapped class labels, and badly located bounding boxes.

Description

Object detection quality scoring uses the ObjectLab method to compare ground truth bounding boxes against model predictions via IoU (Intersection over Union) matching. For each image, it computes sub-scores for three error types:

  • Overlooked: Objects present in predictions but missing from ground truth labels, indicating the annotator failed to label a visible object.
  • Swap: A bounding box exists in the ground truth but has the wrong class label assigned to it, as indicated by the model predicting a different class for a matched box.
  • Badloc (Badly Located): A bounding box has the correct class label but is poorly placed relative to the actual object, resulting in low IoU between the ground truth box and the matched prediction.

These three sub-scores are aggregated via a weighted combination to produce a single per-image quality score between 0 and 1. Lower scores indicate images more likely to contain annotation errors.

Usage

Object detection quality scoring is applied after training an object detection model on the dataset. The model's predictions (bounding boxes with class probabilities) are compared against the ground truth annotations to identify potential labeling mistakes. This is useful for:

  • Dataset auditing: Systematically finding annotation errors in large object detection datasets.
  • Annotation quality assurance: Ranking images by label quality to prioritize human review.
  • Iterative dataset improvement: Identifying and correcting the most problematic annotations to improve model training.

Theoretical Basis

The scoring procedure follows these steps for each image:

Step 1: Box Matching. Match predicted bounding boxes to ground truth boxes using IoU thresholds. Each predicted box is matched to the ground truth box with the highest IoU, subject to a minimum IoU threshold.
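Step 1 can be sketched with a simple greedy matcher. This is a pure-NumPy illustration, not ObjectLab's exact implementation: real matchers also enforce one-to-one pairing and per-class handling, which this sketch omits.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_boxes(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Match each predicted box to the ground-truth box with highest IoU.

    Returns a list of (pred_idx, gt_idx) pairs, with gt_idx = None for
    predictions whose best IoU falls below the threshold.
    """
    matches = []
    for i, pb in enumerate(pred_boxes):
        ious = [iou(pb, gb) for gb in gt_boxes]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_threshold:
            matches.append((i, best))
        else:
            matches.append((i, None))  # unmatched prediction
    return matches
```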

Step 2: Error Classification.

  • For unmatched predictions (no corresponding ground truth box) -> overlooked errors.
  • For matched pairs where the predicted class differs from the ground truth class -> swap errors.
  • For matched pairs where the class is correct but IoU is low -> badloc errors.
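The classification rules above can be expressed as a small helper. The thresholds are illustrative: matching in Step 1 typically uses a permissive IoU cutoff, while badloc is flagged at a stricter one (the `badloc_iou=0.7` default below is an assumption, not ObjectLab's documented value).

```python
def classify_errors(matches, pred_classes, gt_classes, ious, badloc_iou=0.7):
    """Bucket each prediction into an error type (or 'ok').

    matches: (pred_idx, gt_idx or None) pairs from box matching (Step 1).
    ious: IoU of each prediction with its matched ground-truth box.
    badloc_iou: cutoff below which a correct-class match counts as
    badly located (illustrative value).
    """
    errors = []
    for pred_idx, gt_idx in matches:
        if gt_idx is None:
            errors.append("overlooked")   # prediction with no labeled box
        elif pred_classes[pred_idx] != gt_classes[gt_idx]:
            errors.append("swap")         # matched box, wrong class label
        elif ious[pred_idx] < badloc_iou:
            errors.append("badloc")       # right class, poorly placed box
        else:
            errors.append("ok")
    return errors
```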

Step 3: Sub-Score Computation. For each error type, compute a per-image sub-score using softmin aggregation over the individual error magnitudes. The softmin function provides a smooth approximation of the minimum that is differentiable and more robust to outliers:

softmin(scores) = sum(scores * exp(-scores / temperature)) / sum(exp(-scores / temperature))
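The softmin formula above translates directly into NumPy. Returning a perfect score of 1.0 for an empty input (an image with no errors of a given type) is an assumption made for this sketch:

```python
import numpy as np

def softmin(scores, temperature=0.1):
    """Smooth minimum: a weighted average in which lower scores receive
    exponentially larger weights, per the formula above."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return 1.0  # no errors of this type -> perfect sub-score (assumption)
    weights = np.exp(-scores / temperature)
    return float(np.sum(scores * weights) / np.sum(weights))
```

With a low temperature the result hugs the minimum (e.g. `softmin([0.1, 0.9])` is close to 0.1); as temperature grows it approaches the plain mean.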

Step 4: Weighted Aggregation. Combine the three sub-scores into a single per-image quality score:

score = w_overlooked * s_overlooked + w_swap * s_swap + w_badloc * s_badloc

where w_overlooked, w_swap, and w_badloc are configurable weights that control the relative importance of each error type. By default, all three error types are weighted equally.
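The aggregation in Step 4 is a plain weighted sum; a minimal sketch with the equal default weights described above:

```python
def aggregate_score(s_overlooked, s_swap, s_badloc, weights=(1/3, 1/3, 1/3)):
    """Combine the three per-image sub-scores into one quality score.

    weights: (w_overlooked, w_swap, w_badloc); equal by default, and
    assumed to sum to 1 so the result stays in [0, 1].
    """
    w_o, w_s, w_b = weights
    return w_o * s_overlooked + w_s * s_swap + w_b * s_badloc
```

Raising one weight (e.g. for swap errors) makes the final ranking more sensitive to that error type when prioritizing images for review.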
