Principle:Cleanlab Cleanlab Object Detection Issue Filtering

Knowledge Sources	Cleanlab
Domains	Machine_Learning, Data_Quality, Object_Detection
Last Updated	2026-02-09

Overview

Method for identifying which images in an object detection dataset have label issues based on quality score thresholds.

Description

Object detection issue filtering determines which images have annotation errors by applying a threshold to the per-image quality scores computed by the scoring method. Images whose quality scores fall below the threshold are flagged as having label issues. This enables batch identification of problematic images for review.

The method operates in two modes:

Boolean mask mode (default): Returns a boolean array of length N where True indicates the image has a label issue. This is useful for filtering datasets.
Ranked index mode: Returns an array of the indices of flagged images, sorted by quality score (ascending), so the most problematic images appear first. This array is typically shorter than N. This is useful for prioritized human review.

Internally, the function delegates to the quality scoring function to compute per-image scores, then applies thresholding or ranking logic to produce the final output.
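The delegation and the two output modes can be sketched in a few lines of plain Python. This is an illustrative sketch, not Cleanlab's implementation: `find_issues`, `score_fn`, `threshold`, and `ranked` are hypothetical names, the threshold is passed in rather than derived, and the sketch assumes that ranked mode lists only the flagged images.

```python
from typing import Callable, List, Sequence, TypeVar, Union

Image = TypeVar("Image")


def find_issues(
    images: Sequence[Image],
    score_fn: Callable[[Image], float],
    threshold: float,
    ranked: bool = False,
) -> Union[List[bool], List[int]]:
    """Hypothetical helper: flag images whose quality score is below `threshold`."""
    # Delegate scoring to the supplied function (one score per image).
    scores = [score_fn(image) for image in images]

    # Boolean mask mode: True means the image is flagged as a label issue.
    is_issue = [score < threshold for score in scores]
    if not ranked:
        return is_issue

    # Ranked index mode (assumption: only flagged images are listed),
    # ordered by ascending score so the worst image comes first.
    flagged = [i for i, bad in enumerate(is_issue) if bad]
    return sorted(flagged, key=lambda i: scores[i])
```

The mask always has one entry per image, which makes it convenient for filtering, while the ranked output is an ordered work list for reviewers.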

Usage

Object detection issue filtering is used when the goal is to obtain a binary determination of which images have label issues, rather than continuous quality scores. Typical use cases include:

Dataset cleaning: Filtering out images with label issues before training a production model.
Prioritized review: Generating a ranked list of images for human reviewers to inspect in order of severity.
Quality gates: Establishing a minimum quality threshold for an object detection dataset.
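As a rough illustration of the dataset cleaning and quality gate use cases, the sketch below consumes a boolean issue mask. The helper names `drop_flagged` and `passes_quality_gate`, and the choice to express the gate as a maximum fraction of flagged images, are assumptions made for this example and are not taken from Cleanlab.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_flagged(items: Sequence[T], is_issue: Sequence[bool]) -> List[T]:
    """Keep only the items whose mask entry is False (hypothetical helper)."""
    if len(items) != len(is_issue):
        raise ValueError("mask length must match the number of images")
    return [item for item, bad in zip(items, is_issue) if not bad]


def passes_quality_gate(is_issue: Sequence[bool], max_issue_fraction: float) -> bool:
    """Assumed gate: pass when the share of flagged images is small enough."""
    if not is_issue:
        return True  # an empty dataset has nothing to flag
    return sum(is_issue) / len(is_issue) <= max_issue_fraction
```

For example, `drop_flagged(["a.jpg", "b.jpg", "c.jpg"], [False, True, False])` keeps the first and third image, and the gate can be checked before a training run is allowed to start.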

Theoretical Basis

The filtering procedure operates as follows:

Step 1: Score Computation. Compute per-image quality scores using the ObjectLab scoring method (see Principle:Cleanlab_Cleanlab_Object_Detection_Quality_Scoring). Each image receives a score between 0 and 1.

Step 2: Threshold Determination. Derive a threshold from the distribution of scores across the dataset. The threshold is computed to identify a statistically meaningful set of outlier images whose scores are significantly lower than the dataset average.
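The exact rule for deriving the threshold is not given here. Purely as an illustration of what a distribution-derived threshold can look like, the sketch below uses the mean minus `k` standard deviations, clipped to the score range. Both the rule and the names `outlier_threshold` and `k` are assumptions and should not be read as Cleanlab's actual computation.

```python
import statistics
from typing import Sequence


def outlier_threshold(scores: Sequence[float], k: float = 2.0) -> float:
    """Assumed rule: mean minus k population standard deviations, clipped to [0, 1]."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)
    # Clip so the threshold stays inside the valid score range.
    return min(1.0, max(0.0, mean - k * spread))
```

With a rule of this kind, a dataset in which every image has the same score yields a threshold equal to that score, so the strict comparison in Step 3 flags nothing.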

Step 3: Classification. Compare each image's score against the threshold:

is_issue(image_i) = (score_i < threshold)

Step 4 (Optional): Ranking. If ranked output is requested, sort the images by quality score in ascending order and return the indices of the flagged images in that order:

ranked_indices = [i for i in argsort(scores) if is_issue(image_i)]  # ascending, worst first
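A small worked example may make Steps 3 and 4 concrete. The scores and the threshold of 0.5 below are made-up values for illustration. The sketch uses Python's stable `sorted`, so images with tied scores keep their original index order; how ties are resolved in practice is an assumption here.

```python
scores = [0.82, 0.15, 0.47, 0.15, 0.93]  # made-up per-image quality scores
threshold = 0.5                          # assumed value, normally from Step 2

# Step 3: strict comparison, so a score equal to the threshold is not flagged.
is_issue = [s < threshold for s in scores]

# Step 4: indices in ascending score order (stable sort keeps ties in index order).
ranked_indices = sorted(range(len(scores)), key=scores.__getitem__)

# Restricting the ranking to flagged images gives a shorter review list.
ranked_issue_indices = [i for i in ranked_indices if is_issue[i]]
```

Here images 1, 3, and 2 are flagged, and a reviewer working through `ranked_issue_indices` sees the two lowest-scoring images first.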
