Principle:Cleanlab Cleanlab Object Detection Issue Filtering
| Knowledge Sources | Cleanlab |
|---|---|
| Domains | Machine_Learning, Data_Quality, Object_Detection |
| Last Updated | 2026-02-09 |
Overview
A method for identifying which images in an object detection dataset have label issues by comparing per-image label quality scores against a threshold.
Description
Object detection issue filtering determines which images have annotation errors by applying a threshold to the per-image quality scores computed by the scoring method. Images whose quality scores fall below the threshold are flagged as having label issues. This enables batch identification of problematic images for review.
The method operates in two modes:
- Boolean mask mode (default): Returns a boolean array of length N, where N is the number of images and True indicates that the image has a label issue. This is useful for filtering datasets.
- Ranked index mode: Returns an array of image indices sorted by quality score (ascending), so the most problematic images appear first. This is useful for prioritized human review.
Internally, the function delegates to the quality scoring function to compute per-image scores, then applies thresholding or ranking logic to produce the final output.
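The two output modes can be illustrated with a minimal, dependency-free sketch. It assumes the per-image scores and the threshold have already been computed. The names `filter_issues_sketch`, `threshold`, and `ranked` are hypothetical and are not Cleanlab's API, and restricting the ranked output to flagged images is an assumption of this sketch.

```python
from typing import List, Sequence, Union


def filter_issues_sketch(
    scores: Sequence[float],
    threshold: float,
    ranked: bool = False,
) -> Union[List[bool], List[int]]:
    """Illustrative two-mode filter over precomputed per-image quality scores.

    Hypothetical helper for explanation only, not part of Cleanlab.
    """
    # Boolean mask mode: True where the score falls strictly below the threshold.
    mask = [s < threshold for s in scores]
    if not ranked:
        return mask
    # Ranked index mode (assumption: only flagged images are returned),
    # ordered so the lowest, most problematic score comes first.
    flagged = [i for i, is_issue in enumerate(mask) if is_issue]
    return sorted(flagged, key=lambda i: scores[i])


scores = [0.92, 0.15, 0.78, 0.04, 0.55]
print(filter_issues_sketch(scores, threshold=0.5))               # [False, True, False, True, False]
print(filter_issues_sketch(scores, threshold=0.5, ranked=True))  # [3, 1]
```

The mask keeps the original image order, which makes it convenient for indexing a dataset, while the ranked form discards that order in favor of severity.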
Usage
Object detection issue filtering is used when the goal is to obtain a binary determination of which images have label issues, rather than continuous quality scores. Typical use cases include:
- Dataset cleaning: Filtering out images with label issues before training a production model.
- Prioritized review: Generating a ranked list of images for human reviewers to inspect in order of severity.
- Quality gates: Establishing a minimum quality threshold for an object detection dataset.
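As a rough illustration of the dataset cleaning and quality gate use cases, the sketch below consumes a boolean issue mask. The helpers `split_by_mask` and `passes_quality_gate`, the file names, and the idea of expressing the gate as a maximum fraction of flagged images are all assumptions made for this example, not something prescribed by Cleanlab.

```python
from typing import List, Sequence, Tuple


def split_by_mask(
    items: Sequence[str], is_issue: Sequence[bool]
) -> Tuple[List[str], List[str]]:
    """Split items into (kept, flagged) according to a boolean issue mask."""
    if len(items) != len(is_issue):
        raise ValueError("items and is_issue must have the same length")
    kept = [x for x, bad in zip(items, is_issue) if not bad]
    flagged = [x for x, bad in zip(items, is_issue) if bad]
    return kept, flagged


def passes_quality_gate(is_issue: Sequence[bool], max_issue_fraction: float) -> bool:
    """Return True when the share of flagged images does not exceed the limit."""
    if len(is_issue) == 0:
        return True  # an empty dataset has nothing to flag
    return sum(is_issue) / len(is_issue) <= max_issue_fraction


# Hypothetical file names and mask, for illustration only.
image_paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]
is_issue = [False, True, False, False]

kept, flagged = split_by_mask(image_paths, is_issue)
print(kept)     # ['img_0.jpg', 'img_2.jpg', 'img_3.jpg']
print(flagged)  # ['img_1.jpg']
print(passes_quality_gate(is_issue, max_issue_fraction=0.3))  # True (1 of 4 flagged)
```

In practice, flagged images are often sent to review rather than discarded outright, since a low score can also reflect a model error rather than an annotation error.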
Theoretical Basis
The filtering procedure operates as follows:
Step 1: Score Computation. Compute per-image quality scores using the ObjectLab scoring method (see Principle:Cleanlab_Cleanlab_Object_Detection_Quality_Scoring). Each image receives a score between 0 and 1, where a lower score indicates a more likely label issue.
Step 2: Threshold Determination. Derive a threshold from the distribution of scores across the dataset. The threshold is chosen so that only outlier images, those whose scores lie well below the dataset average, fall beneath it.
Step 3: Classification. Compare each image's score against the threshold:
is_issue(image_i) = (score_i < threshold)
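Steps 2 and 3 can be sketched as follows. The exact rule for deriving the threshold is not stated here, so the sketch substitutes a simple stand-in: the mean score minus `k` population standard deviations, clipped to the range [0, 1]. This rule, the function name `derive_threshold_sketch`, and the parameter `k` are assumptions for illustration and should not be read as the library's actual procedure.

```python
import statistics
from typing import Sequence


def derive_threshold_sketch(scores: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical rule: mean minus k population standard deviations.

    The result is clipped to [0, 1] to stay within the score range.
    This is an assumed stand-in, not Cleanlab's threshold computation.
    """
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return min(1.0, max(0.0, mean - k * std))


# Four well-labeled images and one clear outlier.
scores = [0.9, 0.9, 0.9, 0.9, 0.1]

# Step 2: derive the threshold from the score distribution.
threshold = derive_threshold_sketch(scores, k=1.0)

# Step 3: flag every image whose score is strictly below the threshold.
is_issue = [s < threshold for s in scores]
print(is_issue)  # [False, False, False, False, True]
```

With these scores the mean is 0.74 and the population standard deviation is 0.32, so the assumed rule yields a threshold of about 0.42 and only the last image is flagged. A larger `k` makes the filter more conservative.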
Step 4 (Optional): Ranking. If ranked output is requested, sort images by their quality scores in ascending order and return the sorted indices:
ranked_indices = argsort(scores) # ascending, worst first