Workflow: OBSS SAHI COCO Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Model_Evaluation, Object_Detection |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
End-to-end process for evaluating object detection predictions against ground truth annotations using COCO evaluation metrics, including classwise AP/AR computation across IoU thresholds and object area ranges, and optional TIDE-based error analysis with exported plots.
Description
This workflow takes a COCO-format ground truth dataset and a COCO-format result JSON of predictions, then computes standard COCO evaluation metrics (Average Precision and Average Recall) across IoU thresholds and area sizes. It supports classwise metric breakdown, configurable maximum detections, and custom IoU thresholds. An optional error analysis step uses the TIDE framework to categorize detection errors (classification, localization, duplicate, background, missed) and exports diagnostic plots.
Usage
Execute this workflow after running inference (sliced or standard) with SAHI's prediction pipeline and exporting results in COCO JSON format. It is the standard way to quantitatively assess detection quality, compare model configurations, and diagnose error patterns. The input is a COCO dataset JSON (ground truth) and a COCO result JSON (predictions). The output is printed AP/AR metrics and optionally a directory of error analysis plots.
Execution Steps
Step 1: Load Ground Truth and Predictions
Load the COCO ground truth dataset JSON using pycocotools' COCO class. Load the prediction result JSON and associate it with the ground truth via COCOeval. Both files must share consistent image IDs and category IDs so predictions can be matched to their corresponding ground truth annotations.
Key considerations:
- pycocotools must be installed (optional dependency of SAHI)
- The prediction JSON must follow COCO result format: list of dicts with image_id, category_id, bbox, and score
- Image IDs in the result must exist in the ground truth dataset
Step 2: Configure Evaluation Parameters
Set up the evaluation parameters including the metric type (bbox or segm), IoU thresholds, area ranges, and maximum number of detections. SAHI extends the standard COCO evaluation with custom area range definitions (small, medium, large based on pixel area thresholds) and a higher default max detections limit of 500.
Key considerations:
- Default area boundaries are 1024, 9216, and 10^10 square pixels (customizable)
- Standard COCO IoU thresholds range from 0.50 to 0.95 in steps of 0.05
- Max detections default is 500 (higher than standard COCO's 100) to accommodate dense scenes
- Both bbox and segmentation mask evaluation are supported
Step 3: Compute COCO Metrics
Run the COCOeval evaluation pipeline: match predictions to ground truth, accumulate statistics across images, and summarize results into AP and AR values. For classwise evaluation, iterate over each category independently and compute per-class AP and AR. Results are printed in a formatted table showing per-class and overall metrics.
Key considerations:
- Overall metrics include AP@[0.50:0.95], AP@0.50, AP@0.75, and area-specific breakdowns
- Classwise evaluation adds per-category rows to identify which object types perform best/worst
- Results can be exported to a JSON file in the output directory
- The formatted table uses AsciiTable for terminal-friendly display
Step 4: Error Analysis (Optional)
Perform detailed error categorization using the TIDE (A General Toolbox for Identifying Object Detection Errors) methodology. This step breaks down detection failures into six error types: classification errors, localization errors, combined classification-and-localization errors, duplicate detections, background false positives, and missed detections. Generate per-class and summary error analysis plots.
Key considerations:
- Requires the tidecv package (optional dependency)
- Generates matplotlib plots saved as image files in the output directory
- Error categories help identify whether the model needs better localization, classification, or threshold tuning
- Both per-class and all-class summary plots are generated
- Area-specific analysis (small, medium, large objects) is included