Workflow:Obss Sahi COCO Evaluation

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Model_Evaluation, Object_Detection
Last Updated 2026-02-08 12:00 GMT

Overview

End-to-end process for evaluating object detection predictions against ground truth annotations using COCO evaluation metrics, including classwise AP/AR computation and optional TIDE-based error analysis with exported plots.

Description

This workflow takes a COCO-format ground truth dataset and a COCO-format result JSON of predictions, then computes standard COCO evaluation metrics (Average Precision and Average Recall) across IoU thresholds and area sizes. It supports classwise metric breakdown, configurable maximum detections, and custom IoU thresholds. An optional error analysis step uses the TIDE framework to categorize detection errors (classification, localization, joint classification-and-localization, duplicate, background, missed) and exports diagnostic plots.

Usage

Execute this workflow after running inference (sliced or standard) with SAHI's prediction pipeline and exporting results in COCO JSON format. It is the standard way to quantitatively assess detection quality, compare model configurations, and diagnose error patterns. The input is a COCO dataset JSON (ground truth) and a COCO result JSON (predictions). The output is printed AP/AR metrics and optionally a directory of error analysis plots.

Execution Steps

Step 1: Load Ground Truth and Predictions

Load the COCO ground truth dataset JSON using pycocotools' COCO class. Load the prediction result JSON and associate it with the ground truth via COCOeval. Both files must share consistent image IDs and category IDs so predictions can be matched to their corresponding ground truth annotations.

Key considerations:

  • pycocotools must be installed (optional dependency of SAHI)
  • The prediction JSON must follow COCO result format: list of dicts with image_id, category_id, bbox, and score
  • Image IDs in the result must exist in the ground truth dataset
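The format and ID-consistency requirements above can be checked before invoking COCOeval. The sketch below (file contents and values are illustrative, not from the workflow repository) builds a minimal ground-truth dict and a result list in the two COCO formats, then verifies that every prediction references a known image and category:

```python
# Minimal COCO ground truth: images, annotations, categories (illustrative values).
ground_truth = {
    "images": [{"id": 1, "width": 640, "height": 480, "file_name": "img1.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 100, 50, 80], "area": 4000, "iscrowd": 0}
    ],
    "categories": [{"id": 1, "name": "person"}],
}

# COCO result format: a flat list of detection dicts with
# image_id, category_id, bbox ([x, y, w, h]), and score.
predictions = [
    {"image_id": 1, "category_id": 1, "bbox": [102, 98, 49, 83], "score": 0.91},
]

# Every prediction must reference an image and category present in the
# ground truth, otherwise COCOeval cannot match it.
image_ids = {img["id"] for img in ground_truth["images"]}
category_ids = {cat["id"] for cat in ground_truth["categories"]}
for pred in predictions:
    assert pred["image_id"] in image_ids, f"unknown image_id {pred['image_id']}"
    assert pred["category_id"] in category_ids, f"unknown category_id {pred['category_id']}"
```

With pycocotools installed, the same data would be loaded from files via `COCO("gt.json")` and `coco_gt.loadRes("results.json")`; `loadRes` itself rejects result image IDs that are absent from the ground truth.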

Step 2: Configure Evaluation Parameters

Set up the evaluation parameters including the metric type (bbox or segm), IoU thresholds, area ranges, and maximum number of detections. SAHI extends the standard COCO evaluation with custom area range definitions (small, medium, large based on pixel area thresholds) and a higher default max detections limit of 500.

Key considerations:

  • Default area thresholds are 1024 (32²), 9216 (96²), and 10^10 square pixels (customizable)
  • Standard COCO IoU thresholds range from 0.50 to 0.95 in steps of 0.05
  • Max detections default is 500 (higher than standard COCO's 100) to accommodate dense scenes
  • Both bbox and segmentation mask evaluation are supported
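The defaults listed above can be written out directly. The snippet below reconstructs the quoted threshold values (variable names are illustrative; in pycocotools they live on `COCOeval.params` as `iouThrs`, `areaRng`, and `maxDets`):

```python
# Standard COCO IoU thresholds: 0.50 to 0.95 in steps of 0.05 (10 values).
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# SAHI-style area ranges in square pixels: all, small, medium, large.
# Small < 1024 (32x32), medium < 9216 (96x96), large up to 1e10.
area_ranges = {
    "all":    [0, 1e10],
    "small":  [0, 1024],
    "medium": [1024, 9216],
    "large":  [9216, 1e10],
}

# Max detections per image at each summarization level; the final limit is
# raised from standard COCO's 100 to 500 for dense scenes.
max_detections = [1, 10, 500]
```

With a `COCOeval` instance, these would be applied to its `params` attribute before calling `evaluate()`.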

Step 3: Compute COCO Metrics

Run the COCOeval evaluation pipeline: match predictions to ground truth, accumulate statistics across images, and summarize results into AP and AR values. For classwise evaluation, iterate over each category independently and compute per-class AP and AR. Results are printed in a formatted table showing per-class and overall metrics.

Key considerations:

  • Overall metrics include AP@[0.50:0.95], AP@0.50, AP@0.75, and area-specific breakdowns
  • Classwise evaluation adds per-category rows to identify which object types perform best/worst
  • Results can be exported to a JSON file in the output directory
  • The formatted table uses AsciiTable for terminal-friendly display
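The match-accumulate-summarize pipeline can be illustrated with a simplified, single-IoU-threshold AP computation (a sketch of what COCOeval does internally, not SAHI's actual code). Detections are sorted by confidence, precision and recall are accumulated, and AP is the mean of interpolated precision over 101 recall points, as in COCO:

```python
def average_precision(detections, num_gt):
    """detections: list of (score, is_true_positive); num_gt: ground truths of the class."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in detections:
        tp += is_tp
        fp += not is_tp
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # COCO-style 101-point interpolation: best precision at recall >= r.
    ap = 0.0
    for r in [i / 100 for i in range(101)]:
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        ap += max(candidates, default=0.0)
    return ap / 101
```

For classwise evaluation, this computation would run once per category over that category's detections and ground truths, producing the per-class rows of the summary table.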

Step 4: Error Analysis (Optional)

Perform detailed error categorization using the TIDE (A General Toolbox for Identifying Object Detection Errors) methodology. This step breaks down detection failures into six error types: classification errors, localization errors, joint classification-and-localization errors, duplicate detections, background false positives, and missed detections. Generate per-class and summary error analysis plots.

Key considerations:

  • Requires the tidecv package (optional dependency)
  • Generates matplotlib plots saved as image files in the output directory
  • Error categories help identify whether the model needs better localization, classification, or threshold tuning
  • Both per-class and all-class summary plots are generated
  • Area-specific analysis (small, medium, large objects) is included
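The taxonomy above can be summarized as a small rule set. This is a simplified illustration of the decision boundaries, not the tidecv implementation; the 0.5 foreground and 0.1 background IoU thresholds follow TIDE's defaults:

```python
FG_IOU = 0.5   # TIDE's default foreground IoU threshold
BG_IOU = 0.1   # below this, a detection counts as a background error

def categorize(iou, correct_class, gt_already_matched):
    """Assign a TIDE error type to one detection, given its best-overlap GT box.

    iou: best IoU against any ground truth box.
    correct_class: whether that GT box has the predicted category.
    gt_already_matched: whether a higher-scoring detection claimed that GT.
    Returns None for a true positive.
    """
    if iou >= FG_IOU:
        if not correct_class:
            return "Cls"    # right place, wrong label
        if gt_already_matched:
            return "Dupe"   # redundant detection of an already-matched GT
        return None         # true positive
    if iou >= BG_IOU:
        return "Loc" if correct_class else "Both"  # poorly localized
    return "Bkg"            # fired on background

# "Miss" is the remaining category: GT boxes left unmatched after all
# detections have been processed.
```

In practice the `tidecv` package performs this categorization, the per-class and summary breakdowns, and the plot export.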

Execution Diagram

GitHub URL

Workflow Repository