Workflow:Obss Sahi Sliced Inference Pipeline

From Leeroopedia


Knowledge Sources
Domains: Computer_Vision, Object_Detection, Inference
Last Updated: 2026-02-08 12:00 GMT

Overview

End-to-end process for performing sliced inference on high-resolution images to detect small objects using any supported detection framework (Ultralytics, HuggingFace, MMDetection, TorchVision, Detectron2, Roboflow, YOLOv5).

Description

This workflow implements SAHI's core capability: Slicing Aided Hyper Inference. Large images are divided into overlapping tiles, object detection is performed on each tile independently, and the resulting predictions are remapped to original image coordinates and merged using postprocessing algorithms (NMS, NMM, or GreedyNMM). An optional full-image prediction is combined with slice predictions to maintain detection accuracy for both small and large objects. The output is a unified set of deduplicated object predictions covering the entire image.

Usage

Execute this workflow when you need to detect objects (especially small ones) in high-resolution images where standard single-pass inference misses detections due to downscaling. Typical scenarios include satellite imagery, drone footage, surveillance video, and any domain where objects of interest are small relative to the total image area. The input is one or more images (or video frames) and a pre-trained detection model. The output is a set of bounding box (and optionally mask) predictions with confidence scores and class labels.

Execution Steps

Step 1: Model Initialization

Select and load a detection model using the auto-detection factory. The factory maps a model type string (e.g., "ultralytics", "huggingface", "mmdet") to the appropriate model wrapper class. The model weights are loaded from a file path or a pre-initialized model object. Configuration includes setting the confidence threshold, target device (CPU/GPU), and optional category mappings.

Key considerations:

  • Choose the model type matching your detection framework installation
  • Set the confidence threshold with care: values below 0.1 cause postprocessing to switch automatically to NMS
  • Specify device as "cuda" for GPU acceleration or "cpu" for CPU-only environments
  • Category mapping allows remapping model output class IDs to custom names
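The factory idea described in this step can be illustrated with a small registry that maps a model type string to a wrapper class. This is a minimal sketch, not SAHI's code: the class names, the `load_model` function, the default threshold of 0.3, and the string-keyed category mapping are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


@dataclass
class DetectionModelSketch:
    """Hypothetical wrapper holding the configuration named in this step."""
    model_path: str
    confidence_threshold: float = 0.3  # assumed default, for illustration only
    device: str = "cpu"                # "cuda" or "cpu"
    category_mapping: Optional[Dict[str, str]] = None  # class ID -> custom name


class UltralyticsModelSketch(DetectionModelSketch):
    pass


class HuggingfaceModelSketch(DetectionModelSketch):
    pass


# Registry: model type string -> wrapper class (hypothetical names).
MODEL_REGISTRY: Dict[str, Type[DetectionModelSketch]] = {
    "ultralytics": UltralyticsModelSketch,
    "huggingface": HuggingfaceModelSketch,
}


def load_model(model_type: str, **config) -> DetectionModelSketch:
    """Look up the wrapper class for model_type and build it from config."""
    try:
        wrapper_cls = MODEL_REGISTRY[model_type]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model type {model_type!r}; expected one of: {known}") from None
    return wrapper_cls(**config)
```

A call such as `load_model("ultralytics", model_path="weights.pt", device="cuda")` would return the matching wrapper, while an unregistered type string fails early with a `ValueError`.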

Step 2: Image Slicing

Divide the input image into overlapping rectangular tiles. The slicing parameters include tile dimensions (width and height) and overlap ratios. If slice dimensions are not specified, automatic resolution-based calculation determines optimal tile sizes based on the image's pixel count and aspect ratio. Each tile records its starting pixel coordinates for later coordinate remapping.

Key considerations:

  • Default overlap ratio is 0.2 (20%) in both dimensions to ensure objects at tile boundaries are captured
  • Auto-slice resolution adapts tile sizes based on four resolution tiers: low, medium, high, and ultra-high
  • Tiles at image edges are clamped to avoid exceeding image bounds
  • Inference time scales linearly with the number of tiles
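The tiling described in this step can be sketched as a function that walks the image in strides of "slice size minus overlap" and clamps the last tile in each row and column to the image bounds. This is an illustrative sketch under those assumptions; the function name `compute_slice_boxes` and its parameter names are hypothetical, and it does not cover the automatic resolution-based sizing.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def compute_slice_boxes(
    image_width: int,
    image_height: int,
    slice_width: int,
    slice_height: int,
    overlap_width_ratio: float = 0.2,
    overlap_height_ratio: float = 0.2,
) -> List[Box]:
    """Return overlapping tile boxes that cover the whole image."""
    if not (0 <= overlap_width_ratio < 1 and 0 <= overlap_height_ratio < 1):
        raise ValueError("overlap ratios must be in [0, 1)")
    # Stride between tile origins: tile size minus the overlapping part.
    step_x = slice_width - int(overlap_width_ratio * slice_width)
    step_y = slice_height - int(overlap_height_ratio * slice_height)

    boxes: List[Box] = []
    y = 0
    while True:
        # Clamp the tile to the image and pull it back so it keeps full size.
        y_max = min(y + slice_height, image_height)
        y_min = max(0, y_max - slice_height)
        x = 0
        while True:
            x_max = min(x + slice_width, image_width)
            x_min = max(0, x_max - slice_width)
            boxes.append((x_min, y_min, x_max, y_max))
            if x + slice_width >= image_width:
                break  # this tile reached the right edge
            x += step_x
        if y + slice_height >= image_height:
            break  # this row reached the bottom edge
        y += step_y
    return boxes
```

For a 1024x1024 image with 512x512 tiles and the default 0.2 overlap, the stride is 410 pixels and the sketch yields a 3x3 grid of nine tiles. The `(x_min, y_min)` corner of each box is the starting pixel offset used later for coordinate remapping.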

Step 3: Per-Slice Detection

Run the detection model on each image tile sequentially. For each tile, the image array is passed to the model's inference method, which produces raw predictions. These raw predictions are then converted to standardized ObjectPrediction objects with bounding boxes, confidence scores, and class labels. Each prediction's coordinates are shifted from tile-local to full-image coordinates using the tile's starting pixel offset.

Key considerations:

  • Currently supports batch size of 1 (one tile at a time)
  • Coordinate remapping uses shift_amount to translate tile-local boxes to full-image coordinates
  • Optional merge buffer allows periodic intermediate merging to reduce memory usage for very large images
  • A progress callback can report slice processing progress
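The coordinate remapping in this step amounts to adding the tile's starting offset to every box corner. The sketch below shows that arithmetic on plain tuples; the function name `shift_box` and the tuple layout are hypothetical and stand in for whatever the prediction objects do internally.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def shift_box(box: Box, shift_amount: Tuple[float, float]) -> Box:
    """Translate a tile-local box into full-image coordinates.

    shift_amount is the (x, y) pixel offset of the tile's top-left corner
    within the original image.
    """
    shift_x, shift_y = shift_amount
    x_min, y_min, x_max, y_max = box
    return (x_min + shift_x, y_min + shift_y, x_max + shift_x, y_max + shift_y)


# A detection at (10, 20)-(50, 60) inside a tile that starts at (410, 820):
full_image_box = shift_box((10, 20, 50, 60), shift_amount=(410, 820))
```

The box keeps its width and height; only its position changes, so `full_image_box` is `(420, 840, 460, 880)`.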

Step 4: Full-Image Detection (Optional)

Perform a standard (non-sliced) detection pass on the entire original image. This complementary prediction catches large objects that may be split across multiple tiles in the sliced approach. The full-image predictions are appended to the collection of slice predictions before the final merge step.

Key considerations:

  • Enabled by default (perform_standard_pred=True)
  • Skipped if only a single slice was generated (image is already small enough)
  • Can be disabled for pure small-object detection scenarios where large objects are not expected
  • Uses the same model instance as the sliced predictions
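The control flow of this step can be sketched as follows: slice predictions are collected first, and the full-image pass is appended only when it is enabled and more than one slice exists. The function `collect_predictions` and the `detect` callable are hypothetical placeholders; `detect` is assumed to return predictions already expressed in full-image coordinates.

```python
from typing import Callable, List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def collect_predictions(
    image_size: Tuple[int, int],
    slices: Sequence[Region],
    detect: Callable[[Region], list],
    perform_standard_pred: bool = True,
) -> list:
    """Gather slice predictions, then optionally add a full-image pass."""
    predictions: list = []
    for region in slices:
        predictions.extend(detect(region))

    # A single slice already covers the whole image, so a second
    # full-image pass would only duplicate its detections.
    if perform_standard_pred and len(slices) > 1:
        width, height = image_size
        predictions.extend(detect((0, 0, width, height)))
    return predictions
```

The combined list is what the merge step receives, so duplicates between slice and full-image detections are expected at this point.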

Step 5: Prediction Merging (Postprocessing)

Merge and deduplicate overlapping predictions from all tiles and the optional full-image pass. The postprocessing algorithm eliminates redundant detections caused by objects appearing in multiple overlapping tiles. Three algorithm options are available: GreedyNMM (default), NMS, and NMM. The match metric (IoU or IoS) and threshold control how aggressively duplicates are suppressed.

Key considerations:

  • GreedyNMM is the default and generally produces the best results for sliced inference
  • IoS (Intersection over Smaller) is the default metric, better suited for sliced scenarios than IoU
  • Class-agnostic mode ignores category IDs during matching (useful when classes overlap visually)
  • Confidence thresholds below 0.1 automatically switch postprocessing to NMS with the IoU metric
  • OBB (Oriented Bounding Box) models force NMS postprocessing
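To make the two match metrics concrete, the sketch below computes IoU and IoS for axis-aligned boxes and applies a simplified greedy merge: the highest-scoring box is taken as a seed, and every remaining box that matches the seed above the threshold is absorbed into it. This is an illustration of the general idea only, not SAHI's GreedyNMM implementation; the function names, the tuple layout, and the choice to keep the seed's score are assumptions.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Pred = Tuple[Box, float, int]            # (box, score, category_id)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def ios(a: Box, b: Box) -> float:
    """Intersection over Smaller: intersection divided by the smaller area."""
    smaller = min(area(a), area(b))
    return intersection(a, b) / smaller if smaller > 0 else 0.0


def greedy_merge(
    preds: List[Pred],
    threshold: float = 0.5,
    metric: Callable[[Box, Box], float] = ios,
    class_agnostic: bool = False,
) -> List[Pred]:
    """Simplified greedy merge: absorb matches into the best-scoring seed."""
    remaining = sorted(preds, key=lambda p: p[1], reverse=True)
    merged: List[Pred] = []
    while remaining:
        seed_box, seed_score, seed_cat = remaining.pop(0)
        out_box = seed_box
        unmatched: List[Pred] = []
        for box, score, cat in remaining:
            same_class = class_agnostic or cat == seed_cat
            # Matching is always against the original seed, not the grown box.
            if same_class and metric(seed_box, box) >= threshold:
                out_box = (min(out_box[0], box[0]), min(out_box[1], box[1]),
                           max(out_box[2], box[2]), max(out_box[3], box[3]))
            else:
                unmatched.append((box, score, cat))
        remaining = unmatched
        merged.append((out_box, seed_score, seed_cat))
    return merged
```

The difference between the metrics shows up when a small box lies inside a large one: for a 20x20 box inside a 100x100 box, IoU is 0.04 while IoS is 1.0. That is the typical situation when a tile sees only a fragment of an object that the full-image pass detects whole, which is one reason IoS suits sliced inference.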

Step 6: Result Export

Package the merged predictions into a PredictionResult object and optionally export visualizations, COCO-format JSON, cropped detections, or pickle files. Visualizations overlay bounding boxes and labels onto the original image. When a COCO dataset JSON is provided, predictions are formatted as COCO result annotations with image IDs for evaluation.

Key considerations:

  • Visual export supports PNG (default) and JPG formats
  • COCO JSON export enables downstream evaluation with standard metrics
  • Crop export saves individual detected objects as separate image files
  • Video input is supported with frame-by-frame processing and video output writing
  • Ground truth overlays (green boxes) can be combined with prediction overlays (red boxes) when dataset JSON is provided
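For the COCO JSON export, the standard COCO results format stores each detection as a record with `image_id`, `category_id`, `bbox` in `[x, y, width, height]` form, and `score`. The sketch below converts corner-format boxes into such records and serializes them; the function name `to_coco_result` and the input tuple layout are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_coco_result(image_id: int, category_id: int, box: Box, score: float) -> Dict:
    """Build one COCO-style result record from a corner-format box."""
    x_min, y_min, x_max, y_max = box
    return {
        "image_id": image_id,
        "category_id": category_id,
        # COCO uses top-left corner plus width and height.
        "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
        "score": score,
    }


# Merged predictions for one image: (box, score, category_id).
merged = [((420, 840, 460, 880), 0.91, 2), ((0, 0, 150, 100), 0.75, 0)]
results: List[Dict] = [
    to_coco_result(image_id=7, category_id=cat, box=box, score=score)
    for box, score, cat in merged
]
results_json = json.dumps(results, indent=2)
```

A list of such records, written to a file, is the shape that COCO evaluation tooling expects for detection results.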
