Implementation:Obss Sahi Slice Coco
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, Data_Engineering, Image_Processing |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Concrete tool, provided by the SAHI library, for slicing a COCO-annotated image dataset into a tiled sub-dataset with annotations adjusted to each tile.
Description
slice_coco() processes an entire COCO dataset, slicing each image and its annotations into overlapping tiles. The function:
- Loads the COCO annotation file and builds the Coco object
- Iterates over each CocoImage with a progress bar (tqdm)
- Calls slice_image() per image, passing its CocoAnnotation list for annotation slicing
- Collects all sliced CocoImage objects from SliceImageResult.coco_images
- Assembles the final COCO dict via create_coco_dict() and saves it via save_json()
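To make "overlapping tiles" concrete, the sketch below computes tile boxes for one image from the slice size and overlap ratios. It is a minimal illustration, not SAHI's code: `tile_boxes` and its parameter names are hypothetical, and shifting the last tile back so it stays inside the image is an assumption about edge handling.

```python
def tile_boxes(image_w, image_h, slice_w=512, slice_h=512,
               overlap_w_ratio=0.2, overlap_h_ratio=0.2):
    """Return [x_min, y_min, x_max, y_max] tiles covering the image.

    Hypothetical helper for illustration; not part of the SAHI API.
    """
    if not (0 <= overlap_w_ratio < 1 and 0 <= overlap_h_ratio < 1):
        raise ValueError("overlap ratios must be in [0, 1)")
    # Stride between neighbouring tiles: tile size minus the overlap.
    step_x = slice_w - int(slice_w * overlap_w_ratio)
    step_y = slice_h - int(slice_h * overlap_h_ratio)

    boxes = []
    y = 0
    while True:
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)  # shift the last row back inside the image
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)  # shift the last column back
            boxes.append([x_min, y_min, x_max, y_max])
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes


boxes = tile_boxes(1024, 512)
print(len(boxes), boxes)
```

With the default ratio of 0.2, neighbouring 512-pixel tiles share roughly 102 pixels, so an object near a tile edge usually appears whole in at least one tile.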
Annotation slicing is handled by process_coco_annotations(), which clips each annotation to the slice boundary and filters the result by min_area_ratio. Invalid geometries (those that raise Shapely's TopologicalError) are skipped with a warning.
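The clip-and-filter step can be sketched for plain bounding boxes as follows. This is a simplified illustration rather than the library's implementation: `clip_bbox_to_slice` is a hypothetical helper, it handles axis-aligned boxes only (no Shapely polygons), and it assumes min_area_ratio compares the clipped area with the annotation's original area.

```python
def clip_bbox_to_slice(bbox, slice_box, min_area_ratio=0.1):
    """Clip a COCO [x, y, w, h] box to a tile and shift it to tile coordinates.

    Hypothetical helper for illustration; returns None when the box is dropped.
    """
    x, y, w, h = bbox
    sx0, sy0, sx1, sy1 = slice_box  # tile as [x_min, y_min, x_max, y_max]

    # Intersection of the annotation with the tile, in image coordinates.
    ix0, iy0 = max(x, sx0), max(y, sy0)
    ix1, iy1 = min(x + w, sx1), min(y + h, sy1)
    iw, ih = ix1 - ix0, iy1 - iy0
    if iw <= 0 or ih <= 0 or w * h <= 0:
        return None  # no overlap with this tile, or a degenerate box

    # Assumption: the ratio is clipped area over original area.
    if (iw * ih) / (w * h) < min_area_ratio:
        return None

    # Express the clipped box relative to the tile's top-left corner.
    return [ix0 - sx0, iy0 - sy0, iw, ih]


tile = [0, 0, 512, 512]
print(clip_bbox_to_slice([500, 100, 40, 40], tile))  # partly inside: kept, clipped
print(clip_bbox_to_slice([510, 100, 40, 40], tile))  # only a sliver inside: dropped
```

Raising min_area_ratio discards more truncated objects at tile borders; lowering it keeps thin slivers that may be hard to learn from.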
Sliced images are exported to disk in parallel using ThreadPoolExecutor (within slice_image()).
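The parallel export pattern can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the code inside slice_image(): `export_tiles` is a hypothetical helper, and tiles are represented as already-encoded bytes keyed by file name so that no imaging library is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def export_tiles(tiles, out_dir, max_workers=4):
    """Write {file_name: encoded_bytes} to out_dir using a thread pool.

    Hypothetical helper for illustration; returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def _write(item):
        name, data = item
        path = out / name
        path.write_bytes(data)  # disk I/O releases the GIL, so threads help here
        return str(path)

    # The context manager waits for all writes to finish before returning.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_write, tiles.items()))
```

Threads suit this step because saving tiles is dominated by file I/O rather than CPU work.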
Usage
Use this function to prepare training datasets for small object detection. It can be invoked programmatically or via the CLI command sahi coco slice.
Code Reference
Source Location
- Repository: sahi
- File: sahi/slicing.py
- Lines: L418-508
Signature
def slice_coco(
    coco_annotation_file_path: str,
    image_dir: str,
    output_coco_annotation_file_name: str,
    output_dir: str | None = None,
    ignore_negative_samples: bool | None = False,
    slice_height: int | None = 512,
    slice_width: int | None = 512,
    overlap_height_ratio: float | None = 0.2,
    overlap_width_ratio: float | None = 0.2,
    min_area_ratio: float | None = 0.1,
    out_ext: str | None = None,
    verbose: bool | None = False,
    exif_fix: bool = True,
) -> list[dict | str]:
    """Slice COCO dataset images and annotations into tiles.

    Args:
        coco_annotation_file_path: Path to COCO annotation JSON
        image_dir: Base directory containing images
        output_coco_annotation_file_name: Output COCO JSON filename
        output_dir: Output directory for sliced images and JSON
        ignore_negative_samples: Skip images without annotations
        slice_height: Tile height (default 512)
        slice_width: Tile width (default 512)
        overlap_height_ratio: Vertical overlap fraction (default 0.2)
        overlap_width_ratio: Horizontal overlap fraction (default 0.2)
        min_area_ratio: Min annotation area ratio to retain (default 0.1)
        out_ext: Extension for saved images
        verbose: Print progress info
        exif_fix: Apply EXIF orientation fix

    Returns:
        Tuple of (coco_dict, save_path)
    """
Import
from sahi.slicing import slice_coco
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| coco_annotation_file_path | str | Yes | Path to COCO annotation JSON file |
| image_dir | str | Yes | Directory containing the dataset images |
| output_coco_annotation_file_name | str | Yes | Filename for the output COCO JSON |
| output_dir | str | No | Directory for sliced images and JSON output |
| slice_height | int | No | Tile height in pixels (default 512) |
| slice_width | int | No | Tile width in pixels (default 512) |
| overlap_height_ratio | float | No | Vertical overlap fraction (default 0.2) |
| overlap_width_ratio | float | No | Horizontal overlap fraction (default 0.2) |
| min_area_ratio | float | No | Min annotation area ratio to retain (default 0.1) |
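| ignore_negative_samples | bool | No | Skip images without annotations (default False) |
| out_ext | str | No | Extension for saved images (default None) |
| verbose | bool | No | Print progress info (default False) |
| exif_fix | bool | No | Apply EXIF orientation fix (default True) |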
Outputs
| Name | Type | Description |
|---|---|---|
| coco_dict | dict | COCO-format dict with sliced images and adjusted annotations |
| save_path | str | Path where the COCO JSON was saved |
Usage Examples
Basic Dataset Slicing
from sahi.slicing import slice_coco

coco_dict, save_path = slice_coco(
    coco_annotation_file_path="train.json",
    image_dir="images/train/",
    output_coco_annotation_file_name="sliced_train",
    output_dir="sliced_dataset/",
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    min_area_ratio=0.1,
)

print(f"Sliced dataset saved to: {save_path}")
print(f"Total sliced images: {len(coco_dict['images'])}")
print(f"Total annotations: {len(coco_dict['annotations'])}")
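Beyond raw counts, it can be useful to check the returned dict for tiles with no annotations and for annotations that point at a missing image. The sketch below relies only on the standard COCO keys (`images`, `annotations`, `id`, `image_id`); `summarize_coco` is a hypothetical helper, not part of SAHI, and it is shown here on a small hand-written dict.

```python
from collections import Counter


def summarize_coco(coco_dict):
    """Summarize a COCO-format dict (hypothetical helper, not a SAHI function)."""
    image_ids = {img["id"] for img in coco_dict["images"]}
    per_image = Counter(ann["image_id"] for ann in coco_dict["annotations"])
    return {
        "images": len(image_ids),
        "annotations": len(coco_dict["annotations"]),
        # Tiles that carry no annotation (negative samples).
        "empty_images": len(image_ids - set(per_image)),
        # Annotations whose image_id matches no image entry.
        "orphan_annotations": [
            ann["id"] for ann in coco_dict["annotations"]
            if ann["image_id"] not in image_ids
        ],
    }


# Hand-written example data, not real slice_coco() output.
example = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 3},
    ],
}
print(summarize_coco(example))
```

A large `empty_images` count suggests trying ignore_negative_samples=True if background-only tiles are not wanted for training.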
CLI Usage
sahi coco slice \
--image_dir images/train/ \
--dataset_json_path train.json \
--slice_size 512 \
--overlap_ratio 0.2 \
--output_dir sliced_dataset/