Overview
Provides the utilities needed for Single Shot Detector (SSD) training: bounding box encoding and decoding, default box generation, data augmentation, and COCO dataset integration.
Description
This module implements the core utilities needed for SSD300 object detection training with PyTorch. The DefaultBoxes class generates the set of prior/anchor boxes at multiple feature map scales following the SSD architecture specification for 300x300 input (feature sizes [38, 19, 10, 5, 3, 1] with corresponding steps and scales). The Encoder class handles the bidirectional transformation between ground truth bounding boxes and the SSD network output format, including IoU-based matching with default boxes, coordinate encoding from ltrb to xywh format, and non-maximum suppression (NMS) during decoding.
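To make the default box layout concrete, the following minimal sketch generates SSD300 priors in plain Python and counts them. Only the feature sizes come from the description above. The function name `make_default_boxes` and the specific `steps`, `scales`, and `aspect_ratios` values are assumptions taken from the commonly used SSD300 configuration, and the result is not guaranteed to match what `DefaultBoxes` produces.

```python
import itertools
from math import sqrt


def make_default_boxes(fig_size, feat_size, steps, scales, aspect_ratios):
    """Return a list of (cx, cy, w, h) boxes in fractional coordinates."""
    boxes = []
    for idx, sfeat in enumerate(feat_size):
        # Number of cells of this feature map per image side.
        fk = fig_size / steps[idx]
        sk1 = scales[idx] / fig_size
        sk2 = scales[idx + 1] / fig_size
        sk3 = sqrt(sk1 * sk2)
        # Two square boxes, then two boxes (w x h and h x w) per aspect ratio.
        sizes = [(sk1, sk1), (sk3, sk3)]
        for alpha in aspect_ratios[idx]:
            w, h = sk1 * sqrt(alpha), sk1 / sqrt(alpha)
            sizes.append((w, h))
            sizes.append((h, w))
        for w, h in sizes:
            for i, j in itertools.product(range(sfeat), repeat=2):
                cx, cy = (j + 0.5) / fk, (i + 0.5) / fk
                # Clamp every coordinate to the unit square.
                boxes.append(tuple(min(max(v, 0.0), 1.0) for v in (cx, cy, w, h)))
    return boxes


# Assumed SSD300 configuration (hypothetical values for illustration).
boxes = make_default_boxes(
    fig_size=300,
    feat_size=[38, 19, 10, 5, 3, 1],
    steps=[8, 16, 32, 64, 100, 300],
    scales=[21, 45, 99, 153, 207, 261, 315],
    aspect_ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]],
)
print(len(boxes))  # 8732
```

With these assumed settings each feature map cell carries 4 or 6 boxes, which gives 38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 8732 boxes, the same count that appears in the tensor shapes of the I/O Contract.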
The data augmentation pipeline has three parts. SSDCropping implements the original SSD paper's random cropping strategy with minimum IoU thresholds (0.1, 0.3, 0.5, 0.7, 0.9). RandomHorizontalFlip mirrors the image and its bounding boxes horizontally at random. SSDTransformer composes the full preprocessing pipeline (crop, flip, resize, color jitter, normalize) for both training and validation modes. The COCODetection class implements a PyTorch Dataset that loads COCO-format annotations, builds category label mappings, and serves image-bbox-label triplets, applying a transform if one is given.
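The box side of a horizontal flip can be sketched in a few lines. This is an illustration only: it assumes boxes are fractional ltrb tuples, the helper names `hflip_boxes` and `random_hflip` are hypothetical, the flip probability of 0.5 is an assumption, and the image flip itself is left out.

```python
import random


def hflip_boxes(bboxes):
    """Mirror fractional ltrb boxes around the vertical centre line of the image."""
    # The old right edge becomes the new left edge and vice versa.
    return [(1.0 - r, t, 1.0 - l, b) for (l, t, r, b) in bboxes]


def random_hflip(bboxes, p=0.5, rng=random):
    """Flip the boxes with probability p; return (boxes, flipped_flag)."""
    if rng.random() < p:
        return hflip_boxes(bboxes), True
    return list(bboxes), False


flipped = hflip_boxes([(0.1, 0.2, 0.4, 0.6)])
print(flipped)
```

Swapping the left and right edges keeps `left <= right` after mirroring, so downstream IoU and encoding code does not need a special case for flipped samples.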
The utility function calc_iou_tensor computes pairwise IoU between two sets of bounding boxes in a vectorized manner. The draw_patches function provides matplotlib-based visualization of bounding box predictions with labels. The factory function dboxes300_coco creates a standard set of default boxes configured for COCO detection at 300x300 resolution.
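For reference, pairwise IoU between two sets of ltrb boxes can be written out as below. This is a plain-Python sketch of the quantity being computed, not the vectorized tensor implementation in `calc_iou_tensor`; the names `box_area` and `pairwise_iou` are hypothetical.

```python
def box_area(box):
    """Area of an ltrb box; degenerate boxes have area 0."""
    l, t, r, b = box
    return max(r - l, 0.0) * max(b - t, 0.0)


def pairwise_iou(boxes1, boxes2):
    """Return an N x M nested list where entry [i][j] is IoU(boxes1[i], boxes2[j])."""
    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # Width and height of the overlap rectangle, clipped at zero.
            iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
            ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
            inter = iw * ih
            union = box_area(a) + box_area(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


ious = pairwise_iou([(0, 0, 2, 2)], [(0, 0, 2, 2), (1, 0, 3, 2), (5, 5, 6, 6)])
print(ious)
```

In the example call, the first pair is identical (IoU 1), the second pair overlaps on half of each box (intersection 2, union 6), and the third pair is disjoint (IoU 0).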
Usage
Use this module as the utility foundation for the SSD PyTorch training example with DALI data loading. The SSDTransformer handles CPU-side augmentation and encoding, while DALI handles GPU-accelerated image decoding and preprocessing. Use COCODetection as the dataset class and Encoder for post-processing network outputs during evaluation.
Code Reference
Source Location
Signature
def calc_iou_tensor(box1, box2): ...

class Encoder(object):
    def __init__(self, dboxes): ...
    def encode(self, bboxes_in, labels_in, criteria=0.5): ...
    def scale_back_batch(self, bboxes_in, scores_in): ...
    def decode_batch(self, bboxes_in, scores_in, criteria=0.45, max_output=200): ...
    def decode_single(self, bboxes_in, scores_in, criteria, max_output, max_num=200): ...

class DefaultBoxes(object):
    def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios,
                 scale_xy=0.1, scale_wh=0.2): ...
    def __call__(self, order="ltrb"): ...

def dboxes300_coco(): ...

class SSDCropping(object):
    def __call__(self, img, img_size, bboxes, labels): ...

class RandomHorizontalFlip(object):
    def __call__(self, image, bboxes): ...

class SSDTransformer(object):
    def __init__(self, dboxes, args, size=(300, 300), val=False): ...
    def __call__(self, img, img_size, bbox=None, label=None, max_num=200): ...

class COCODetection(data.Dataset):
    def __init__(self, img_folder, annotate_file, transform=None): ...
    def __len__(self): ...
    def __getitem__(self, idx): ...

def draw_patches(img, bboxes, labels, order="xywh", label_map={}): ...
Import
from src.utils import (
    Encoder, DefaultBoxes, dboxes300_coco,
    SSDTransformer, COCODetection, calc_iou_tensor
)
I/O Contract
Inputs (Encoder.encode)
| Name | Type | Required | Description |
|---|---|---|---|
| bboxes_in | torch.Tensor | Yes | Ground truth bounding boxes of shape [N, 4] in ltrb format. |
| labels_in | torch.Tensor | Yes | Ground truth labels of shape [N]. |
| criteria | float | No | IoU threshold for matching. Default: 0.5. |
Outputs (Encoder.encode)
| Name | Type | Description |
|---|---|---|
| bboxes_out | torch.Tensor | Encoded bounding boxes of shape [8732, 4] in xywh format relative to default boxes. |
| labels_out | torch.Tensor | Encoded labels of shape [8732] (0 = background). |
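The matching step behind `Encoder.encode` can be sketched roughly as follows. This is a simplified plain-Python illustration under stated assumptions, not the module's implementation: the name `encode_sketch` is hypothetical, forcing each ground truth box onto its best default box is a common SSD convention assumed here, unmatched default boxes are assumed to keep their own coordinates, and the sketch stops at the ltrb to xywh conversion without applying any offset scaling.

```python
def iou(a, b):
    """IoU of two ltrb boxes given as (left, top, right, bottom)."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ltrb_to_xywh(box):
    """Convert (left, top, right, bottom) to (center_x, center_y, width, height)."""
    l, t, r, b = box
    return ((l + r) / 2, (t + b) / 2, r - l, b - t)


def encode_sketch(gt_boxes, gt_labels, dboxes_ltrb, criteria=0.5):
    """Return (boxes_xywh, labels), one entry per default box; label 0 is background."""
    boxes_out = list(dboxes_ltrb)
    labels_out = [0] * len(dboxes_ltrb)
    if gt_boxes:
        # Step 1: each default box takes its best ground truth if IoU exceeds the threshold.
        for d, dbox in enumerate(dboxes_ltrb):
            ious = [iou(gt, dbox) for gt in gt_boxes]
            best_gt = max(range(len(gt_boxes)), key=lambda g: ious[g])
            if ious[best_gt] > criteria:
                boxes_out[d] = gt_boxes[best_gt]
                labels_out[d] = gt_labels[best_gt]
        # Step 2 (assumed convention): every ground truth claims its best default box.
        for g, gt in enumerate(gt_boxes):
            best_d = max(range(len(dboxes_ltrb)), key=lambda d: iou(gt, dboxes_ltrb[d]))
            boxes_out[best_d] = gt
            labels_out[best_d] = gt_labels[g]
    return [ltrb_to_xywh(b) for b in boxes_out], labels_out


dboxes = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.0, 0.5, 0.5, 1.0)]
boxes_xywh, labels = encode_sketch([(0.05, 0.05, 0.5, 0.5)], [7], dboxes)
print(labels)  # [7, 0, 0]
```

The output always has one row per default box, which is why the real encoder returns tensors with 8732 rows regardless of how many ground truth boxes an image has.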
Inputs (Encoder.decode_batch)
| Name | Type | Required | Description |
|---|---|---|---|
| bboxes_in | torch.Tensor | Yes | Network bounding box predictions of shape [N, 4, 8732]. |
| scores_in | torch.Tensor | Yes | Network class score predictions of shape [N, num_classes, 8732]. |
| criteria | float | No | NMS IoU threshold. Default: 0.45. |
| max_output | int | No | Maximum number of output detections. Default: 200. |
Outputs (Encoder.decode_batch)
| Name | Type | Description |
|---|---|---|
| results | list[tuple] | List of (bboxes, labels, scores) tuples per image after NMS. |
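The role of `criteria` and `max_output` during decoding can be illustrated with a greedy NMS sketch. It handles a single class in plain Python and is an assumption about the general technique, not a copy of `decode_single`; the function name `nms_sketch` is hypothetical.

```python
def iou(a, b):
    """IoU of two ltrb boxes given as (left, top, right, bottom)."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, criteria=0.45, max_output=200):
    """Return indices of kept boxes, highest score first."""
    # Candidate indices sorted by descending score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_output:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < criteria]
    return keep


boxes = [(0.0, 0.0, 1.0, 1.0), (0.05, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with IoU 0.95, so it is suppressed, while the third box is disjoint and survives. A lower `criteria` suppresses more aggressively; `max_output` caps how many detections are returned.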
Inputs (COCODetection.__init__)
| Name | Type | Required | Description |
|---|---|---|---|
| img_folder | str | Yes | Path to the folder containing COCO images. |
| annotate_file | str | Yes | Path to the COCO-format JSON annotation file. |
| transform | callable | No | Optional transform applied to (image, size, bboxes, labels). Default: None. |
Outputs (COCODetection.__getitem__)
| Name | Type | Description |
|---|---|---|
| img | torch.Tensor or PIL.Image | Transformed image. |
| img_id | int | COCO image ID. |
| img_size | tuple(int, int) | Image dimensions (height, width). |
| bbox_sizes | torch.Tensor | Bounding boxes in fractional coordinates [N, 4]. |
| bbox_labels | torch.Tensor | Category labels [N]. |
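As an illustration of what "fractional coordinates" and "category label mappings" mean here, the sketch below converts a COCO annotation box to fractional ltrb and builds a contiguous label map. COCO annotations store `bbox` as [x, y, width, height] in pixels. The ltrb output order, the reservation of label 0 for background, and the helper names `coco_to_fractional_ltrb` and `build_label_map` are assumptions made for this example.

```python
def coco_to_fractional_ltrb(bbox_xywh, img_width, img_height):
    """Convert a COCO [x, y, width, height] pixel box to fractional (l, t, r, b)."""
    x, y, w, h = bbox_xywh
    return (
        x / img_width,
        y / img_height,
        (x + w) / img_width,
        (y + h) / img_height,
    )


def build_label_map(category_ids):
    """Map sparse category IDs to contiguous labels starting at 1 (0 = background)."""
    return {cat_id: idx + 1 for idx, cat_id in enumerate(sorted(category_ids))}


frac_box = coco_to_fractional_ltrb((64, 48, 128, 96), img_width=640, img_height=480)
label_map = build_label_map([1, 2, 3, 5])
print(frac_box, label_map)
```

Dividing by the image size makes the boxes independent of resolution, so they remain valid after the image is resized to 300x300.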
Usage Examples
Setting up SSD training data
from src.utils import dboxes300_coco, SSDTransformer, COCODetection

# Create default boxes for SSD300
dboxes = dboxes300_coco()

# Create data augmentation transform.
# `args` is not defined in this snippet; it is assumed to be the
# options object supplied by the training script.
transform = SSDTransformer(dboxes, args, size=(300, 300), val=False)

# Create dataset
train_dataset = COCODetection(
    img_folder="/data/coco/train2017",
    annotate_file="/data/coco/annotations/instances_train2017.json",
    transform=transform
)

# Get a sample
img, img_id, img_size, bboxes, labels = train_dataset[0]
print(f"Image ID: {img_id}, Boxes: {bboxes.shape}, Labels: {labels.shape}")
Decoding SSD predictions
from src.utils import Encoder, dboxes300_coco

dboxes = dboxes300_coco()
encoder = Encoder(dboxes)

# Decode network output (after forward pass).
# ploc and plabel come from the model and are not defined in this snippet:
# ploc: [batch, 4, 8732], plabel: [batch, num_classes, 8732]
results = encoder.decode_batch(ploc, plabel, criteria=0.45, max_output=200)

# One (bboxes, labels, scores) tuple per image
for bboxes, labels, scores in results:
    print(f"Detected {len(labels)} objects")
    for bbox, label, score in zip(bboxes, labels, scores):
        print(f"  Class {label.item()}: score={score.item():.3f}, bbox={bbox.tolist()}")
Related Pages