Implementation:NVIDIA DALI SSD Utils

Knowledge Sources
Domains Vision, Training
Last Updated 2026-02-08 16:00 GMT

Overview

Provides utilities for Single Shot Detector (SSD) training: bounding box encoding and decoding, default box generation, data augmentation, and COCO dataset integration.

Description

This module implements the core utilities needed for SSD300 object detection training with PyTorch. The DefaultBoxes class generates the prior (anchor) boxes at multiple feature map scales, following the SSD architecture specification for 300x300 input (feature map sizes [38, 19, 10, 5, 3, 1], each with its own step and scale). The Encoder class converts in both directions between ground truth bounding boxes and the SSD network output format. Encoding matches ground truth boxes to default boxes by IoU and converts coordinates from ltrb (left, top, right, bottom) to xywh format; decoding applies non-maximum suppression (NMS) to the network predictions.
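As a rough check on the default box layout described above, the sketch below counts the boxes produced for a 300x300 input. The per-level aspect ratios are an assumption taken from the commonly used SSD300 configuration rather than read from this module, and each feature map cell is assumed to receive two square boxes plus two boxes per listed aspect ratio. The helper name count_default_boxes is hypothetical.

```python
# Feature map sizes quoted in the description for SSD300.
feat_sizes = [38, 19, 10, 5, 3, 1]

# Assumed aspect ratios per feature level (common SSD300 configuration).
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]


def count_default_boxes(feat_sizes, aspect_ratios):
    """Count default boxes: every cell gets 2 square boxes plus 2 per aspect ratio."""
    total = 0
    for fsize, ratios in zip(feat_sizes, aspect_ratios):
        boxes_per_cell = 2 + 2 * len(ratios)
        total += fsize * fsize * boxes_per_cell
    return total


total_boxes = count_default_boxes(feat_sizes, aspect_ratios)
print(total_boxes)  # 8732
```

Under these assumptions the total matches the 8732 that appears in the tensor shapes of the I/O contract.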

The data augmentation pipeline has three parts. SSDCropping implements the original SSD paper's random cropping strategy, sampling crops against minimum IoU thresholds (0.1, 0.3, 0.5, 0.7, 0.9). RandomHorizontalFlip mirrors the image and its boxes horizontally at random. SSDTransformer composes the full preprocessing pipeline (crop, flip, resize, color jitter, normalize) and supports both training and validation modes. The COCODetection class implements a PyTorch Dataset that loads COCO-format annotations, builds category label mappings, and serves image-bbox-label triplets, optionally applying a transform.
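To make the flip step concrete, here is a minimal sketch of how fractional ltrb boxes change when an image is mirrored horizontally. It uses plain Python tuples instead of tensors so it runs without PyTorch; the function names hflip_boxes and random_hflip_boxes and the probability parameter p are illustrative assumptions, not the module's API.

```python
import random


def hflip_boxes(bboxes):
    """Mirror fractional ltrb boxes around the vertical image axis.

    The old right edge becomes the new left edge (1 - r) and vice versa.
    """
    return [(1.0 - r, t, 1.0 - l, b) for (l, t, r, b) in bboxes]


def random_hflip_boxes(bboxes, p=0.5, rng=random):
    """Flip with probability p and report whether a flip happened.

    In a real pipeline the image would be mirrored in the same branch.
    """
    if rng.random() < p:
        return hflip_boxes(bboxes), True
    return bboxes, False


original = [(0.25, 0.125, 0.5, 0.75)]
flipped = hflip_boxes(original)
print(flipped)  # [(0.5, 0.125, 0.75, 0.75)]
```

Flipping twice returns the original boxes, which is a convenient sanity check when debugging an augmentation pipeline.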

The utility function calc_iou_tensor computes pairwise IoU between two sets of bounding boxes in a vectorized manner. The draw_patches function provides matplotlib-based visualization of bounding box predictions with labels. The factory function dboxes300_coco creates a standard set of default boxes configured for COCO detection at 300x300 resolution.
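The pairwise IoU computation can be illustrated with a loop-based reference version. This is only a sketch for checking values by hand: calc_iou_tensor is described as vectorized and operates on tensors, whereas the hypothetical pairwise_iou below works on plain tuples in ltrb order.

```python
def box_area(box):
    """Area of an ltrb box; degenerate or inverted boxes count as zero."""
    l, t, r, b = box
    return max(r - l, 0.0) * max(b - t, 0.0)


def pairwise_iou(boxes1, boxes2):
    """Return an N x M nested list with the IoU of every pair of ltrb boxes."""
    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # Intersection rectangle: the tighter bound on each side.
            inter = box_area(
                (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
            )
            union = box_area(a) + box_area(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


ious = pairwise_iou([(0, 0, 2, 2)], [(1, 1, 3, 3), (0, 0, 2, 2), (5, 5, 6, 6)])
print(ious)
```

The three columns correspond to a partial overlap (intersection 1, union 7), an identical box, and a disjoint box.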

Usage

Use this module as the utility foundation for the SSD PyTorch training example with DALI data loading. The SSDTransformer handles CPU-side augmentation and encoding, while DALI handles GPU-accelerated image decoding and preprocessing. Use COCODetection as the dataset class and Encoder for post-processing network outputs during evaluation.

Code Reference

Source Location

Signature

def calc_iou_tensor(box1, box2): ...

class Encoder(object):
    def __init__(self, dboxes): ...
    def encode(self, bboxes_in, labels_in, criteria=0.5): ...
    def scale_back_batch(self, bboxes_in, scores_in): ...
    def decode_batch(self, bboxes_in, scores_in, criteria=0.45, max_output=200): ...
    def decode_single(self, bboxes_in, scores_in, criteria, max_output, max_num=200): ...

class DefaultBoxes(object):
    def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios,
                 scale_xy=0.1, scale_wh=0.2): ...
    def __call__(self, order="ltrb"): ...

def dboxes300_coco(): ...

class SSDCropping(object):
    def __call__(self, img, img_size, bboxes, labels): ...

class RandomHorizontalFlip(object):
    def __call__(self, image, bboxes): ...

class SSDTransformer(object):
    def __init__(self, dboxes, args, size=(300, 300), val=False): ...
    def __call__(self, img, img_size, bbox=None, label=None, max_num=200): ...

class COCODetection(data.Dataset):
    def __init__(self, img_folder, annotate_file, transform=None): ...
    def __len__(self): ...
    def __getitem__(self, idx): ...

def draw_patches(img, bboxes, labels, order="xywh", label_map={}): ...

Import

from src.utils import (
    Encoder, DefaultBoxes, dboxes300_coco,
    SSDTransformer, COCODetection, calc_iou_tensor
)

I/O Contract

Inputs (Encoder.encode)

Name | Type | Required | Description
bboxes_in | torch.Tensor | Yes | Ground truth bounding boxes of shape [N, 4] in ltrb format.
labels_in | torch.Tensor | Yes | Ground truth labels of shape [N].
criteria | float | No | IoU threshold for matching. Default: 0.5.
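The role of the criteria threshold can be sketched as follows: each default box takes the label of the ground truth box it overlaps most, provided that overlap exceeds the threshold, and is otherwise treated as background (label 0). This is a simplified illustration in plain Python with hypothetical helper names; it omits any extra rule a full encoder may apply, such as guaranteeing that every ground truth box is matched to at least one default box.

```python
def iou(a, b):
    """IoU of two ltrb boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_labels(gt_boxes, gt_labels, default_boxes, criteria=0.5):
    """Give each default box the label of its best-overlapping ground truth box, or 0."""
    labels_out = []
    for dbox in default_boxes:
        best_iou, best_label = 0.0, 0
        for gbox, label in zip(gt_boxes, gt_labels):
            overlap = iou(gbox, dbox)
            if overlap > best_iou:
                best_iou, best_label = overlap, label
        labels_out.append(best_label if best_iou > criteria else 0)
    return labels_out


default_boxes = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
matched = match_labels([(0.0, 0.0, 0.5, 0.5)], [3], default_boxes)
print(matched)  # [3, 0, 0]
```

Only the first default box overlaps the ground truth box enough (IoU 1.0); the third overlaps with IoU 0.25, which is below the default threshold.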

Outputs (Encoder.encode)

Name | Type | Description
bboxes_out | torch.Tensor | Encoded bounding boxes of shape [8732, 4] in xywh format relative to default boxes.
labels_out | torch.Tensor | Encoded labels of shape [8732] (0 = background).
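The ltrb to xywh conversion mentioned for the encoder output can be sketched as below. It assumes xywh means box center (x, y) plus width and height, which is the usual SSD convention but is not spelled out on this page; the function names are hypothetical.

```python
def ltrb_to_xywh(box):
    """Convert (left, top, right, bottom) to (center_x, center_y, width, height)."""
    l, t, r, b = box
    return (0.5 * (l + r), 0.5 * (t + b), r - l, b - t)


def xywh_to_ltrb(box):
    """Inverse conversion, from center/size back to corner coordinates."""
    x, y, w, h = box
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


corner_box = (0.25, 0.25, 0.75, 0.5)
center_box = ltrb_to_xywh(corner_box)
print(center_box)  # (0.5, 0.375, 0.5, 0.25)
```

Converting back with xywh_to_ltrb recovers the original corners, which is the direction used when decoding predictions.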

Inputs (Encoder.decode_batch)

Name | Type | Required | Description
bboxes_in | torch.Tensor | Yes | Network bounding box predictions of shape [N, 4, 8732].
scores_in | torch.Tensor | Yes | Network class score predictions of shape [N, num_classes, 8732].
criteria | float | No | NMS IoU threshold. Default: 0.45.
max_output | int | No | Maximum number of output detections. Default: 200.

Outputs (Encoder.decode_batch)

Name | Type | Description
results | list[tuple] | List of (bboxes, labels, scores) tuples per image after NMS.
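To show what the criteria and max_output parameters control, here is a minimal sketch of greedy non-maximum suppression for a single class. It is an illustration in plain Python with hypothetical names, not the module's decode_single; a full decoder would typically also run this per class, discard low-scoring candidates first, and merge the results across classes.

```python
def iou(a, b):
    """IoU of two ltrb boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, criteria=0.45, max_output=200):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box is below criteria.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output:
            break
        if all(iou(boxes[i], boxes[j]) < criteria for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.9), (2.0, 2.0, 3.0, 3.0)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with IoU 0.9 and is suppressed, while the third is disjoint and survives.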

Inputs (COCODetection.__init__)

Name | Type | Required | Description
img_folder | str | Yes | Path to the folder containing COCO images.
annotate_file | str | Yes | Path to the COCO-format JSON annotation file.
transform | callable | No | Optional transform applied to (image, size, bboxes, labels). Default: None.

Outputs (COCODetection.__getitem__)

Name | Type | Description
img | torch.Tensor or PIL.Image | Transformed image.
img_id | int | COCO image ID.
img_size | tuple(int, int) | Image dimensions (height, width).
bbox_sizes | torch.Tensor | Bounding boxes in fractional coordinates [N, 4].
bbox_labels | torch.Tensor | Category labels [N].
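The label mapping and fractional coordinates listed above can be sketched as follows. COCO annotations store boxes as [x, y, width, height] in pixels with non-contiguous category IDs; the sketch assumes the dataset maps categories to contiguous labels starting at 1 (0 reserved for background) and divides corner coordinates by the image width and height. The helper names are hypothetical, and the exact mapping order in COCODetection is an assumption.

```python
def build_label_map(categories):
    """Map COCO category ids to contiguous labels starting at 1 (0 = background)."""
    label_map = {}
    label_info = {0: "background"}
    for idx, cat in enumerate(categories, start=1):
        label_map[cat["id"]] = idx
        label_info[idx] = cat["name"]
    return label_map, label_info


def coco_to_fractional_ltrb(bbox, img_size):
    """Convert a COCO [x, y, width, height] pixel box to fractional ltrb.

    img_size is (height, width), matching the order in the table above.
    """
    height, width = img_size
    x, y, w, h = bbox
    return (x / width, y / height, (x + w) / width, (y + h) / height)


categories = [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}]
label_map, label_info = build_label_map(categories)
frac_box = coco_to_fractional_ltrb([100, 50, 200, 100], (400, 800))
print(label_map)  # {1: 1, 3: 2}
print(frac_box)   # (0.125, 0.125, 0.375, 0.375)
```

Keeping boxes fractional means they stay valid after the image is resized to 300x300.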

Usage Examples

Setting up SSD training data

from src.utils import dboxes300_coco, SSDTransformer, COCODetection

# Create default boxes for SSD300
dboxes = dboxes300_coco()

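# Create data augmentation transform.
# NOTE: `args` is not defined in this snippet; it is assumed to be the
# parsed command-line arguments object from the training script.
transform = SSDTransformer(dboxes, args, size=(300, 300), val=False)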

# Create dataset
train_dataset = COCODetection(
    img_folder="/data/coco/train2017",
    annotate_file="/data/coco/annotations/instances_train2017.json",
    transform=transform
)

# Get a sample
img, img_id, img_size, bboxes, labels = train_dataset[0]
print(f"Image ID: {img_id}, Boxes: {bboxes.shape}, Labels: {labels.shape}")

Decoding SSD predictions

from src.utils import Encoder, dboxes300_coco

dboxes = dboxes300_coco()
encoder = Encoder(dboxes)

# Decode network output (after forward pass)
# ploc: [batch, 4, 8732], plabel: [batch, num_classes, 8732]
results = encoder.decode_batch(ploc, plabel, criteria=0.45, max_output=200)

for bboxes, labels, scores in results:
    print(f"Detected {len(labels)} objects")
    for bbox, label, score in zip(bboxes, labels, scores):
        print(f"  Class {label.item()}: score={score.item():.3f}, bbox={bbox.tolist()}")
