Implementation:NVIDIA DALI SSD Utils

Knowledge Sources	NVIDIA_DALI
Domains	Vision, Training
Last Updated	2026-02-08 16:00 GMT

Overview

Provides utilities for Single Shot Detector (SSD) training: bounding box encoding and decoding, default box generation, data augmentation, and COCO dataset integration.

Description

This module implements the core utilities for SSD300 object detection training with PyTorch. The DefaultBoxes class generates the prior (anchor) boxes at multiple feature map scales, following the SSD specification for 300x300 input: feature map sizes [38, 19, 10, 5, 3, 1], each with its own step and scale. The Encoder class converts in both directions between ground truth bounding boxes and the SSD network output format. Encoding matches ground truth boxes to default boxes by IoU and converts coordinates from ltrb (left, top, right, bottom) to xywh (center x, center y, width, height); decoding maps network outputs back to boxes and applies non-maximum suppression (NMS).
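To make the default box layout concrete, here is a minimal, dependency-free sketch of multi-scale box generation. It is not the module's DefaultBoxes code (which builds tensors). The feature map sizes come from the text above; the steps, scales, and aspect ratios are assumptions taken from the commonly used SSD300 configuration, and `generate_default_boxes` is a hypothetical name.

```python
import itertools
import math


def generate_default_boxes(fig_size, feat_size, steps, scales, aspect_ratios):
    """Return default boxes as (cx, cy, w, h) tuples in fractional coordinates."""
    boxes = []
    for idx, sfeat in enumerate(feat_size):
        sk1 = scales[idx] / fig_size          # scale of this feature map
        sk2 = scales[idx + 1] / fig_size      # scale of the next feature map
        sk3 = math.sqrt(sk1 * sk2)            # intermediate square box
        sizes = [(sk1, sk1), (sk3, sk3)]
        for alpha in aspect_ratios[idx]:
            w, h = sk1 * math.sqrt(alpha), sk1 / math.sqrt(alpha)
            sizes.extend([(w, h), (h, w)])    # wide and tall variants
        fk = fig_size / steps[idx]            # effective grid resolution
        for w, h in sizes:
            for i, j in itertools.product(range(sfeat), repeat=2):
                cx, cy = (j + 0.5) / fk, (i + 0.5) / fk
                boxes.append((cx, cy, w, h))
    # Clamp every coordinate to the unit square.
    return [tuple(min(max(v, 0.0), 1.0) for v in box) for box in boxes]


# Assumed SSD300 configuration (only feat_size is stated in this page).
boxes = generate_default_boxes(
    fig_size=300,
    feat_size=[38, 19, 10, 5, 3, 1],
    steps=[8, 16, 32, 64, 100, 300],
    scales=[21, 45, 99, 153, 207, 261, 315],
    aspect_ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]],
)
print(len(boxes))
```

Under these assumed settings the per-location box counts are 4, 6, 6, 6, 4, 4, which yields the 8732 boxes that appear in the tensor shapes of the I/O Contract.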

The data augmentation pipeline is built from three classes. SSDCropping implements the random cropping strategy of the original SSD paper, with minimum IoU thresholds of 0.1, 0.3, 0.5, 0.7, and 0.9. RandomHorizontalFlip mirrors the image and its bounding boxes horizontally at random. SSDTransformer composes the preprocessing pipeline (crop, flip, resize, color jitter, normalize) and supports both training and validation modes. The COCODetection class is a PyTorch Dataset that loads COCO-format annotations, builds the category label mapping, and returns the image together with its bounding boxes and labels, applying a transform if one is given.
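The box side of a horizontal flip is easy to get wrong, so here is a minimal sketch of it, assuming fractional ltrb boxes in [0, 1]. The function names and the probability parameter `p` are hypothetical, and the image itself would have to be mirrored in the same step; this is an illustration, not the module's RandomHorizontalFlip class.

```python
import random


def hflip_boxes(bboxes):
    """Mirror fractional ltrb boxes about the vertical line x = 0.5.

    The old right edge becomes the new left edge, so left <= right still holds.
    """
    return [(1.0 - r, t, 1.0 - l, b) for (l, t, r, b) in bboxes]


def random_hflip_boxes(bboxes, p=0.5, rng=random):
    """Flip with probability p; return the boxes and whether a flip happened."""
    if rng.random() < p:
        return hflip_boxes(bboxes), True
    return list(bboxes), False


flipped = hflip_boxes([(0.1, 0.2, 0.4, 0.6)])
print(flipped)
```

Returning the flip decision alongside the boxes lets the caller apply the same decision to the image.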

The utility function calc_iou_tensor computes pairwise IoU between two sets of bounding boxes in a vectorized manner. The draw_patches function provides matplotlib-based visualization of bounding box predictions with labels. The factory function dboxes300_coco creates a standard set of default boxes configured for COCO detection at 300x300 resolution.
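As a reference for what pairwise IoU computes, the following is a plain-Python sketch that assumes ltrb boxes. It uses nested loops for readability, whereas calc_iou_tensor is described above as vectorized; `pairwise_iou` is a hypothetical name.

```python
def pairwise_iou(boxes1, boxes2):
    """Return an N x M nested list where entry [i][j] is IoU(boxes1[i], boxes2[j])."""

    def area(box):
        return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)

    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # Width and height of the overlap, clipped at zero for disjoint boxes.
            iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
            ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
            inter = iw * ih
            union = area(a) + area(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


ious = pairwise_iou(
    [(0.0, 0.0, 0.5, 0.5)],
    [(0.25, 0.25, 0.75, 0.75), (0.0, 0.0, 0.5, 0.5), (0.6, 0.6, 0.9, 0.9)],
)
print(ious)
```

In the example, the first pair overlaps in a 0.25 x 0.25 square, giving 0.0625 / (0.25 + 0.25 - 0.0625) = 1/7; the identical pair gives 1 and the disjoint pair gives 0.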

Usage

Use this module as the utility foundation for the SSD PyTorch training example with DALI data loading. The SSDTransformer handles CPU-side augmentation and encoding, while DALI handles GPU-accelerated image decoding and preprocessing. Use COCODetection as the dataset class and Encoder for post-processing network outputs during evaluation.

Code Reference

Source Location

Repository: NVIDIA_DALI
File: docs/examples/use_cases/pytorch/single_stage_detector/src/utils.py
Lines: 1-593

Signature

def calc_iou_tensor(box1, box2): ...

class Encoder(object):
    def __init__(self, dboxes): ...
    def encode(self, bboxes_in, labels_in, criteria=0.5): ...
    def scale_back_batch(self, bboxes_in, scores_in): ...
    def decode_batch(self, bboxes_in, scores_in, criteria=0.45, max_output=200): ...
    def decode_single(self, bboxes_in, scores_in, criteria, max_output, max_num=200): ...

class DefaultBoxes(object):
    def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios,
                 scale_xy=0.1, scale_wh=0.2): ...
    def __call__(self, order="ltrb"): ...

def dboxes300_coco(): ...

class SSDCropping(object):
    def __call__(self, img, img_size, bboxes, labels): ...

class RandomHorizontalFlip(object):
    def __call__(self, image, bboxes): ...

class SSDTransformer(object):
    def __init__(self, dboxes, args, size=(300, 300), val=False): ...
    def __call__(self, img, img_size, bbox=None, label=None, max_num=200): ...

class COCODetection(data.Dataset):
    def __init__(self, img_folder, annotate_file, transform=None): ...
    def __len__(self): ...
    def __getitem__(self, idx): ...

def draw_patches(img, bboxes, labels, order="xywh", label_map={}): ...

Import

from src.utils import (
    Encoder, DefaultBoxes, dboxes300_coco,
    SSDTransformer, COCODetection, calc_iou_tensor
)

I/O Contract

Inputs (Encoder.encode)

Name	Type	Required	Description
bboxes_in	torch.Tensor	Yes	Ground truth bounding boxes of shape [N, 4] in ltrb format.
labels_in	torch.Tensor	Yes	Ground truth labels of shape [N].
criteria	float	No	IoU threshold for matching. Default: 0.5.

Outputs (Encoder.encode)

Name	Type	Description
bboxes_out	torch.Tensor	Encoded bounding boxes of shape [8732, 4] in xywh format relative to default boxes.
labels_out	torch.Tensor	Encoded labels of shape [8732] (0 = background).
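The matching step behind these outputs can be sketched in plain Python. This assumes the standard SSD rule: every default box takes its best-overlapping ground truth box if the IoU exceeds `criteria`, and every ground truth box is additionally forced onto its single best default box. All names are hypothetical. In this sketch unmatched slots simply keep the default box's own coordinates (an assumption), and computing regression offsets against the default boxes is not shown.

```python
def iou(a, b):
    """IoU of two ltrb boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ltrb_to_xywh(box):
    """Convert (left, top, right, bottom) to (center x, center y, width, height)."""
    l, t, r, b = box
    return (0.5 * (l + r), 0.5 * (t + b), r - l, b - t)


def match_and_encode(gt_boxes, gt_labels, default_boxes, criteria=0.5):
    n = len(default_boxes)
    best_gt, best_iou = [], []
    for d in default_boxes:
        overlaps = [iou(g, d) for g in gt_boxes]
        k = max(range(len(gt_boxes)), key=overlaps.__getitem__)
        best_gt.append(k)
        best_iou.append(overlaps[k])
    # Force each ground truth box onto its best default box.
    for k, g in enumerate(gt_boxes):
        j = max(range(n), key=lambda i: iou(g, default_boxes[i]))
        best_gt[j], best_iou[j] = k, 2.0
    boxes_out, labels_out = [], []
    for j, d in enumerate(default_boxes):
        if best_iou[j] > criteria:
            boxes_out.append(ltrb_to_xywh(gt_boxes[best_gt[j]]))
            labels_out.append(gt_labels[best_gt[j]])
        else:
            boxes_out.append(ltrb_to_xywh(d))
            labels_out.append(0)  # 0 = background
    return boxes_out, labels_out


defaults = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.0, 0.5, 0.5, 1.0)]
enc_boxes, enc_labels = match_and_encode([(0.05, 0.05, 0.5, 0.5)], [7], defaults)
print(enc_labels)
```

The forced assignment guarantees that a small object whose best IoU is below the threshold still contributes a positive training target.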

Inputs (Encoder.decode_batch)

Name	Type	Required	Description
bboxes_in	torch.Tensor	Yes	Network bounding box predictions of shape [N, 4, 8732].
scores_in	torch.Tensor	Yes	Network class score predictions of shape [N, num_classes, 8732].
criteria	float	No	NMS IoU threshold. Default: 0.45.
max_output	int	No	Maximum number of output detections. Default: 200.

Outputs (Encoder.decode_batch)

Name	Type	Description
results	list[tuple]	List of (bboxes, labels, scores) tuples per image after NMS.
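The NMS step can be illustrated with a generic greedy sketch for a single class, assuming ltrb boxes. It shows how the `criteria` and `max_output` parameters interact, but it is not the module's decode_single logic, which may filter and order candidates differently; `nms` and `iou` are hypothetical helper names.

```python
def iou(a, b):
    """IoU of two ltrb boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, criteria=0.45, max_output=200):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_output:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < criteria]
    return keep


candidates = [(0.0, 0.0, 0.5, 0.5), (0.02, 0.02, 0.5, 0.5), (0.6, 0.6, 0.9, 0.9)]
kept = nms(candidates, scores=[0.9, 0.8, 0.7])
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.92, so it is suppressed, while the third box is disjoint and survives.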

Inputs (COCODetection.__init__)

Name	Type	Required	Description
img_folder	str	Yes	Path to the folder containing COCO images.
annotate_file	str	Yes	Path to the COCO-format JSON annotation file.
transform	callable	No	Optional transform applied to (image, size, bboxes, labels). Default: None.

Outputs (COCODetection.__getitem__)

Name	Type	Description
img	torch.Tensor or PIL.Image	Image, transformed if a transform was given.
img_id	int	COCO image ID.
img_size	tuple(int, int)	Image dimensions (height, width).
bbox_sizes	torch.Tensor	Bounding boxes in fractional coordinates [N, 4].
bbox_labels	torch.Tensor	Category labels [N].

Usage Examples

Setting up SSD training data

from src.utils import dboxes300_coco, SSDTransformer, COCODetection

# Create default boxes for SSD300
dboxes = dboxes300_coco()

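# Create data augmentation transform.
# `args` is not defined in this snippet; it is assumed to be the configuration
# object supplied by the training script (for example, parsed command-line arguments).
transform = SSDTransformer(dboxes, args, size=(300, 300), val=False)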

# Create dataset
train_dataset = COCODetection(
    img_folder="/data/coco/train2017",
    annotate_file="/data/coco/annotations/instances_train2017.json",
    transform=transform
)

# Get a sample
img, img_id, img_size, bboxes, labels = train_dataset[0]
print(f"Image ID: {img_id}, Boxes: {bboxes.shape}, Labels: {labels.shape}")

Decoding SSD predictions

from src.utils import Encoder, dboxes300_coco

dboxes = dboxes300_coco()
encoder = Encoder(dboxes)

# Decode network output (after forward pass).
# ploc and plabel are the model's outputs and are not defined in this snippet:
# ploc: [batch, 4, 8732], plabel: [batch, num_classes, 8732]
results = encoder.decode_batch(ploc, plabel, criteria=0.45, max_output=200)

for bboxes, labels, scores in results:
    print(f"Detected {len(labels)} objects")
    for bbox, label, score in zip(bboxes, labels, scores):
        print(f"  Class {label.item()}: score={score.item():.3f}, bbox={bbox.tolist()}")

Related Pages

Environment:NVIDIA_DALI_CUDA_GPU_Environment
