Implementation: NVIDIA DALI EfficientDetPipeline
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete pipeline class for end-to-end EfficientDet data loading and preprocessing, provided by the NVIDIA DALI EfficientDet example.
Description
EfficientDetPipeline is a self-contained class that constructs and manages a complete DALI pipeline for the EfficientDet object detection architecture. It reads images and annotations from either TFRecord or COCO format, applies configurable augmentations (GridMask, random horizontal flip, random crop-and-resize), normalizes pixel values, encodes ground-truth bounding boxes against pre-computed multi-scale anchors, and reshapes the encoded targets into per-level feature map outputs.
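To make one of the augmentations above concrete, here is a minimal pure-Python sketch of the GridMask idea: zeroing out a regular grid of square regions in the image. The parameter names `d` (tile size) and `ratio` (visible fraction of each tile) are illustrative only; DALI's GPU implementation differs in both interface and details (rotation, random offsets).

```python
def grid_mask(image, d=4, ratio=0.5):
    """Sketch of GridMask: zero out a square region in every d x d tile.

    image: H x W nested list of pixel values (a stand-in for a real tensor).
    ratio: fraction of each tile's side that stays visible (hypothetical
           parameter name, not DALI's).
    """
    h, w = len(image), len(image[0])
    keep = int(d * ratio)  # visible band within each d x d tile
    out = [row[:] for row in image]  # copy so the input is untouched
    for y in range(h):
        for x in range(w):
            # Mask the bottom-right (d - keep) x (d - keep) corner of each tile.
            if (y % d) >= keep and (x % d) >= keep:
                out[y][x] = 0
    return out
```

With `d=4, ratio=0.5`, one quarter of each 4x4 tile is masked, so a quarter of the pixels are dropped overall.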
The constructor accepts a params dictionary (from the EfficientDet configuration), training arguments, and hardware placement options. Internally, it:
- Resolves input files based on the chosen input type (TFRecord glob patterns or COCO directory paths).
- Creates an Anchors object to pre-compute multi-scale anchor boxes normalized to [0, 1] in ltrb format.
- Defines the DALI pipeline graph via the @pipeline_def decorator.
- Exposes the pipeline as a tf.data.Dataset through the get_dataset() method using dali_tf.DALIDataset.
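The anchor pre-computation in the second step can be sketched in plain Python. The level range, the single square anchor per grid cell, and the `anchor_scale` value below are illustrative assumptions, not values read from the DALI example (the real Anchors object also enumerates multiple scales and aspect ratios per cell):

```python
def make_anchors(image_size, min_level=3, max_level=7, anchor_scale=4.0):
    """Sketch: one square anchor per feature-map cell, per pyramid level,
    with coordinates divided by the image size and emitted in ltrb order
    (left, top, right, bottom). Border anchors may extend outside [0, 1]."""
    h, w = image_size
    anchors = []
    for level in range(min_level, max_level + 1):
        stride = 2 ** level              # feature-map cell size in pixels
        size = anchor_scale * stride     # square base anchor for this level
        for cy in range(stride // 2, h, stride):
            for cx in range(stride // 2, w, stride):
                anchors.append((
                    (cx - size / 2) / w,  # left
                    (cy - size / 2) / h,  # top
                    (cx + size / 2) / w,  # right
                    (cy + size / 2) / h,  # bottom
                ))
    return anchors
```

For a 128x128 input this yields 256 + 64 + 16 + 4 + 1 = 341 anchors across levels 3-7, matching the usual FPN cell counts.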
Usage
Instantiate EfficientDetPipeline with the model configuration and call get_dataset() to obtain a TensorFlow dataset. The returned dataset can be passed directly to model.fit() or iterated manually.
```python
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

pipeline = EfficientDetPipeline(
    params=params,
    batch_size=8,
    args=args,
    is_training=True,
    num_shards=1,
    device_id=0,
    cpu_only=False,
)
dataset = pipeline.get_dataset()
```
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/tensorflow/efficientdet/pipeline/dali/efficientdet_pipeline.py
Signature
```python
class EfficientDetPipeline:
    def __init__(
        self,
        params,
        batch_size,
        args,
        is_training=True,
        num_shards=1,
        device_id=0,
        cpu_only=False,
    ):
```
Import
```python
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| params | dict | Yes | EfficientDet configuration dictionary containing image_size, grid_mask, max_instances_per_image, and seed. |
| batch_size | int | Yes | Number of samples per batch. |
| args | namedtuple | Yes | Training arguments including input_type, file patterns (train_file_pattern, eval_file_pattern), or COCO paths (images_path, annotations_path). |
| is_training | bool | No | Whether to apply training augmentations and random shuffling. Defaults to True. |
| num_shards | int | No | Total number of data-parallel shards for distributed training. Defaults to 1. |
| device_id | int | No | GPU device index for pipeline placement. Defaults to 0. |
| cpu_only | bool | No | If True, forces all operations to CPU. Defaults to False. |
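A minimal stand-in for the `args` namedtuple can be built from the fields named in the table above. The namedtuple name, the `"tfrecord"` value, and the file paths below are hypothetical; the real example's argument parser may expect additional fields.

```python
from collections import namedtuple

# Field names follow the I/O contract table; anything else the real
# example expects is omitted here.
TrainArgs = namedtuple(
    "TrainArgs",
    ["input_type", "train_file_pattern", "eval_file_pattern",
     "images_path", "annotations_path"],
)

args = TrainArgs(
    input_type="tfrecord",                          # or "coco" (assumed values)
    train_file_pattern="/data/coco/train-*.tfrecord",  # hypothetical path
    eval_file_pattern="/data/coco/val-*.tfrecord",     # hypothetical path
    images_path=None,        # only used for the COCO input type
    annotations_path=None,
)
```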
Outputs
| Name | Type | Description |
|---|---|---|
| EfficientDetPipeline instance | EfficientDetPipeline | Object with get_dataset(), build(), and run() methods. |
| get_dataset() return | tf.data.Dataset | TensorFlow dataset yielding (images, num_positives, bboxes, classes, *enc_layers) per batch. |
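The flat output tuple can be consumed with star-unpacking. The batch below is mocked with placeholder strings; with TensorFlow each element would be a `tf.Tensor`, and the number of trailing encoded-target layers (five here, one per level 3-7) is an assumption that depends on the configured level range.

```python
# Mock of one element yielded by get_dataset(): placeholder values stand in
# for the real tensors.
batch = ("images", "num_positives", "bboxes", "classes",
         "enc_l3", "enc_l4", "enc_l5", "enc_l6", "enc_l7")

# Fixed leading outputs, then one encoded-target tensor per pyramid level.
images, num_positives, bboxes, classes, *enc_layers = batch
```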
Usage Examples
Single-GPU Training Pipeline
```python
import tensorflow as tf
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

params = {
    "image_size": (512, 512),
    "grid_mask": True,
    "max_instances_per_image": 100,
    "seed": 42,
}

# args is a namedtuple with input_type, train_file_pattern, etc.
pipeline = EfficientDetPipeline(
    params=params,
    batch_size=16,
    args=args,
    is_training=True,
    num_shards=1,
    device_id=0,
)
train_dataset = pipeline.get_dataset()

# model is a compiled EfficientDet Keras model.
model.fit(train_dataset, epochs=300, steps_per_epoch=2000)
```
Multi-GPU with MirroredStrategy
```python
import tensorflow as tf
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

# params, batch_size (the global batch size), and args are defined as in
# the single-GPU example above.
def dali_dataset_fn(input_context):
    device_id = input_context.input_pipeline_id
    num_shards = input_context.num_input_pipelines
    with tf.device(f"/gpu:{device_id}"):
        return EfficientDetPipeline(
            params, batch_size // num_shards, args,
            is_training=True, num_shards=num_shards, device_id=device_id,
        ).get_dataset()

strategy = tf.distribute.MirroredStrategy()
dataset = strategy.distribute_datasets_from_function(dali_dataset_fn)
```