
Implementation:NVIDIA DALI DALIDataset

From Leeroopedia


Knowledge Sources
Domains Object_Detection, GPU_Computing
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete TensorFlow dataset adapter for wrapping a built DALI pipeline as a tf.data.Dataset, provided by nvidia.dali.plugin.tf.DALIDataset.

Description

DALIDataset is the official DALI-TensorFlow integration class that wraps a DALI pipeline into a tf.data.Dataset-compatible object. In the EfficientDet pipeline, it is used by the get_dataset() method of EfficientDetPipeline to expose the full preprocessing pipeline as a TensorFlow dataset.

The get_dataset() method constructs the output shape and dtype specifications dynamically based on the model configuration:

  1. Fixed outputs: Images (batch_size, H, W, 3) as float32, num_positives (batch_size,) as float32, padded bboxes (batch_size, None, 4) as float32, padded classes (batch_size, None) as int32.
  2. Per-level outputs: For each feature pyramid level (3 through 7), an encoded class target of shape (batch_size, feat_h, feat_w, anchors_per_loc) as int32 and an encoded bbox target of shape (batch_size, feat_h, feat_w, anchors_per_loc * 4) as float32.

The DALIDataset call takes the built pipeline, batch size, and these shape/dtype tuples. The returned dataset can be consumed directly by model.fit() or distributed via tf.distribute.MirroredStrategy.

In the multi-GPU path (handled in utils.get_dataset), DALIDataset is created per-replica inside a distribute_datasets_from_function callback with InputOptions that place the dataset on the correct device and disable automatic fetching.

Usage

Call pipeline.get_dataset() to obtain the tf.data.Dataset. The pipeline must already have been built with the same batch size and device settings that the dataset will use. No additional batching or prefetching is needed, since DALI performs both internally.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/tensorflow/efficientdet/pipeline/dali/efficientdet_pipeline.py (lines 209-245)
  • File: docs/examples/use_cases/tensorflow/efficientdet/utils.py (lines 64-135)

Signature

# From nvidia.dali.plugin.tf:
dali_tf.DALIDataset(
    pipeline=self._pipe,
    batch_size=self._batch_size,
    output_shapes=tuple(output_shapes),
    output_dtypes=tuple(output_dtypes),
)

Import

import nvidia.dali.plugin.tf as dali_tf

dataset = dali_tf.DALIDataset(
    pipeline=pipe,
    batch_size=batch_size,
    output_shapes=output_shapes,
    output_dtypes=output_dtypes,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| pipeline | dali.Pipeline | Yes | A built DALI pipeline instance that produces the detection outputs. |
| batch_size | int | Yes | Number of samples per batch. Must match the batch size used to construct the pipeline. |
| output_shapes | tuple of tuples | Yes | Expected shapes for each pipeline output. The first element of each shape tuple is the batch dimension. Use None for variable-length dimensions. |
| output_dtypes | tuple of tf.DType | Yes | TensorFlow dtypes for each pipeline output (e.g., tf.float32, tf.int32). |
| device_id | int | No | GPU device index. Defaults to 0. |
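Because the leading dimension of every shape tuple must equal batch_size, a quick consistency check before constructing the dataset can catch mismatches early. A minimal sketch (validate_output_specs is a hypothetical helper, not part of the DALI API):

```python
def validate_output_specs(batch_size, output_shapes, output_dtypes):
    """Check shape/dtype specs for consistency before passing them to
    DALIDataset (hypothetical helper, not part of nvidia.dali)."""
    if len(output_shapes) != len(output_dtypes):
        raise ValueError(
            "output_shapes and output_dtypes must have the same length"
        )
    for i, shape in enumerate(output_shapes):
        # The first element of each shape tuple is the batch dimension.
        if shape[0] != batch_size:
            raise ValueError(
                f"output {i}: leading dim {shape[0]} != batch_size {batch_size}"
            )
    return True
```

A mismatched batch dimension would otherwise surface only at iteration time as a shape error inside tf.data, which is harder to trace back to the spec.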

Outputs

| Name | Type | Description |
|------|------|-------------|
| dataset | tf.data.Dataset | A TensorFlow dataset that yields tuples of tensors matching the specified shapes and dtypes. Each iteration produces one batch from the DALI pipeline. |

Usage Examples

Basic Single-GPU Usage

from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

pipeline = EfficientDetPipeline(params, batch_size=8, args=args)
dataset = pipeline.get_dataset()

# Use directly with Keras
model.fit(dataset, epochs=300, steps_per_epoch=2000)

Multi-GPU with MirroredStrategy

import tensorflow as tf
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

strategy = tf.distribute.MirroredStrategy()

def dali_dataset_fn(input_context):
    device_id = input_context.input_pipeline_id
    num_shards = input_context.num_input_pipelines
    with tf.device(f"/gpu:{device_id}"):
        return EfficientDetPipeline(
            params,
            batch_size=total_batch_size // num_shards,
            args=args,
            is_training=True,
            num_shards=num_shards,
            device_id=device_id,
        ).get_dataset()

input_options = tf.distribute.InputOptions(
    experimental_place_dataset_on_device=True,
    experimental_fetch_to_device=False,
    experimental_replication_mode=tf.distribute.InputReplicationMode.PER_REPLICA,
)

dataset = strategy.distribute_datasets_from_function(
    dali_dataset_fn, input_options
)

Understanding Output Shapes

# Output shapes constructed in get_dataset():
output_shapes = [
    (batch_size, image_h, image_w, 3),  # images
    (batch_size,),                      # num_positives
    (batch_size, None, 4),              # padded ground-truth bboxes
    (batch_size, None),                 # padded ground-truth classes
]

# Per-level targets (levels 3-7):
for level in range(min_level, max_level + 1):
    feat_size = anchors.feat_sizes[level]
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location())        # class targets
    )
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location() * 4)    # bbox targets
    )
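Counting the entries above: 4 fixed outputs plus 2 per level for levels 3 through 7 gives 14 output specs in total. A pure-Python sketch of the same layout, using dtype names as strings in place of tf.DType objects (build_output_specs and the feature-map sizes are illustrative, not the actual EfficientDet code):

```python
def build_output_specs(batch_size, image_size, feat_sizes, anchors_per_loc,
                       min_level=3, max_level=7):
    """Construct (shapes, dtypes) lists matching get_dataset()'s layout.
    Dtypes are given as strings here; the real code uses tf.float32/tf.int32."""
    h, w = image_size
    shapes = [
        (batch_size, h, w, 3),   # images
        (batch_size,),           # num_positives
        (batch_size, None, 4),   # padded ground-truth bboxes
        (batch_size, None),      # padded ground-truth classes
    ]
    dtypes = ["float32", "float32", "float32", "int32"]
    for level in range(min_level, max_level + 1):
        fh, fw = feat_sizes[level]
        shapes.append((batch_size, fh, fw, anchors_per_loc))      # class targets
        dtypes.append("int32")
        shapes.append((batch_size, fh, fw, anchors_per_loc * 4))  # bbox targets
        dtypes.append("float32")
    return shapes, dtypes
```

For a 512x512 input, each pyramid level halves the feature map, so level 3 is 64x64 down to level 7 at 4x4; the two lists line up index-for-index with the tuples passed to DALIDataset.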

