
Implementation:NVIDIA DALI DALIDataset

From Leeroopedia


Knowledge Sources
Domains Object_Detection, GPU_Computing
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete TensorFlow dataset adapter for wrapping a built DALI pipeline as a tf.data.Dataset, provided by nvidia.dali.plugin.tf.DALIDataset.

Description

DALIDataset is the official DALI-TensorFlow integration class that wraps a DALI pipeline into a tf.data.Dataset-compatible object. In the EfficientDet pipeline, it is used by the get_dataset() method of EfficientDetPipeline to expose the full preprocessing pipeline as a TensorFlow dataset.

The get_dataset() method constructs the output shape and dtype specifications dynamically based on the model configuration:

  1. Fixed outputs: Images (batch_size, H, W, 3) as float32, num_positives (batch_size,) as float32, padded bboxes (batch_size, None, 4) as float32, padded classes (batch_size, None) as int32.
  2. Per-level outputs: For each feature pyramid level (3 through 7), an encoded class target of shape (batch_size, feat_h, feat_w, anchors_per_loc) as int32 and an encoded bbox target of shape (batch_size, feat_h, feat_w, anchors_per_loc * 4) as float32.

The DALIDataset call takes the built pipeline, batch size, and these shape/dtype tuples. The returned dataset can be consumed directly by model.fit() or distributed via tf.distribute.MirroredStrategy.

In the multi-GPU path (handled in utils.get_dataset), DALIDataset is created per-replica inside a distribute_datasets_from_function callback with InputOptions that place the dataset on the correct device and disable automatic fetching.

Usage

Call pipeline.get_dataset() to obtain the tf.data.Dataset. The pipeline must already have been built with the same batch size and device settings that the dataset will use. No additional batching or prefetching is needed, since DALI performs both internally.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/tensorflow/efficientdet/pipeline/dali/efficientdet_pipeline.py (lines 209-245)
  • File: docs/examples/use_cases/tensorflow/efficientdet/utils.py (lines 64-135)

Signature

# From nvidia.dali.plugin.tf:
dali_tf.DALIDataset(
    pipeline=self._pipe,
    batch_size=self._batch_size,
    output_shapes=tuple(output_shapes),
    output_dtypes=tuple(output_dtypes),
)

Import

import nvidia.dali.plugin.tf as dali_tf

dataset = dali_tf.DALIDataset(
    pipeline=pipe,
    batch_size=batch_size,
    output_shapes=output_shapes,
    output_dtypes=output_dtypes,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| pipeline | dali.Pipeline | Yes | A built DALI pipeline instance that produces the detection outputs. |
| batch_size | int | Yes | Number of samples per batch. Must match the batch size used to construct the pipeline. |
| output_shapes | tuple of tuples | Yes | Expected shapes for each pipeline output. The first element of each shape tuple is the batch dimension. Use None for variable-length dimensions. |
| output_dtypes | tuple of tf.DType | Yes | TensorFlow dtypes for each pipeline output (e.g., tf.float32, tf.int32). |
| device_id | int | No | GPU device index. Defaults to 0. |
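Because the leading dimension of every shape tuple must equal batch_size, a quick consistency check before constructing the dataset can catch mismatches early. A minimal sketch (validate_output_specs is a hypothetical helper, not part of the DALI API):

```python
def validate_output_specs(batch_size, output_shapes, output_dtypes):
    """Check shape/dtype specs for consistency before passing them to
    DALIDataset (hypothetical helper, not part of nvidia.dali)."""
    if len(output_shapes) != len(output_dtypes):
        raise ValueError(
            "output_shapes and output_dtypes must have the same length"
        )
    for i, shape in enumerate(output_shapes):
        # The first element of each shape tuple is the batch dimension.
        if shape[0] != batch_size:
            raise ValueError(
                f"output {i}: leading dim {shape[0]} != batch_size {batch_size}"
            )
    return True
```

A mismatched batch dimension would otherwise surface only at iteration time as a shape error inside tf.data, which is harder to trace back to the spec.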

Outputs

| Name | Type | Description |
|------|------|-------------|
| dataset | tf.data.Dataset | A TensorFlow dataset that yields tuples of tensors matching the specified shapes and dtypes. Each iteration produces one batch from the DALI pipeline. |

Usage Examples

Basic Single-GPU Usage

from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

pipeline = EfficientDetPipeline(params, batch_size=8, args=args)
dataset = pipeline.get_dataset()

# Use directly with Keras
model.fit(dataset, epochs=300, steps_per_epoch=2000)

Multi-GPU with MirroredStrategy

import tensorflow as tf
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline

strategy = tf.distribute.MirroredStrategy()

def dali_dataset_fn(input_context):
    device_id = input_context.input_pipeline_id
    num_shards = input_context.num_input_pipelines
    with tf.device(f"/gpu:{device_id}"):
        return EfficientDetPipeline(
            params,
            batch_size=total_batch_size // num_shards,
            args=args,
            is_training=True,
            num_shards=num_shards,
            device_id=device_id,
        ).get_dataset()

input_options = tf.distribute.InputOptions(
    experimental_place_dataset_on_device=True,
    experimental_fetch_to_device=False,
    experimental_replication_mode=tf.distribute.InputReplicationMode.PER_REPLICA,
)

dataset = strategy.distribute_datasets_from_function(
    dali_dataset_fn, input_options
)

Understanding Output Shapes

# Output shapes constructed in get_dataset():
output_shapes = [
    (batch_size, image_h, image_w, 3),  # images
    (batch_size,),                      # num_positives
    (batch_size, None, 4),              # padded ground-truth bboxes
    (batch_size, None),                 # padded ground-truth classes
]

# Per-level targets (levels 3-7):
for level in range(min_level, max_level + 1):
    feat_size = anchors.feat_sizes[level]
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location())        # class targets
    )
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location() * 4)    # bbox targets
    )
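Counting the entries above: 4 fixed outputs plus 2 per level for levels 3 through 7 gives 14 output specs in total. A pure-Python sketch of the same layout, using dtype names as strings in place of tf.DType objects (build_output_specs and the feature-map sizes are illustrative, not the actual EfficientDet code):

```python
def build_output_specs(batch_size, image_size, feat_sizes, anchors_per_loc,
                       min_level=3, max_level=7):
    """Construct (shapes, dtypes) lists matching get_dataset()'s layout.
    Dtypes are given as strings here; the real code uses tf.float32/tf.int32."""
    h, w = image_size
    shapes = [
        (batch_size, h, w, 3),   # images
        (batch_size,),           # num_positives
        (batch_size, None, 4),   # padded ground-truth bboxes
        (batch_size, None),      # padded ground-truth classes
    ]
    dtypes = ["float32", "float32", "float32", "int32"]
    for level in range(min_level, max_level + 1):
        fh, fw = feat_sizes[level]
        shapes.append((batch_size, fh, fw, anchors_per_loc))      # class targets
        dtypes.append("int32")
        shapes.append((batch_size, fh, fw, anchors_per_loc * 4))  # bbox targets
        dtypes.append("float32")
    return shapes, dtypes
```

For a 512x512 input, each pyramid level halves the feature map, so level 3 is 64x64 down to level 7 at 4x4; the two lists line up index-for-index with the tuples passed to DALIDataset.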

