Implementation: NVIDIA DALI DALIDataset
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete TensorFlow dataset adapter for wrapping a built DALI pipeline as a tf.data.Dataset, provided by nvidia.dali.plugin.tf.DALIDataset.
Description
DALIDataset is the official DALI-TensorFlow integration class that wraps a DALI pipeline into a tf.data.Dataset-compatible object. In the EfficientDet pipeline, it is used by the get_dataset() method of EfficientDetPipeline to expose the full preprocessing pipeline as a TensorFlow dataset.
The get_dataset() method constructs the output shape and dtype specifications dynamically based on the model configuration:
- Fixed outputs: Images (batch_size, H, W, 3) as float32, num_positives (batch_size,) as float32, padded bboxes (batch_size, None, 4) as float32, padded classes (batch_size, None) as int32.
- Per-level outputs: For each feature pyramid level (3 through 7), an encoded class target of shape (batch_size, feat_h, feat_w, anchors_per_loc) as int32 and an encoded bbox target of shape (batch_size, feat_h, feat_w, anchors_per_loc * 4) as float32.
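To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of the same construction. The 512x512 resolution, level range 3-7, and 9 anchors per location (3 scales x 3 aspect ratios) are assumed example values for illustration, not read from the real configuration:

```python
import math

batch_size = 8
image_h = image_w = 512          # assumption: 512x512 input resolution
min_level, max_level = 3, 7
anchors_per_location = 9         # assumption: 3 scales x 3 aspect ratios

# Fixed outputs (dtypes are float32 except the padded classes, which are int32).
output_shapes = [
    (batch_size, image_h, image_w, 3),  # images
    (batch_size,),                      # num_positives
    (batch_size, None, 4),              # padded ground-truth bboxes
    (batch_size, None),                 # padded ground-truth classes
]

# Per-level encoded targets: the feature map halves at each pyramid level.
for level in range(min_level, max_level + 1):
    feat = math.ceil(image_h / 2 ** level)
    output_shapes.append((batch_size, feat, feat, anchors_per_location))      # class targets
    output_shapes.append((batch_size, feat, feat, anchors_per_location * 4))  # bbox targets

print(len(output_shapes))  # 4 fixed + 2 per level over 5 levels = 14
```

With these assumptions the level-3 class target is (8, 64, 64, 9) and the level-7 bbox target is (8, 4, 4, 36).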
The DALIDataset constructor takes the built pipeline, the batch size, and these shape/dtype tuples. The returned dataset can be consumed directly by model.fit() or distributed via tf.distribute.MirroredStrategy.
In the multi-GPU path (handled in utils.get_dataset), a DALIDataset is created per replica inside a distribute_datasets_from_function callback, with InputOptions that place each dataset on its replica's device and disable fetching the batches to the device, since DALI already produces them there.
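The per-replica arithmetic in that callback can be sketched in plain Python. InputContextStub and replica_config below are hypothetical stand-ins that mimic the fields of tf.distribute.InputContext and the values the callback derives from it:

```python
class InputContextStub:
    """Minimal stand-in for tf.distribute.InputContext (assumed names)."""
    def __init__(self, input_pipeline_id, num_input_pipelines):
        self.input_pipeline_id = input_pipeline_id
        self.num_input_pipelines = num_input_pipelines

def replica_config(ctx, total_batch_size):
    # Each replica builds its own DALI pipeline on its own GPU and reads
    # a disjoint shard of the dataset; the global batch is split evenly.
    return {
        "device_id": ctx.input_pipeline_id,
        "shard_id": ctx.input_pipeline_id,
        "num_shards": ctx.num_input_pipelines,
        "batch_size": total_batch_size // ctx.num_input_pipelines,
    }

print(replica_config(InputContextStub(1, 4), 32))
# {'device_id': 1, 'shard_id': 1, 'num_shards': 4, 'batch_size': 8}
```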
Usage
Call pipeline.get_dataset() to obtain the tf.data.Dataset. The pipeline must already have been constructed with the desired batch size, device, and sharding parameters. No additional batching or prefetching is needed, since DALI handles both internally.
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/tensorflow/efficientdet/pipeline/dali/efficientdet_pipeline.py (lines 209-245)
- File: docs/examples/use_cases/tensorflow/efficientdet/utils.py (lines 64-135)
Signature
# From nvidia.dali.plugin.tf:
dali_tf.DALIDataset(
    pipeline=self._pipe,
    batch_size=self._batch_size,
    output_shapes=tuple(output_shapes),
    output_dtypes=tuple(output_dtypes),
)
Import
import nvidia.dali.plugin.tf as dali_tf
dataset = dali_tf.DALIDataset(
    pipeline=pipe,
    batch_size=batch_size,
    output_shapes=output_shapes,
    output_dtypes=output_dtypes,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pipeline | dali.Pipeline | Yes | A built DALI pipeline instance that produces the detection outputs. |
| batch_size | int | Yes | Number of samples per batch. Must match the batch size used to construct the pipeline. |
| output_shapes | tuple of tuples | Yes | Expected shapes for each pipeline output. The first element of each shape tuple is the batch dimension. Use None for variable-length dimensions. |
| output_dtypes | tuple of tf.DType | Yes | TensorFlow dtypes for each pipeline output (e.g., tf.float32, tf.int32). |
| device_id | int | No | GPU device index. Defaults to 0. |
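Because shapes and dtypes are passed as parallel tuples, a simple consistency check before constructing the dataset can catch mismatches early. This is a hypothetical helper, not part of the DALI API:

```python
def check_specs(output_shapes, output_dtypes, batch_size):
    # One dtype per output, and every leading dimension is the batch dimension.
    assert len(output_shapes) == len(output_dtypes), "one dtype per output"
    for shape in output_shapes:
        assert shape[0] == batch_size, "leading dim must equal batch_size"
    return True

print(check_specs([(8, 512, 512, 3), (8,)], ["float32", "float32"], 8))  # True
```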
Outputs
| Name | Type | Description |
|---|---|---|
| dataset | tf.data.Dataset | A TensorFlow dataset that yields tuples of tensors matching the specified shapes and dtypes. Each iteration produces one batch from the DALI pipeline. |
Usage Examples
Basic Single-GPU Usage
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline
pipeline = EfficientDetPipeline(params, batch_size=8, args=args)
dataset = pipeline.get_dataset()
# Use directly with Keras
model.fit(dataset, epochs=300, steps_per_epoch=2000)
Multi-GPU with MirroredStrategy
import tensorflow as tf
from pipeline.dali.efficientdet_pipeline import EfficientDetPipeline
strategy = tf.distribute.MirroredStrategy()
def dali_dataset_fn(input_context):
    device_id = input_context.input_pipeline_id
    num_shards = input_context.num_input_pipelines
    with tf.device(f"/gpu:{device_id}"):
        return EfficientDetPipeline(
            params,
            batch_size=total_batch_size // num_shards,
            args=args,
            is_training=True,
            num_shards=num_shards,
            device_id=device_id,
        ).get_dataset()

input_options = tf.distribute.InputOptions(
    experimental_place_dataset_on_device=True,
    experimental_fetch_to_device=False,
    experimental_replication_mode=tf.distribute.InputReplicationMode.PER_REPLICA,
)

dataset = strategy.distribute_datasets_from_function(
    dali_dataset_fn, input_options
)
Understanding Output Shapes
# Output shapes constructed in get_dataset():
output_shapes = [
    (batch_size, image_h, image_w, 3),  # images
    (batch_size,),                      # num_positives
    (batch_size, None, 4),              # padded ground-truth bboxes
    (batch_size, None),                 # padded ground-truth classes
]

# Per-level targets (levels 3-7):
for level in range(min_level, max_level + 1):
    feat_size = anchors.feat_sizes[level]
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location())       # class targets
    )
    output_shapes.append(
        (batch_size, feat_size["height"], feat_size["width"],
         anchors.get_anchors_per_location() * 4)   # bbox targets
    )
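The anchors.feat_sizes lookup above comes from repeatedly halving the input resolution. The helper below is a hypothetical re-implementation sketch mirroring the EfficientDet reference code, which halves with ceiling rounding at each level so that odd sizes are handled consistently:

```python
import math

def get_feat_sizes(image_size, max_level):
    # Sketch: level 0 is the input resolution; each subsequent level
    # halves height and width with ceiling rounding.
    feat_size = {"height": image_size, "width": image_size}
    feat_sizes = [feat_size]
    for _ in range(1, max_level + 1):
        feat_size = {"height": math.ceil(feat_size["height"] / 2),
                     "width": math.ceil(feat_size["width"] / 2)}
        feat_sizes.append(feat_size)
    return feat_sizes  # index by level: feat_sizes[3] .. feat_sizes[max_level]

sizes = get_feat_sizes(640, 7)
print(sizes[3], sizes[7])
# {'height': 80, 'width': 80} {'height': 5, 'width': 5}
```

Note the ceiling rounding: for a 640-pixel input, level 7 is 640 / 128 = 5 exactly, but a 641-pixel input would give 6 at level 7 rather than truncating.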