Implementation: NVIDIA DALI DALIGenericIterator
| Knowledge Sources | |
|---|---|
| Domains | Video_Processing, GPU_Computing, Framework_Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete PyTorch-compatible iterator that wraps a built DALI pipeline and yields dictionaries of CUDA tensors, provided by the NVIDIA DALI PyTorch plugin.
Description
DALIGenericIterator is a class in the nvidia.dali.plugin.pytorch module that bridges DALI's internal pipeline execution with PyTorch's training loop conventions. It wraps a built DALI pipeline and presents it as a standard Python iterator, yielding one batch per iteration as a list containing a dictionary of PyTorch CUDA tensors.
The iterator is constructed with the following key configuration:
Pipeline binding: The first argument is the built DALI pipeline object whose outputs will be consumed.
Output mapping: The output_map parameter (second positional argument) is a list of string keys that names each pipeline output. For a pipeline with a single output, ["data"] creates a dictionary where each batch is accessible as batch[0]["data"], yielding a PyTorch CUDA tensor of shape [B, C, F, H, W] (batch dimension is prepended automatically).
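The naming convention can be illustrated in plain Python (an analogy for the observable batch structure, not DALI internals): the iterator effectively pairs each output_map key with the corresponding pipeline output, in order, producing one dictionary per pipeline.

```python
# Illustration only (plain Python, not DALI internals): output_map keys are
# paired with pipeline outputs in order, yielding one dict per pipeline.
output_map = ["data"]
pipeline_outputs = ["<CUDA tensor [B, C, F, H, W]>"]  # stand-in for the real tensor
batch = [dict(zip(output_map, pipeline_outputs))]     # what one iteration yields
assert batch[0]["data"] == "<CUDA tensor [B, C, F, H, W]>"
```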
Reader-name epoch tracking: The reader_name="Reader" parameter binds the iterator to the named reader operator in the pipeline, enabling automatic epoch size detection. The iterator knows when all samples have been yielded by querying the reader's internal counter, eliminating the need to manually pass the dataset size.
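Because the iterator reads the epoch size from the named reader, the number of iterations per epoch follows directly from the batch size and the last-batch policy. A quick sanity check with illustrative values:

```python
import math

# Illustrative values; DALI reports the real count via pipeline.epoch_size("Reader")
epoch_size = 1000
batch_size = 64

iters_partial_or_fill = math.ceil(epoch_size / batch_size)  # 16 (last batch holds 40 samples)
iters_drop = epoch_size // batch_size                       # 15 (trailing 40 samples dropped)
```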
Last-batch policy: LastBatchPolicy.PARTIAL returns the remaining samples as a smaller-than-full batch at the end of each epoch. Alternative policies include FILL (pad to full batch) and DROP (discard the partial batch).
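The three policies can be modeled on a toy dataset in plain Python. This is a behavioral sketch only, not DALI code; in DALI, FILL's padding data comes from the reader (here we mimic it by repeating the final sample, as pad_last_batch=True does).

```python
def toy_batches(samples, batch_size, policy):
    """Behavioral sketch of DALI's last-batch policies (not DALI code)."""
    out = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    if out and len(out[-1]) < batch_size:
        if policy == "DROP":
            out.pop()  # discard the incomplete batch
        elif policy == "FILL":
            # pad by repeating the final sample (mimics pad_last_batch=True)
            out[-1] = out[-1] + [out[-1][-1]] * (batch_size - len(out[-1]))
    return out  # "PARTIAL" leaves the smaller batch as-is

samples = list(range(10))
toy_batches(samples, 4, "PARTIAL")  # [[0,1,2,3], [4,5,6,7], [8,9]]
toy_batches(samples, 4, "FILL")     # [[0,1,2,3], [4,5,6,7], [8,9,9,9]]
toy_batches(samples, 4, "DROP")     # [[0,1,2,3], [4,5,6,7]]
```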
Auto-reset: When auto_reset=True, the iterator automatically resets the pipeline at epoch boundaries, allowing seamless use in nested for loops across multiple epochs.
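The multi-epoch behavior can be sketched with a minimal plain-Python iterator (a behavioral analogy, not DALI code): exhausting an epoch rewinds the iterator, so the same object can be iterated again without an explicit reset call.

```python
class ToyAutoResetIterator:
    """Plain-Python sketch of auto_reset=True semantics: exhausting an epoch
    rewinds the iterator so the same object works across epochs (analogy only)."""
    def __init__(self, samples):
        self.samples = samples
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.samples):
            self.pos = 0  # auto-reset at the epoch boundary
            raise StopIteration
        item = self.samples[self.pos]
        self.pos += 1
        return item

it = ToyAutoResetIterator(["a", "b", "c"])
epochs = [list(it) for _ in range(2)]  # both epochs yield ["a", "b", "c"]
```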
Usage
Use DALIGenericIterator as the final component in a DALI-based data loading stack to feed preprocessed data into a PyTorch training loop. It replaces torch.utils.data.DataLoader when using DALI for data preprocessing.
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 32-54)
- File: docs/examples/use_cases/video_superres/main.py (lines 141-197)
Signature
```python
DALIGenericIterator(
    pipeline,                                    # built DALI pipeline (or a list of pipelines)
    ["data"],                                    # output_map: names for the pipeline outputs
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True
)
```
Import
```python
from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pipeline | nvidia.dali.pipeline.Pipeline | Yes | A built DALI pipeline whose outputs will be converted to PyTorch tensors |
| output_map | list of str | Yes | String keys naming each pipeline output; e.g., ["data"] for a single-output pipeline |
| reader_name | str | No | Name of the reader operator in the pipeline for epoch size detection |
| last_batch_policy | LastBatchPolicy | No | Policy for handling the final incomplete batch; PARTIAL, FILL, or DROP |
| auto_reset | bool | No | If True, automatically reset the pipeline at epoch boundaries for seamless multi-epoch iteration |
Outputs
| Name | Type | Description |
|---|---|---|
| batch | list of dict | Each iteration yields a list containing a dictionary. The dictionary maps output_map keys to PyTorch CUDA tensors. For the video pipeline: batch[0]["data"] has shape [B, C, F, H, W]. |
Usage Examples
DALILoader Wrapper Class
```python
import os

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin import pytorch


@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    # Decode video on the GPU; pad_last_batch pairs with LastBatchPolicy.PARTIAL
    images = fn.readers.video(
        device="gpu", filenames=files,
        sequence_length=sequence_length, normalized=False,
        random_shuffle=True, image_type=types.RGB,
        dtype=types.UINT8, initial_fill=16,
        pad_last_batch=True, name="Reader"
    )
    # Random crop with per-sample crop positions
    images = fn.crop(
        images, crop=crop_size, dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    # Reorder [F, H, W, C] -> [C, F, H, W]
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images


class DALILoader:
    def __init__(self, batch_size, file_root, sequence_length, crop_size):
        container_files = os.listdir(file_root)
        container_files = [file_root + '/' + f for f in container_files]
        self.pipeline = create_video_reader_pipeline(
            batch_size=batch_size,
            sequence_length=sequence_length,
            num_threads=2,
            device_id=0,
            files=container_files,
            crop_size=crop_size
        )
        self.pipeline.build()
        self.epoch_size = self.pipeline.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIGenericIterator(
            self.pipeline,
            ["data"],
            reader_name="Reader",
            last_batch_policy=pytorch.LastBatchPolicy.PARTIAL,
            auto_reset=True
        )

    def __len__(self):
        return int(self.epoch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()
```
Consuming Iterator in Training Loop
```python
# From main.py training loop
loader = DALILoader(
    batch_size=args.batchsize,
    file_root=os.path.join(args.root, "train"),
    sequence_length=args.frames,
    crop_size=args.crop_size
)

for i, inputs in enumerate(loader):
    # inputs is a list of dicts; extract the tensor by its output_map key
    data = inputs[0]["data"]             # shape: [B, C, F, H, W]
    data = data.cuda(non_blocking=True)  # already a CUDA tensor, so effectively a no-op
    optimizer.zero_grad()
    loss = model(data, i, writer, im_out)
    loss.backward()
    optimizer.step()
```