Implementation: NVIDIA DALI DALIGenericIterator
| Knowledge Sources | |
|---|---|
| Domains | Video_Processing, GPU_Computing, Framework_Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete PyTorch-compatible iterator that wraps a built DALI pipeline and yields dictionaries of CUDA tensors, provided by the NVIDIA DALI PyTorch plugin.
Description
DALIGenericIterator is a class in the nvidia.dali.plugin.pytorch module that bridges DALI's internal pipeline execution with PyTorch's training loop conventions. It wraps a built DALI pipeline and presents it as a standard Python iterator, yielding one batch per iteration as a list containing a dictionary of PyTorch CUDA tensors.
The iterator is constructed with the following key configuration:
Pipeline binding: The first argument is the built DALI pipeline object whose outputs will be consumed.
Output mapping: The output_map parameter (second positional argument) is a list of string keys that names each pipeline output. For a pipeline with a single output, ["data"] creates a dictionary where each batch is accessible as batch[0]["data"], yielding a PyTorch CUDA tensor of shape [B, C, F, H, W] (batch dimension is prepended automatically).
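The naming convention can be illustrated in plain Python (an analogy for the observable batch structure, not DALI internals): the iterator effectively pairs each output_map key with the corresponding pipeline output, in order, producing one dictionary per pipeline.

```python
# Illustration only (plain Python, not DALI internals): output_map keys are
# paired with pipeline outputs in order, yielding one dict per pipeline.
output_map = ["data"]
pipeline_outputs = ["<CUDA tensor [B, C, F, H, W]>"]  # stand-in for the real tensor
batch = [dict(zip(output_map, pipeline_outputs))]     # what one iteration yields
assert batch[0]["data"] == "<CUDA tensor [B, C, F, H, W]>"
```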
Reader-name epoch tracking: The reader_name="Reader" parameter binds the iterator to the named reader operator in the pipeline, enabling automatic epoch size detection. The iterator knows when all samples have been yielded by querying the reader's internal counter, eliminating the need to manually pass the dataset size.
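Because the iterator reads the epoch size from the named reader, the number of iterations per epoch follows directly from the batch size and the last-batch policy. A quick sanity check with illustrative values:

```python
import math

# Illustrative values; DALI reports the real count via pipeline.epoch_size("Reader")
epoch_size = 1000
batch_size = 64

iters_partial_or_fill = math.ceil(epoch_size / batch_size)  # 16 (last batch holds 40 samples)
iters_drop = epoch_size // batch_size                       # 15 (trailing 40 samples dropped)
```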
Last-batch policy: LastBatchPolicy.PARTIAL returns the remaining samples as a smaller-than-full batch at the end of each epoch. Alternative policies include FILL (pad to full batch) and DROP (discard the partial batch).
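The three policies can be modeled on a toy dataset in plain Python. This is a behavioral sketch only, not DALI code; in DALI, FILL's padding data comes from the reader (here we mimic it by repeating the final sample, as pad_last_batch=True does).

```python
def toy_batches(samples, batch_size, policy):
    """Behavioral sketch of DALI's last-batch policies (not DALI code)."""
    out = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    if out and len(out[-1]) < batch_size:
        if policy == "DROP":
            out.pop()  # discard the incomplete batch
        elif policy == "FILL":
            # pad by repeating the final sample (mimics pad_last_batch=True)
            out[-1] = out[-1] + [out[-1][-1]] * (batch_size - len(out[-1]))
    return out  # "PARTIAL" leaves the smaller batch as-is

samples = list(range(10))
toy_batches(samples, 4, "PARTIAL")  # [[0,1,2,3], [4,5,6,7], [8,9]]
toy_batches(samples, 4, "FILL")     # [[0,1,2,3], [4,5,6,7], [8,9,9,9]]
toy_batches(samples, 4, "DROP")     # [[0,1,2,3], [4,5,6,7]]
```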
Auto-reset: When auto_reset=True, the iterator automatically resets the pipeline at epoch boundaries, allowing seamless use in nested for loops across multiple epochs.
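The multi-epoch behavior can be sketched with a minimal plain-Python iterator (a behavioral analogy, not DALI code): exhausting an epoch rewinds the iterator, so the same object can be iterated again without an explicit reset call.

```python
class ToyAutoResetIterator:
    """Plain-Python sketch of auto_reset=True semantics: exhausting an epoch
    rewinds the iterator so the same object works across epochs (analogy only)."""
    def __init__(self, samples):
        self.samples = samples
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.samples):
            self.pos = 0  # auto-reset at the epoch boundary
            raise StopIteration
        item = self.samples[self.pos]
        self.pos += 1
        return item

it = ToyAutoResetIterator(["a", "b", "c"])
epochs = [list(it) for _ in range(2)]  # both epochs yield ["a", "b", "c"]
```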
Usage
Use DALIGenericIterator as the final component in a DALI-based data loading stack to feed preprocessed data into a PyTorch training loop. It replaces torch.utils.data.DataLoader when using DALI for data preprocessing.
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 32-54)
- File: docs/examples/use_cases/video_superres/main.py (lines 141-197)
Signature
```python
DALIGenericIterator(
    pipeline,                                    # built DALI pipeline (or a list of pipelines)
    ["data"],                                    # output_map: names for the pipeline outputs
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True
)
```
Import
```python
from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pipeline | nvidia.dali.pipeline.Pipeline | Yes | A built DALI pipeline whose outputs will be converted to PyTorch tensors |
| output_map | list of str | Yes | String keys naming each pipeline output; e.g., ["data"] for a single-output pipeline |
| reader_name | str | No | Name of the reader operator in the pipeline for epoch size detection |
| last_batch_policy | LastBatchPolicy | No | Policy for handling the final incomplete batch; PARTIAL, FILL, or DROP |
| auto_reset | bool | No | If True, automatically reset the pipeline at epoch boundaries for seamless multi-epoch iteration |
Outputs
| Name | Type | Description |
|---|---|---|
| batch | list of dict | Each iteration yields a list containing a dictionary. The dictionary maps output_map keys to PyTorch CUDA tensors. For the video pipeline: batch[0]["data"] has shape [B, C, F, H, W]. |
Usage Examples
DALILoader Wrapper Class
```python
import os

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin import pytorch


@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    # Decode video on the GPU; pad_last_batch pairs with LastBatchPolicy.PARTIAL
    images = fn.readers.video(
        device="gpu", filenames=files,
        sequence_length=sequence_length, normalized=False,
        random_shuffle=True, image_type=types.RGB,
        dtype=types.UINT8, initial_fill=16,
        pad_last_batch=True, name="Reader"
    )
    # Random crop with per-sample crop positions
    images = fn.crop(
        images, crop=crop_size, dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    # Reorder [F, H, W, C] -> [C, F, H, W]
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images


class DALILoader:
    def __init__(self, batch_size, file_root, sequence_length, crop_size):
        container_files = os.listdir(file_root)
        container_files = [file_root + '/' + f for f in container_files]
        self.pipeline = create_video_reader_pipeline(
            batch_size=batch_size,
            sequence_length=sequence_length,
            num_threads=2,
            device_id=0,
            files=container_files,
            crop_size=crop_size
        )
        self.pipeline.build()
        self.epoch_size = self.pipeline.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIGenericIterator(
            self.pipeline,
            ["data"],
            reader_name="Reader",
            last_batch_policy=pytorch.LastBatchPolicy.PARTIAL,
            auto_reset=True
        )

    def __len__(self):
        return int(self.epoch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()
```
Consuming Iterator in Training Loop
```python
# From main.py training loop
loader = DALILoader(
    batch_size=args.batchsize,
    file_root=os.path.join(args.root, "train"),
    sequence_length=args.frames,
    crop_size=args.crop_size
)

for i, inputs in enumerate(loader):
    # inputs is a list of dicts; extract the tensor by its output_map key
    data = inputs[0]["data"]             # shape: [B, C, F, H, W]
    data = data.cuda(non_blocking=True)  # already a CUDA tensor, so effectively a no-op
    optimizer.zero_grad()
    loss = model(data, i, writer, im_out)
    loss.backward()
    optimizer.step()
```