

Implementation:NVIDIA DALI DALIGenericIterator

From Leeroopedia


Knowledge Sources
Domains Video_Processing, GPU_Computing, Framework_Integration
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete PyTorch-compatible iterator, provided by the NVIDIA DALI PyTorch plugin, that wraps a built DALI pipeline and yields each batch as a list containing a dictionary of CUDA tensors.

Description

DALIGenericIterator is a class in the nvidia.dali.plugin.pytorch module that bridges DALI's internal pipeline execution with PyTorch's training loop conventions. It wraps a built DALI pipeline and presents it as a standard Python iterator, yielding one batch per iteration as a list containing a dictionary of PyTorch CUDA tensors.

The iterator is constructed with the following key configuration:

Pipeline binding: The first argument is the built DALI pipeline object whose outputs will be consumed.

Output mapping: The output_map parameter (second positional argument) is a list of string keys that names each pipeline output. For a pipeline with a single output, ["data"] creates a dictionary where each batch is accessible as batch[0]["data"], yielding a PyTorch CUDA tensor of shape [B, C, F, H, W] (batch dimension is prepended automatically).

Reader-name epoch tracking: The reader_name="Reader" parameter binds the iterator to the named reader operator in the pipeline, enabling automatic epoch size detection. The iterator knows when all samples have been yielded by querying the reader's internal counter, eliminating the need to manually pass the dataset size.

Last-batch policy: LastBatchPolicy.PARTIAL returns the remaining samples as a smaller-than-full batch at the end of each epoch. Alternative policies include FILL (pad to full batch) and DROP (discard the partial batch).

Auto-reset: When auto_reset=True, the iterator automatically resets the pipeline at epoch boundaries, allowing seamless use in nested for loops across multiple epochs.
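The effect of the three last-batch policies can be sketched with plain arithmetic (no DALI call involved; the epoch and batch sizes below are illustrative assumptions):

```python
import math

# Illustrative arithmetic only: how each LastBatchPolicy value affects
# the number of batches yielded per epoch. Numbers are assumed.
epoch_size = 1000   # samples reported by the named reader
batch_size = 32     # 1000 % 32 = 8 samples left over at epoch end

batches_per_epoch = {
    # PARTIAL: the final 8 samples come back as a smaller batch
    "PARTIAL": math.ceil(epoch_size / batch_size),
    # FILL: same number of batches, but the last one is padded to full size
    "FILL": math.ceil(epoch_size / batch_size),
    # DROP: the incomplete final batch is discarded entirely
    "DROP": epoch_size // batch_size,
}
print(batches_per_epoch)  # {'PARTIAL': 32, 'FILL': 32, 'DROP': 31}
```

Under PARTIAL, the final batch here would contain only 8 samples, so training code must not assume a fixed batch dimension.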

Usage

Use DALIGenericIterator as the final component in a DALI-based data loading stack to feed preprocessed data into a PyTorch training loop. It replaces torch.utils.data.DataLoader when using DALI for data preprocessing.
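The iteration protocol the iterator exposes with auto_reset=True can be mimicked with a plain-Python stand-in (this is not DALI code; the class below is a hypothetical illustration of the contract only):

```python
# Plain-Python stand-in (no DALI required) mimicking the protocol
# DALIGenericIterator follows with auto_reset=True: each `for` loop walks
# one epoch, every iteration yields a list containing a dict, and the
# iterator is usable again for the next epoch without a manual reset().
class FakeDALIIterator:
    def __init__(self, n_batches):
        self.n_batches = n_batches
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self.n_batches:
            self._i = 0          # auto-reset at the epoch boundary
            raise StopIteration
        self._i += 1
        return [{"data": f"batch {self._i}"}]

loader = FakeDALIIterator(n_batches=3)
for epoch in range(2):               # nested loops work across epochs
    for batch in loader:
        tensor_like = batch[0]["data"]  # same access pattern as the real iterator
```

The real iterator behaves analogously, except that the yielded values are PyTorch CUDA tensors and the epoch length comes from the named reader.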

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 32-54)
  • File: docs/examples/use_cases/video_superres/main.py (lines 141-197)

Signature

DALIGenericIterator(
    pipeline,
    ["data"],
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True
)

Import

from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy

I/O Contract

Inputs

  • pipeline (nvidia.dali.Pipeline, required): A built DALI pipeline whose outputs will be converted to PyTorch tensors.
  • output_map (list of str, required): String keys naming each pipeline output; e.g., ["data"] for a single-output pipeline.
  • reader_name (str, optional): Name of the reader operator in the pipeline, used for automatic epoch size detection.
  • last_batch_policy (LastBatchPolicy, optional): Policy for handling the final incomplete batch; PARTIAL, FILL, or DROP.
  • auto_reset (bool, optional): If True, automatically resets the pipeline at epoch boundaries for seamless multi-epoch iteration.

Outputs

  • batch (list of dict): Each iteration yields a list containing a dictionary that maps output_map keys to PyTorch CUDA tensors. For the video pipeline, batch[0]["data"] has shape [B, C, F, H, W].
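The access pattern for the yielded structure can be sketched with a NumPy stand-in (a real batch holds torch CUDA tensors instead; the shape values below are illustrative assumptions):

```python
import numpy as np

# NumPy array standing in for the CUDA tensor a real batch would hold.
# Shape values (B, C, F, H, W) below are illustrative assumptions.
B, C, F, H, W = 4, 3, 5, 128, 128
batch = [{"data": np.zeros((B, C, F, H, W), dtype=np.float32)}]

# Same indexing as with DALIGenericIterator: list index, then output_map key
data = batch[0]["data"]
assert data.shape == (4, 3, 5, 128, 128)
```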

Usage Examples

DALILoader Wrapper Class

import os
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin import pytorch
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    images = fn.readers.video(
        device="gpu", filenames=files,
        sequence_length=sequence_length, normalized=False,
        random_shuffle=True, image_type=types.RGB,
        dtype=types.UINT8, initial_fill=16,
        pad_last_batch=True, name="Reader"
    )
    images = fn.crop(
        images, crop=crop_size, dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images

class DALILoader:
    def __init__(self, batch_size, file_root, sequence_length, crop_size):
        container_files = [os.path.join(file_root, f)
                           for f in os.listdir(file_root)]
        self.pipeline = create_video_reader_pipeline(
            batch_size=batch_size,
            sequence_length=sequence_length,
            num_threads=2,
            device_id=0,
            files=container_files,
            crop_size=crop_size
        )
        self.pipeline.build()
        self.epoch_size = self.pipeline.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIGenericIterator(
            self.pipeline,
            ["data"],
            reader_name="Reader",
            last_batch_policy=pytorch.LastBatchPolicy.PARTIAL,
            auto_reset=True
        )

    def __len__(self):
        return int(self.epoch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()

Consuming Iterator in Training Loop

# From main.py training loop
loader = DALILoader(
    batch_size=args.batchsize,
    file_root=os.path.join(args.root, "train"),
    sequence_length=args.frames,
    crop_size=args.crop_size
)

for i, inputs in enumerate(loader):
    # inputs is a list of dicts; extract the tensor
    data = inputs[0]["data"]       # shape: [B, C, F, H, W]
    # DALIGenericIterator already yields CUDA tensors, so this .cuda() call
    # is effectively a no-op, kept for parity with CPU-based loaders
    data = data.cuda(non_blocking=True)

    optimizer.zero_grad()
    loss = model(data, i, writer, im_out)
    loss.backward()
    optimizer.step()

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
