
Implementation:NVIDIA DALI DALIClassificationIterator

From Leeroopedia


Knowledge Sources
Domains Data_Pipeline, Deep_Learning, Framework_Integration
Last Updated 2026-02-08 00:00 GMT

Overview

DALIClassificationIterator is a class from NVIDIA DALI's PyTorch plugin that wraps a built DALI pipeline into a Python iterable yielding PyTorch tensors, with configurable last-batch handling and automatic epoch reset for seamless integration with PyTorch training loops.

Description

DALIClassificationIterator is the primary integration point between a DALI preprocessing pipeline and a PyTorch training loop for image classification tasks. It wraps a fully built DALI Pipeline object and implements Python's iterator protocol, yielding batches of data as PyTorch tensors.

Each iteration returns a list containing a single dictionary with two keys:

  • "data": A PyTorch tensor of shape [B, C, H, W] containing the preprocessed image batch
  • "label": A PyTorch tensor of shape [B, 1] containing the integer class labels
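The nesting above can be sketched with plain Python containers. This is an illustrative mock only: real batches hold torch.Tensor objects on the GPU, but the structure (a one-element list wrapping a dictionary) and the unpacking pattern are the same. The values below are stand-in nested lists for a tiny batch of B=2, C=1, H=2, W=2 images.

```python
# Mock of one DALIClassificationIterator batch (stand-in lists, not tensors).
mock_batch = [
    {
        "data": [  # shape [B, C, H, W]
            [[[0.1, 0.2], [0.3, 0.4]]],
            [[[0.5, 0.6], [0.7, 0.8]]],
        ],
        "label": [[3], [7]],  # shape [B, 1]
    }
]

# Unpacking follows the same indexing pattern as with real DALI output:
images = mock_batch[0]["data"]
labels = [row[0] for row in mock_batch[0]["label"]]  # flatten [B, 1] -> [B]
print(labels)  # [3, 7]
```

With real tensors, the flattening step is the `.squeeze(-1).long()` call shown in the Usage section.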

The iterator is configured with three key parameters:

reader_name: Links to the named reader operator (fn.readers.file with name="Reader") in the pipeline. This connection enables the iterator to query the reader for total dataset size (via the _size attribute) and to manage epoch boundaries. The dataset size is essential for computing the number of iterations per epoch.

last_batch_policy: Controls behavior when the remaining samples do not fill a complete batch:

  • LastBatchPolicy.PARTIAL: Returns a smaller final batch (preferred for validation to ensure all samples are evaluated)
  • LastBatchPolicy.DROP: Discards the incomplete final batch
  • LastBatchPolicy.FILL: Pads the final batch to full size by repeating samples
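The effect of each policy on epoch length can be sketched with simple arithmetic. The helper below is hypothetical (not part of DALI) and assumes a single shard; it only illustrates how many batches a loop over the iterator would yield under each policy.

```python
import math

def batches_per_epoch(dataset_size: int, batch_size: int, policy: str) -> int:
    """Hypothetical helper showing batches yielded per epoch for each policy."""
    if policy == "PARTIAL":  # smaller final batch is returned
        return math.ceil(dataset_size / batch_size)
    if policy == "DROP":     # incomplete final batch is discarded
        return dataset_size // batch_size
    if policy == "FILL":     # final batch is padded to full size
        return math.ceil(dataset_size / batch_size)
    raise ValueError(f"unknown policy: {policy}")

# 1000 samples at batch size 64: 15 full batches plus 40 leftover samples.
print(batches_per_epoch(1000, 64, "PARTIAL"))  # 16 (final batch holds 40)
print(batches_per_epoch(1000, 64, "DROP"))     # 15
print(batches_per_epoch(1000, 64, "FILL"))     # 16 (final batch padded to 64)
```

Note that PARTIAL and FILL yield the same batch count; they differ in whether the final batch contains only real samples (PARTIAL) or is padded with repeats (FILL), which matters when averaging validation metrics.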

auto_reset: When True, the iterator automatically resets to the beginning of the dataset when exhausted, allowing it to be reused across training epochs without explicit reset calls.
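The auto_reset behavior can be mimicked with a minimal stand-in class. This is a sketch of the semantics only, not DALI code: the iterator still ends each epoch's for-loop via StopIteration, but with auto_reset enabled it rewinds internally so the same object drives the next epoch without an explicit reset() call.

```python
class AutoResetIterator:
    """Minimal stand-in (not DALI code) mimicking auto_reset semantics."""

    def __init__(self, data, auto_reset=False):
        self._data = data
        self._auto_reset = auto_reset
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            if self._auto_reset:
                self._pos = 0  # rewind automatically for the next epoch
            raise StopIteration  # still ends the current epoch's for-loop
        item = self._data[self._pos]
        self._pos += 1
        return item

    def reset(self):
        self._pos = 0

it = AutoResetIterator([1, 2, 3], auto_reset=True)
print([x for x in it])  # epoch 1: [1, 2, 3]
print([x for x in it])  # epoch 2: [1, 2, 3], no reset() needed
```

With auto_reset=False, the second loop would yield nothing until reset() is called, which is why training scripts that reuse one iterator across epochs enable this flag.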

Usage

Create the iterator after building the DALI pipeline. Use it in a standard Python for-loop. Extract data via data[0]["data"] and labels via data[0]["label"].squeeze(-1).long().

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 326-331)
  • File: docs/examples/use_cases/pytorch/efficientnet/image_classification/dataloaders.py (lines 186-188)

Signature (ResNet50)

train_pipe.build()
train_loader = DALIClassificationIterator(
    train_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)

Signature (EfficientNet)

train_loader = DALIClassificationIterator(
    pipe, reader_name="Reader", fill_last_batch=False
)

Import

from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy

I/O Contract

Inputs

Name Type Required Description
pipe nvidia.dali.Pipeline Yes A fully built DALI Pipeline object (after calling pipe.build())
reader_name str No Name of the reader operator in the pipeline (must match the name= parameter in fn.readers.file). Used for epoch size queries.
last_batch_policy LastBatchPolicy No How to handle the last batch: PARTIAL (return smaller batch), DROP (discard), or FILL (pad to full size). Default varies by version.
auto_reset bool No Automatically reset the iterator when the epoch is exhausted (default False)
fill_last_batch bool No Deprecated parameter; use last_batch_policy instead. False is equivalent to LastBatchPolicy.PARTIAL.

Outputs

Name Type Description
batch list[dict] A single-element list containing a dictionary with "data" key (PyTorch tensor [B, C, H, W] on GPU) and "label" key (PyTorch tensor [B, 1] on GPU)

Usage Examples

Standard Training Loop Integration

from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy

# After pipeline is built:
train_loader = DALIClassificationIterator(
    train_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)

for i, data in enumerate(train_loader):
    images = data[0]["data"]          # [B, 3, 224, 224] float32 on GPU
    labels = data[0]["label"].squeeze(-1).long()  # [B] int64 on GPU
    output = model(images)
    loss = criterion(output, labels)
    # ... backward, optimizer step, etc.

Validation with PARTIAL Last-Batch Policy

val_loader = DALIClassificationIterator(
    val_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)

# All validation samples are evaluated, including partial final batch
for data in val_loader:
    images = data[0]["data"]
    labels = data[0]["label"].squeeze(-1).long()
    with torch.no_grad():
        output = model(images)

Querying Dataset Size

import math

# The iterator exposes the dataset size through the private _size attribute:
train_loader_len = int(math.ceil(train_loader._size / batch_size))
print(f"Training iterations per epoch: {train_loader_len}")

