Implementation: NVIDIA DALI DALIClassificationIterator
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, Deep_Learning, Framework_Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The DALIClassificationIterator class from NVIDIA DALI's PyTorch plugin wraps a built DALI pipeline into a Python iterable that yields PyTorch tensors, with configurable last-batch handling and automatic epoch reset for seamless integration with PyTorch training loops.
Description
DALIClassificationIterator is the primary integration point between a DALI preprocessing pipeline and a PyTorch training loop for image classification tasks. It wraps a fully built DALI Pipeline object and implements Python's iterator protocol, yielding batches of data as PyTorch tensors.
Each iteration returns a list containing a single dictionary with two keys:
- "data": A PyTorch tensor of shape [B, C, H, W] containing the preprocessed image batch
- "label": A PyTorch tensor of shape [B, 1] containing the integer class labels
The iterator is configured with three key parameters:
reader_name: Links to the named reader operator (fn.readers.file with name="Reader") in the pipeline. This connection enables the iterator to query the reader for total dataset size (via the _size attribute) and to manage epoch boundaries. The dataset size is essential for computing the number of iterations per epoch.
last_batch_policy: Controls behavior when the remaining samples do not fill a complete batch:
- LastBatchPolicy.PARTIAL: Returns a smaller final batch (preferred for validation to ensure all samples are evaluated)
- LastBatchPolicy.DROP: Discards the incomplete final batch
- LastBatchPolicy.FILL: Pads the final batch to full size by repeating samples
auto_reset: When True, the iterator automatically resets to the beginning of the dataset when exhausted, allowing it to be reused across training epochs without explicit reset calls.
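The effect of the three last-batch policies can be illustrated with a small, framework-free sketch. This is plain Python with no DALI dependency; the `epoch_batches` helper is hypothetical and only models the arithmetic behind LastBatchPolicy, not the DALI API itself:

```python
def epoch_batches(dataset_size, batch_size, policy):
    """Return the list of batch sizes one epoch yields under each policy.

    `policy` is one of "PARTIAL", "DROP", "FILL". This is an illustration
    of the last-batch arithmetic, not part of the DALI API.
    """
    full, remainder = divmod(dataset_size, batch_size)
    batches = [batch_size] * full
    if remainder:
        if policy == "PARTIAL":
            batches.append(remainder)      # smaller final batch
        elif policy == "FILL":
            batches.append(batch_size)     # padded by repeating samples
        # "DROP": the incomplete final batch is discarded
    return batches

# 10 samples, batch size 4:
print(epoch_batches(10, 4, "PARTIAL"))  # [4, 4, 2] -> all 10 samples seen
print(epoch_batches(10, 4, "DROP"))     # [4, 4]    -> 2 samples skipped
print(epoch_batches(10, 4, "FILL"))     # [4, 4, 4] -> 2 slots are repeats
```

This is why PARTIAL is preferred for validation: it is the only policy that evaluates every sample exactly once.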
Usage
Create the iterator after building the DALI pipeline. Use it in a standard Python for-loop. Extract data via data[0]["data"] and labels via data[0]["label"].squeeze(-1).long().
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 326-331)
- File: docs/examples/use_cases/pytorch/efficientnet/image_classification/dataloaders.py (lines 186-188)
Signature (ResNet50)
train_pipe.build()
train_loader = DALIClassificationIterator(
    train_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)
Signature (EfficientNet)
train_loader = DALIClassificationIterator(
    pipe, reader_name="Reader", fill_last_batch=False
)
Import
from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pipe | nvidia.dali.Pipeline | Yes | A fully built DALI Pipeline object (after calling pipe.build()) |
| reader_name | str | No | Name of the reader operator in the pipeline (must match the name= parameter in fn.readers.file). Used for epoch size queries. |
| last_batch_policy | LastBatchPolicy | No | How to handle the last batch: PARTIAL (return smaller batch), DROP (discard), or FILL (pad to full size). Default varies by version. |
| auto_reset | bool | No | Automatically reset the iterator when the epoch is exhausted (default False) |
| fill_last_batch | bool | No | Deprecated parameter; use last_batch_policy instead. False is equivalent to LastBatchPolicy.PARTIAL. |
Outputs
| Name | Type | Description |
|---|---|---|
| batch | list[dict] | A single-element list containing a dictionary with "data" key (PyTorch tensor [B, C, H, W] on GPU) and "label" key (PyTorch tensor [B, 1] on GPU) |
Usage Examples
Standard Training Loop Integration
from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy
# After pipeline is built:
train_loader = DALIClassificationIterator(
    train_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)

for i, data in enumerate(train_loader):
    images = data[0]["data"]                      # [B, 3, 224, 224] float32 on GPU
    labels = data[0]["label"].squeeze(-1).long()  # [B] int64 on GPU
    output = model(images)
    loss = criterion(output, labels)
    # ... backward, optimizer step, etc.
Validation with PARTIAL Last-Batch Policy
import torch

val_loader = DALIClassificationIterator(
    val_pipe,
    reader_name="Reader",
    last_batch_policy=LastBatchPolicy.PARTIAL,
    auto_reset=True,
)

# All validation samples are evaluated, including the partial final batch
for data in val_loader:
    images = data[0]["data"]
    labels = data[0]["label"].squeeze(-1).long()
    with torch.no_grad():
        output = model(images)
Querying Dataset Size
import math

# The iterator exposes the dataset size through the private _size attribute:
train_loader_len = int(math.ceil(train_loader._size / batch_size))
print(f"Training iterations per epoch: {train_loader_len}")
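The rounding direction of this computation depends on the last-batch policy: PARTIAL (and FILL) yield a rounded-up iteration count, while DROP rounds down. A quick standalone check with hypothetical numbers (an ImageNet-1k-sized training set and batch size 256 are assumptions, not values from the source):

```python
import math

dataset_size = 1281167  # e.g. ImageNet-1k training set (illustrative)
batch_size = 256

# PARTIAL / FILL: the remainder produces one extra (smaller or padded) batch
iters_partial = math.ceil(dataset_size / batch_size)

# DROP: the remainder is discarded, so the count rounds down
iters_drop = dataset_size // batch_size

print(iters_partial)  # 5005
print(iters_drop)     # 5004
```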