Implementation: NVIDIA DALI fn.readers.file
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, File_IO, Distributed_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The fn.readers.file operator in NVIDIA DALI reads files from disk in an ImageNet-style directory layout, producing raw encoded byte buffers and integer class labels with built-in sharding, shuffling, and epoch management.
Description
fn.readers.file is the data source operator used at the entry point of DALI pipelines for image classification tasks. It scans a root directory where each subdirectory represents a class, automatically assigns integer labels based on sorted directory names, and reads files as raw byte buffers. The operator handles all aspects of data iteration: file discovery, shuffling, sharding for distributed training, and epoch boundary management.
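The label-assignment rule described above can be modeled in plain Python. This is an illustrative sketch of the documented behavior (sorted class subdirectories mapped to 0-based integer labels), not DALI's internal implementation; the helper name `index_file_root` is hypothetical.

```python
import os
import tempfile

def index_file_root(file_root):
    """Illustrative model of fn.readers.file label assignment:
    class subdirectories are sorted by name and each receives a
    0-based integer label."""
    classes = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    samples = []
    for label, cls in enumerate(classes):
        cls_dir = os.path.join(file_root, cls)
        for fname in sorted(os.listdir(cls_dir)):
            samples.append((os.path.join(cls_dir, fname), label))
    return samples

# Build a tiny ImageNet-style layout and index it.
root = tempfile.mkdtemp()
for cls in ["n01440764", "n01443537"]:
    os.makedirs(os.path.join(root, cls))
    open(os.path.join(root, cls, "img0.jpg"), "wb").close()

pairs = index_file_root(root)
print([(os.path.basename(p), lbl) for p, lbl in pairs])
# → [('img0.jpg', 0), ('img0.jpg', 1)]
```

Because labels come from sorted directory names, the mapping is deterministic across runs and across workers as long as the directory contents are identical.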
The operator runs on the CPU and outputs two DataNodes: the raw file contents (e.g., JPEG byte buffers) and the corresponding integer labels. These outputs are then consumed by downstream operators such as image decoders.
The name parameter is critical for integration with the DALIClassificationIterator, which uses it to query the reader for dataset size and epoch progress information.
Usage
Use this operator as the first step in a DALI pipeline to load image files from an ImageNet-style directory structure. Configure shard_id and num_shards for distributed training, enable random_shuffle for training, and set pad_last_batch=True to ensure uniform batch sizes across all workers.
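The shard_id/num_shards split can be illustrated with standard balanced-partition arithmetic. This is a sketch of typical contiguous sharding (shard sizes differ by at most one sample), offered as an approximation; it is not a guarantee of DALI's exact internal formula.

```python
def shard_bounds(shard_id, num_shards, dataset_size):
    """Contiguous [start, end) sample range for one shard.
    Integer arithmetic keeps shard sizes within one sample
    of each other and covers the dataset exactly once."""
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end

# 10 samples across 3 shards: sizes 3, 3, 4 (sum = 10, max diff 1).
sizes = [b - a for a, b in (shard_bounds(i, 3, 10) for i in range(3))]
print(sizes)  # → [3, 3, 4]
```

The residual imbalance shown here (one shard with an extra sample) is exactly why pad_last_batch=True matters in distributed training: without it, workers can disagree on the final batch.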
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 169-176)
Signature
images, labels = fn.readers.file(
    file_root=data_dir,
    shard_id=shard_id,
    num_shards=num_shards,
    random_shuffle=is_training,
    pad_last_batch=True,
    name="Reader",
)
Import
import nvidia.dali.fn as fn
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_root | str | Yes | Root directory containing class subdirectories, each holding image files (ImageNet-style layout) |
| shard_id | int | Yes | Zero-based index of this worker's data shard (typically the local GPU rank) |
| num_shards | int | Yes | Total number of data shards (typically the world size in distributed training) |
| random_shuffle | bool | No | Whether to randomly shuffle files each epoch (default False; set True for training) |
| pad_last_batch | bool | No | Whether to pad the last batch to ensure all shards produce the same number of samples (default False; set True for distributed training) |
| name | str | No | Named handle for the reader, used by DALIClassificationIterator to query epoch size and progress |
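The effect of pad_last_batch on per-shard sample counts reduces to rounding up to a whole number of batches (DALI fills the final partial batch by repeating the last sample). The arithmetic below is an illustrative sketch of that behavior, not DALI code:

```python
import math

def padded_shard_samples(shard_size, batch_size):
    """Samples a shard emits per epoch with pad_last_batch=True:
    the final partial batch is filled by repeating the last
    sample, rounding the count up to whole batches."""
    return math.ceil(shard_size / batch_size) * batch_size

# 10 samples over 3 shards (sizes 3, 3, 4) with batch_size=4:
# every shard emits 4 samples (1 full batch), so all distributed
# workers stay in lockstep through the epoch.
print([padded_shard_samples(s, 4) for s in (3, 3, 4)])  # → [4, 4, 4]
```

Without padding, the shard holding 4 samples would finish its last batch with more real data than the others, which is what desynchronizes collective operations in multi-GPU training.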
Outputs
| Name | Type | Description |
|---|---|---|
| images | DataNode (CPU) | Raw encoded file contents as byte buffers (e.g., JPEG-encoded image data) |
| labels | DataNode (CPU) | Integer class labels derived from sorted subdirectory names (0-indexed) |
Usage Examples
Training Reader with Sharding
import nvidia.dali.fn as fn
# Inside a @pipeline_def decorated function:
images, labels = fn.readers.file(
    file_root="/data/imagenet/train",
    shard_id=0,        # GPU rank 0
    num_shards=8,      # 8 GPUs total
    random_shuffle=True,
    pad_last_batch=True,
    name="Reader",
)
Validation Reader (No Shuffle)
images, labels = fn.readers.file(
    file_root="/data/imagenet/val",
    shard_id=local_rank,
    num_shards=world_size,
    random_shuffle=False,
    pad_last_batch=True,
    name="Reader",
)
EfficientNet Variant
# From dali.py training_pipe (lines 95-102):
jpegs, labels = fn.readers.file(
    name="Reader",
    file_root=data_dir,
    shard_id=rank,
    num_shards=world_size,
    random_shuffle=True,
    pad_last_batch=True,
)