Implementation: NVIDIA DALI fn.readers.file
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, File_IO, Distributed_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The fn.readers.file operator in NVIDIA DALI reads files from disk in an ImageNet-style directory layout, producing raw encoded byte buffers and integer class labels with built-in sharding, shuffling, and epoch management.
Description
fn.readers.file is the data source operator used at the entry point of DALI pipelines for image classification tasks. It scans a root directory where each subdirectory represents a class, automatically assigns integer labels based on sorted directory names, and reads files as raw byte buffers. The operator handles all aspects of data iteration: file discovery, shuffling, sharding for distributed training, and epoch boundary management.
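The label-assignment rule described above can be modeled in plain Python. This is an illustrative sketch of the documented behavior (sorted class subdirectories mapped to 0-based integer labels), not DALI's internal implementation; the helper name `index_file_root` is hypothetical.

```python
import os
import tempfile

def index_file_root(file_root):
    """Illustrative model of fn.readers.file label assignment:
    class subdirectories are sorted by name and each receives a
    0-based integer label."""
    classes = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    samples = []
    for label, cls in enumerate(classes):
        cls_dir = os.path.join(file_root, cls)
        for fname in sorted(os.listdir(cls_dir)):
            samples.append((os.path.join(cls_dir, fname), label))
    return samples

# Build a tiny ImageNet-style layout and index it.
root = tempfile.mkdtemp()
for cls in ["n01440764", "n01443537"]:
    os.makedirs(os.path.join(root, cls))
    open(os.path.join(root, cls, "img0.jpg"), "wb").close()

pairs = index_file_root(root)
print([(os.path.basename(p), lbl) for p, lbl in pairs])
# → [('img0.jpg', 0), ('img0.jpg', 1)]
```

Because labels come from sorted directory names, the mapping is deterministic across runs and across workers as long as the directory contents are identical.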
The operator runs on the CPU and outputs two DataNodes: the raw file contents (e.g., JPEG byte buffers) and the corresponding integer labels. These outputs are then consumed by downstream operators such as image decoders.
The name parameter is critical for integration with the DALIClassificationIterator, which uses it to query the reader for dataset size and epoch progress information.
Usage
Use this operator as the first step in a DALI pipeline to load image files from an ImageNet-style directory structure. Configure shard_id and num_shards for distributed training, enable random_shuffle for training, and set pad_last_batch=True to ensure uniform batch sizes across all workers.
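The shard_id/num_shards split can be illustrated with standard balanced-partition arithmetic. This is a sketch of typical contiguous sharding (shard sizes differ by at most one sample), offered as an approximation; it is not a guarantee of DALI's exact internal formula.

```python
def shard_bounds(shard_id, num_shards, dataset_size):
    """Contiguous [start, end) sample range for one shard.
    Integer arithmetic keeps shard sizes within one sample
    of each other and covers the dataset exactly once."""
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end

# 10 samples across 3 shards: sizes 3, 3, 4 (sum = 10, max diff 1).
sizes = [b - a for a, b in (shard_bounds(i, 3, 10) for i in range(3))]
print(sizes)  # → [3, 3, 4]
```

The residual imbalance shown here (one shard with an extra sample) is exactly why pad_last_batch=True matters in distributed training: without it, workers can disagree on the final batch.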
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 169-176)
Signature
images, labels = fn.readers.file(
    file_root=data_dir,
    shard_id=shard_id,
    num_shards=num_shards,
    random_shuffle=is_training,
    pad_last_batch=True,
    name="Reader",
)
Import
import nvidia.dali.fn as fn
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_root | str | Yes | Root directory containing class subdirectories, each holding image files (ImageNet-style layout) |
| shard_id | int | Yes | Zero-based index of this worker's data shard (typically the local GPU rank) |
| num_shards | int | Yes | Total number of data shards (typically the world size in distributed training) |
| random_shuffle | bool | No | Whether to randomly shuffle files each epoch (default False; set True for training) |
| pad_last_batch | bool | No | Whether to pad the last batch to ensure all shards produce the same number of samples (default False; set True for distributed training) |
| name | str | No | Named handle for the reader, used by DALIClassificationIterator to query epoch size and progress |
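The effect of pad_last_batch on per-shard sample counts reduces to rounding up to a whole number of batches (DALI fills the final partial batch by repeating the last sample). The arithmetic below is an illustrative sketch of that behavior, not DALI code:

```python
import math

def padded_shard_samples(shard_size, batch_size):
    """Samples a shard emits per epoch with pad_last_batch=True:
    the final partial batch is filled by repeating the last
    sample, rounding the count up to whole batches."""
    return math.ceil(shard_size / batch_size) * batch_size

# 10 samples over 3 shards (sizes 3, 3, 4) with batch_size=4:
# every shard emits 4 samples (1 full batch), so all distributed
# workers stay in lockstep through the epoch.
print([padded_shard_samples(s, 4) for s in (3, 3, 4)])  # → [4, 4, 4]
```

Without padding, the shard holding 4 samples would finish its last batch with more real data than the others, which is what desynchronizes collective operations in multi-GPU training.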
Outputs
| Name | Type | Description |
|---|---|---|
| images | DataNode (CPU) | Raw encoded file contents as byte buffers (e.g., JPEG-encoded image data) |
| labels | DataNode (CPU) | Integer class labels derived from sorted subdirectory names (0-indexed) |
Usage Examples
Training Reader with Sharding
import nvidia.dali.fn as fn
# Inside a @pipeline_def decorated function:
images, labels = fn.readers.file(
    file_root="/data/imagenet/train",
    shard_id=0,        # GPU rank 0
    num_shards=8,      # 8 GPUs total
    random_shuffle=True,
    pad_last_batch=True,
    name="Reader",
)
Validation Reader (No Shuffle)
images, labels = fn.readers.file(
    file_root="/data/imagenet/val",
    shard_id=local_rank,
    num_shards=world_size,
    random_shuffle=False,
    pad_last_batch=True,
    name="Reader",
)
EfficientNet Variant
# From dali.py training_pipe (lines 95-102):
jpegs, labels = fn.readers.file(
    name="Reader",
    file_root=data_dir,
    shard_id=rank,
    num_shards=world_size,
    random_shuffle=True,
    pad_last_batch=True,
)