
Implementation:NVIDIA DALI Fn Readers File

From Leeroopedia


Knowledge Sources
Domains Data_Pipeline, File_IO, Distributed_Computing
Last Updated 2026-02-08 00:00 GMT

Overview

The fn.readers.file operator in NVIDIA DALI reads files from disk in an ImageNet-style directory layout, producing raw encoded byte buffers and integer class labels, with built-in sharding, shuffling, and epoch management.

Description

fn.readers.file is the data source operator used at the entry point of DALI pipelines for image classification tasks. It scans a root directory where each subdirectory represents a class, automatically assigns integer labels based on sorted directory names, and reads files as raw byte buffers. The operator handles all aspects of data iteration: file discovery, shuffling, sharding for distributed training, and epoch boundary management.
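DALI's file-discovery code is internal, but the labeling rule described above (sorted subdirectory names mapped to 0-indexed integers) can be sketched in plain Python. The class directory names below are illustrative, not DALI output:

```python
import os
import tempfile

def imagenet_style_labels(file_root):
    """Mimic fn.readers.file labeling: each subdirectory is a class,
    and integer labels follow the sorted order of directory names."""
    classes = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    return {name: label for label, name in enumerate(classes)}

# Build a toy ImageNet-style layout with three class directories.
root = tempfile.mkdtemp()
for cls in ["n01440764", "n01443537", "n01484850"]:
    os.makedirs(os.path.join(root, cls))

labels = imagenet_style_labels(root)
print(labels)  # {'n01440764': 0, 'n01443537': 1, 'n01484850': 2}
```

Because labels come from lexicographic sorting, the same directory tree always yields the same class-to-label mapping across workers and runs.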

The operator runs on the CPU and outputs two DataNodes: the raw file contents (e.g., JPEG byte buffers) and the corresponding integer labels. These outputs are then consumed by downstream operators such as image decoders.

The name parameter is critical for integration with the DALIClassificationIterator, which uses it to query the reader for dataset size and epoch progress information.

Usage

Use this operator as the first step in a DALI pipeline to load image files from an ImageNet-style directory structure. Configure shard_id and num_shards for distributed training, enable random_shuffle for training, and set pad_last_batch=True to ensure uniform batch sizes across all workers.
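DALI performs sharding internally, but the interaction of shard_id, num_shards, and pad_last_batch can be sketched with plain arithmetic. This is a minimal illustration assuming contiguous, near-even splits with the last sample repeated as padding; DALI's internal logic may differ in detail:

```python
import math

def shard_and_pad(files, shard_id, num_shards, pad_last_batch=True):
    """Illustrative sketch: partition a file list into contiguous shards,
    then pad short shards so every worker yields the same sample count."""
    per_shard = math.ceil(len(files) / num_shards)
    start = shard_id * len(files) // num_shards
    end = (shard_id + 1) * len(files) // num_shards
    shard = files[start:end]
    if pad_last_batch:
        # Repeat the last sample until all shards have equal length.
        while len(shard) < per_shard:
            shard.append(shard[-1])
    return shard

files = [f"img_{i}.jpg" for i in range(10)]  # 10 files over 4 shards
shards = [shard_and_pad(files, s, 4) for s in range(4)]
print([len(s) for s in shards])  # [3, 3, 3, 3]: uniform across workers
```

Without padding, shards would yield 2 or 3 samples each, and workers would fall out of step at the epoch boundary; this is why pad_last_batch=True is recommended for distributed training.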

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 169-176)

Signature

images, labels = fn.readers.file(
    file_root=data_dir,
    shard_id=shard_id,
    num_shards=num_shards,
    random_shuffle=is_training,
    pad_last_batch=True,
    name="Reader",
)

Import

import nvidia.dali.fn as fn

I/O Contract

Inputs

  • file_root (str, required): Root directory containing class subdirectories, each holding image files (ImageNet-style layout)
  • shard_id (int, required): Zero-based index of this worker's data shard (typically the local GPU rank)
  • num_shards (int, required): Total number of data shards (typically the world size in distributed training)
  • random_shuffle (bool, optional): Whether to randomly shuffle files each epoch (default False; set True for training)
  • pad_last_batch (bool, optional): Whether to pad the last batch so that all shards produce the same number of samples (default False; set True for distributed training)
  • name (str, optional): Named handle for the reader, used by DALIClassificationIterator to query epoch size and progress

Outputs

  • images (DataNode, CPU): Raw encoded file contents as byte buffers (e.g., JPEG-encoded image data)
  • labels (DataNode, CPU): Integer class labels derived from sorted subdirectory names (0-indexed)
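The images output carries still-encoded bytes, not decoded pixels. A pure-Python illustration of what "raw encoded byte buffer" means, using a hand-made stand-in file rather than real DALI output:

```python
import os
import tempfile

# Write a stand-in "JPEG" file: only the SOI marker prefix matters for
# this illustration; a real file would contain full compressed image data.
root = tempfile.mkdtemp()
path = os.path.join(root, "class_a", "sample.jpg")
os.makedirs(os.path.dirname(path))
with open(path, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 16)

# fn.readers.file emits file contents like this: raw, undecoded bytes,
# which downstream decoders (e.g., fn.decoders.image) turn into pixels.
with open(path, "rb") as f:
    buf = f.read()

is_jpeg = buf[:3] == b"\xff\xd8\xff"  # JPEG start-of-image marker check
print(is_jpeg)  # True
```

Keeping decoding out of the reader lets DALI move the (expensive) decode step onto mixed CPU/GPU operators later in the pipeline.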

Usage Examples

Training Reader with Sharding

import nvidia.dali.fn as fn

# Inside a @pipeline_def decorated function:
images, labels = fn.readers.file(
    file_root="/data/imagenet/train",
    shard_id=0,        # GPU rank 0
    num_shards=8,      # 8 GPUs total
    random_shuffle=True,
    pad_last_batch=True,
    name="Reader",
)

Validation Reader (No Shuffle)

images, labels = fn.readers.file(
    file_root="/data/imagenet/val",
    shard_id=local_rank,
    num_shards=world_size,
    random_shuffle=False,
    pad_last_batch=True,
    name="Reader",
)

EfficientNet Variant

# From dali.py training_pipe (lines 95-102):
jpegs, labels = fn.readers.file(
    name="Reader",
    file_root=data_dir,
    shard_id=rank,
    num_shards=world_size,
    random_shuffle=True,
    pad_last_batch=True,
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
