Implementation:NVIDIA DALI Paddle ResNet Training

Knowledge Sources	NVIDIA_DALI
Domains	Vision, Training
Last Updated	2026-02-08 16:00 GMT

Overview

Orchestrates the full PaddlePaddle static-graph training and evaluation program for ResNet-50 with NVIDIA DALI data loading integration.

Description

This module provides the complete training program logic for the PaddlePaddle ResNet-50 example. It operates in PaddlePaddle's static graph mode (paddle.static) and contains functions for building the computational graph, compiling programs, running training/evaluation loops, and managing distributed training.

The core workflow is: (1) create_feeds creates input placeholders for image data and labels, (2) build assembles the full program by instantiating the model, creating loss/metric fetch operations, and configuring the optimizer with optional AMP/ASP support, (3) compile_prog compiles the program with operator fusion optimizations, and (4) run executes the training or evaluation loop over a DALI data iterator, collecting metrics like loss, top-1/top-5 accuracy, throughput (images/sec), and latency.
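
The top-1/top-5 accuracy collected by run is the fraction of samples whose true label is among the k highest-scoring classes. The module computes this inside the static graph. The plain-Python sketch below only illustrates the metric on a toy batch, and the helper name topk_accuracy is hypothetical.

```python
from typing import Sequence


def topk_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose label is among the k highest-scoring classes (hypothetical helper)."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same batch size")
    hits = 0
    for row, label in zip(logits, labels):
        # Class indices ordered by score, highest first; keep the first k.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top:
            hits += 1
    return hits / len(labels)


# Toy batch: 3 samples, 6 classes.
batch_logits = [
    [0.1, 2.0, 0.3, 0.0, 0.5, 0.2],  # label 1 is the top score
    [1.5, 0.2, 0.1, 1.4, 0.0, 0.3],  # label 3 is the second-highest score
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],  # label 0 is the lowest score
]
batch_labels = [1, 3, 0]
top1 = topk_accuracy(batch_logits, batch_labels, k=1)
top5 = topk_accuracy(batch_logits, batch_labels, k=5)
print(f"top1={top1:.4f} top5={top5:.4f}")
```

Only the first sample counts for top-1, while the first two count for top-5, because the second sample's label ranks second.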

The module supports distributed training via PaddlePaddle Fleet with NCCL collective communication, automatic mixed precision (AMP) with dynamic loss scaling, automatic sparsity (ASP) with configurable mask algorithms, and a benchmark mode with warmup steps. The run function consumes batches directly from nvidia.dali.plugin.paddle.DALIGenericIterator, illustrating how DALI can replace PaddlePaddle's native data loading with a high-performance input pipeline.
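
Automatic sparsity is typically aimed at the 2:4 structured pattern accelerated by Sparse Tensor Cores on NVIDIA Ampere and later GPUs: in every group of four consecutive weights, two are set to zero. The sketch below shows a one-dimensional, magnitude-based mask of that kind, keeping the two largest-magnitude weights per group. It is a plain-Python illustration of what a mask algorithm decides, not PaddlePaddle's ASP implementation, and the helper name mask_1d_n_of_m is hypothetical.

```python
from typing import List


def mask_1d_n_of_m(row: List[float], n: int = 2, m: int = 4) -> List[int]:
    """0/1 mask keeping the n largest-magnitude values in each group of m (hypothetical helper).

    Assumes len(row) is a multiple of m.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = [0] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Rank positions in this group by |weight|, largest first, and keep the top n.
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


weights = [0.9, -0.1, 0.05, -1.2, 0.3, 0.4, -0.35, 0.0]
mask = mask_1d_n_of_m(weights)
# Apply the mask: pruned positions become exactly 0.0.
pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
print(mask)  # [1, 0, 0, 1, 0, 1, 1, 0]
print(pruned)
```

Alternative mask algorithms differ mainly in how they choose the surviving weights (for example, over 2D blocks instead of single rows). The n-of-m constraint per group stays the same.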

Usage

Use this module as the main training program when running the PaddlePaddle ResNet-50 DALI example. It is called from the top-level training script after configuring arguments via the config module and setting up the DALI pipeline.

Code Reference

Source Location

Repository: NVIDIA_DALI
File: docs/examples/use_cases/paddle/resnet50/program.py
Lines: 1-447

Signature

def create_feeds(image_shape): ...

def create_fetchs(out, feeds, class_num, label_smoothing=0, mode=Mode.TRAIN): ...

def create_strategy(args, is_train=True): ...

def dist_optimizer(args, optimizer): ...

def build(args, main_prog, startup_prog, step_each_epoch, is_train=True): ...

def compile_prog(args, program, loss_name=None, is_train=True): ...

def run(args, dataloader, exe, program, fetchs, epoch,
        mode=Mode.TRAIN, lr_scheduler=None): ...

def log_info(step, metrics, mode): ...
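
The label_smoothing argument of create_fetchs shapes the training loss. The sketch below assumes the standard formulation, where the one-hot target for K classes is replaced by (1 - epsilon) * one_hot + epsilon / K. It computes the per-sample loss in plain Python, and the helper names are hypothetical. With epsilon = 0 it reduces to ordinary cross-entropy.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(x - mx) for x in logits))
    return [x - log_sum for x in logits]


def smoothed_cross_entropy(logits: Sequence[float], label: int, epsilon: float) -> float:
    """Cross-entropy against the soft target (1 - epsilon) * one_hot + epsilon / K (hypothetical)."""
    k = len(logits)
    logp = log_softmax(logits)
    loss = 0.0
    for c in range(k):
        target = (1.0 - epsilon) * (1.0 if c == label else 0.0) + epsilon / k
        loss -= target * logp[c]
    return loss


logits = [2.0, 0.5, -1.0, 0.0]
plain = smoothed_cross_entropy(logits, label=0, epsilon=0.0)   # ordinary cross-entropy
smooth = smoothed_cross_entropy(logits, label=0, epsilon=0.1)  # label-smoothed variant
print(f"plain CE = {plain:.4f}, smoothed CE = {smooth:.4f}")
```

When the model already ranks the true class first, smoothing raises the loss slightly. This discourages over-confident logits.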

Import

from program import create_feeds, create_fetchs, build, compile_prog, run

I/O Contract

Inputs (build function)

Name	Type	Required	Description
args	Namespace	Yes	Parsed command-line arguments containing model, optimizer, and training configuration.
main_prog	paddle.static.Program	Yes	The main program to build the computation graph in.
startup_prog	paddle.static.Program	Yes	The startup program for parameter initialization.
step_each_epoch	int	Yes	Number of training steps per epoch, used for learning rate scheduling.
is_train	bool	No	Whether to build for training (True) or evaluation (False). Default: True.
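
The scheduler is stepped once per iteration, so step_each_epoch is needed to convert epoch-based schedule settings into step counts. The sketch below assumes a cosine decay with a linear warmup. This is a common ResNet-50 recipe, but this page does not state which schedule the module configures. The function name and the base learning rate, warmup, and epoch values are illustrative.

```python
import math


def lr_at_step(step: int, step_each_epoch: int, epochs: int, base_lr: float,
               warmup_epochs: int = 5, start_lr: float = 0.0) -> float:
    """Learning rate after `step` iterations: linear warmup, then cosine decay (hypothetical)."""
    warmup_steps = warmup_epochs * step_each_epoch
    total_steps = epochs * step_each_epoch
    if step < warmup_steps:
        # Linear ramp from start_lr up to base_lr over the warmup steps.
        return start_lr + (base_lr - start_lr) * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))


steps = 5005  # illustrative steps per epoch
for epoch in (0, 5, 50, 90):
    print(epoch, round(lr_at_step(epoch * steps, steps, epochs=90, base_lr=0.256), 6))
```

Computing the rate from the global step lets a scheduler that is stepped every iteration change smoothly within an epoch instead of jumping at epoch boundaries.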

Outputs (build function)

Name	Type	Description
fetchs	dict	Dictionary mapping metric names (loss, top1, top5) to (variable, AverageMeter) tuples.
lr_scheduler	paddle.optimizer.lr.LRScheduler	Learning rate scheduler instance (None if is_train=False).
feeds	dict	Dictionary mapping feed names ('data', 'label') to static data placeholders.
optimizer	Optimizer	Distributed optimizer with AMP/ASP configuration (None if is_train=False).
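
When AMP is enabled, dynamic loss scaling multiplies the loss by a scale factor before backpropagation so that small FP16 gradients do not underflow. If any gradient overflows to inf/NaN, the update is skipped and the scale shrinks. After a run of overflow-free steps, the scale grows again. The sketch below models only this scale-update rule in plain Python. The class name and default values are illustrative assumptions, not PaddlePaddle's exact defaults or implementation.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DynamicLossScaler:
    """Toy dynamic loss-scaling policy (hypothetical; defaults are illustrative)."""
    scale: float = 2.0 ** 15
    incr_every_n_steps: int = 1000
    decr_every_n_nan_or_inf: int = 2
    incr_ratio: float = 2.0
    decr_ratio: float = 0.5
    good_steps: int = 0
    bad_steps: int = 0

    def update(self, grads: Iterable[float]) -> bool:
        """Adjust the scale from this step's gradients; return True if the update may be applied."""
        if all(math.isfinite(g) for g in grads):
            self.good_steps += 1
            self.bad_steps = 0
            if self.good_steps == self.incr_every_n_steps:
                self.scale *= self.incr_ratio  # long overflow-free run: try a larger scale
                self.good_steps = 0
            return True
        # Overflow detected: skip this update and count toward a scale decrease.
        self.good_steps = 0
        self.bad_steps += 1
        if self.bad_steps == self.decr_every_n_nan_or_inf:
            self.scale = max(1.0, self.scale * self.decr_ratio)
            self.bad_steps = 0
        return False


scaler = DynamicLossScaler(incr_every_n_steps=3)
history = []
for grads in ([0.1], [0.2], [0.3], [float("inf")], [float("nan")], [0.1]):
    applied = scaler.update(grads)
    history.append((applied, scaler.scale))
print(history)
```

Requiring several consecutive overflows before shrinking the scale (and many clean steps before growing it) keeps the scale from oscillating on isolated spikes.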

Inputs (run function)

Name	Type	Required	Description
args	Namespace	Yes	Parsed command-line arguments.
dataloader	DALIGenericIterator	Yes	NVIDIA DALI data loader iterator producing batches.
exe	paddle.static.Executor	Yes	PaddlePaddle static executor to run the program.
program	paddle.static.CompiledProgram	Yes	Program to execute, typically the compiled program returned by compile_prog.
fetchs	dict	Yes	Fetch variables and meters from the build step.
epoch	int	Yes	Current epoch number.
mode	Mode	No	Training or evaluation mode. Default: Mode.TRAIN.
lr_scheduler	LRScheduler	No	Learning rate scheduler to step per iteration. Default: None.
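
The run inputs form a simple loop. It pulls batches from the iterator until StopIteration, feeds each batch to the executor, and steps the learning-rate scheduler once per iteration. The framework-free sketch below imitates that control flow with hypothetical stand-ins (FakeIterator, fake_step, CountingScheduler) for the DALI iterator, exe.run, and the scheduler. It assumes each batch is a list holding one dict per pipeline, keyed by output name ('data', 'label'), so batch[0] can be used directly as the feed dictionary.

```python
from typing import Callable, Dict, List


class FakeIterator:
    """Stand-in for a DALI iterator: each batch is [{output_name: value}] (hypothetical)."""

    def __init__(self, num_batches: int, batch_size: int):
        self.num_batches = num_batches
        self.batch_size = batch_size
        self.pos = 0

    def reset(self) -> None:
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self) -> List[Dict[str, list]]:
        if self.pos == self.num_batches:
            raise StopIteration
        self.pos += 1
        data = [[0.0] * 4 for _ in range(self.batch_size)]  # dummy "images"
        label = [[i % 10] for i in range(self.batch_size)]  # dummy labels, shape [N, 1]
        return [{"data": data, "label": label}]


class CountingScheduler:
    """Stand-in for an LR scheduler that only counts step() calls (hypothetical)."""

    def __init__(self):
        self.steps = 0

    def step(self) -> None:
        self.steps += 1


def run_loop(loader: FakeIterator, step_fn: Callable[[Dict[str, list]], float],
             lr_scheduler=None) -> Dict[str, float]:
    loader.reset()
    losses, images = [], 0
    while True:
        try:
            batch = next(loader)
        except StopIteration:
            break
        feed = batch[0]               # one dict per pipeline; a single pipeline here
        losses.append(step_fn(feed))  # plays the role of exe.run(program, feed=..., fetch_list=...)
        images += len(feed["data"])
        if lr_scheduler is not None:
            lr_scheduler.step()       # stepped once per iteration, not per epoch
    return {"loss": sum(losses) / len(losses), "images": images}


def fake_step(feed: Dict[str, list]) -> float:
    # Pretend "loss": the mean label value in the batch.
    return sum(lab[0] for lab in feed["label"]) / len(feed["label"])


sched = CountingScheduler()
result = run_loop(FakeIterator(num_batches=3, batch_size=4), fake_step, sched)
print(result, sched.steps)
```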

Outputs (run function)

Name	Type	Description
epoch_data	dict	Dictionary of epoch-level metrics: loss, epoch_time, and ips, plus top1 and top5 in evaluation mode only.
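
Each fetch is paired with an AverageMeter, and epoch_data summarizes those meters. The sketch below shows a sample-weighted running average and an epoch throughput figure. This AverageMeter is a hypothetical re-creation rather than the module's class. The ips formula (images processed across all trainers divided by summed batch time) and the helper name epoch_summary are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AverageMeter:
    """Sample-weighted running average (hypothetical stand-in for the module's meter)."""
    name: str
    total: float = 0.0
    count: int = 0
    val: float = 0.0

    def update(self, val: float, n: int = 1) -> None:
        self.val = val          # most recent value, e.g. for per-step logging
        self.total += val * n   # weight by the number of samples in the batch
        self.count += n

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


def epoch_summary(losses, batch_times, batch_size: int, num_trainers: int) -> dict:
    loss_meter = AverageMeter("loss")
    for loss in losses:
        loss_meter.update(loss, batch_size)
    epoch_time = sum(batch_times)
    images = batch_size * num_trainers * len(batch_times)
    return {
        "loss": loss_meter.avg,
        "epoch_time": epoch_time,
        "ips": images / epoch_time,  # images per second across all trainers
    }


data = epoch_summary(losses=[2.0, 1.0, 0.0], batch_times=[0.5, 0.25, 0.25],
                     batch_size=256, num_trainers=8)
print(data)
```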

Usage Examples

Building and running a training program

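import paddle
from program import build, compile_prog, run
from utils.mode import Mode

# Assumed to exist already:
#   args            - parsed command-line arguments from the example's config module
#   dali_dataloader - DALIGenericIterator built from the DALI pipeline

paddle.enable_static()
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()

# Build training program (5005 is an illustrative steps-per-epoch value)
fetchs, lr_scheduler, feeds, optimizer = build(
    args, main_prog, startup_prog, step_each_epoch=5005, is_train=True
)

# Compile with operator fusion; loss_name must be the loss variable's name in the graph
compiled_prog = compile_prog(args, main_prog, loss_name=fetchs['loss'][0].name, is_train=True)

# Initialize parameters, then run one training epoch
exe = paddle.static.Executor(paddle.CUDAPlace(0))
exe.run(startup_prog)
metrics = run(args, dali_dataloader, exe, compiled_prog, fetchs,
              epoch=0, mode=Mode.TRAIN, lr_scheduler=lr_scheduler)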

Related Pages

Environment:NVIDIA_DALI_CUDA_GPU_Environment
