Implementation:NVIDIA DALI Paddle ResNet Training

From Leeroopedia


Knowledge Sources
Domains: Vision, Training
Last Updated: 2026-02-08 16:00 GMT

Overview

Orchestrates the full PaddlePaddle static-graph training and evaluation program for ResNet-50 with NVIDIA DALI data loading integration.

Description

This module provides the complete training program logic for the PaddlePaddle ResNet-50 example. It operates in PaddlePaddle's static graph mode (paddle.static) and contains functions for building the computational graph, compiling programs, running training/evaluation loops, and managing distributed training.

The core workflow is: (1) create_feeds creates input placeholders for image data and labels, (2) build assembles the full program by instantiating the model, creating loss/metric fetch operations, and configuring the optimizer with optional AMP/ASP support, (3) compile_prog compiles the program with operator fusion optimizations, and (4) run executes the training or evaluation loop over a DALI data iterator, collecting metrics like loss, top-1/top-5 accuracy, throughput (images/sec), and latency.
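The top-1/top-5 accuracy metrics collected in step (4) can be illustrated without PaddlePaddle. The following is a minimal plain-Python sketch of top-k accuracy in general, not the module's own metric code; `topk_accuracy` and the sample scores are hypothetical and chosen only for illustration.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    Hypothetical helper for illustration; not part of the program module.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += int(label in topk)
    return hits / len(labels)


# Made-up per-class scores for four samples and three classes.
scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
```

With ImageNet's 1000 classes the same idea applies with k=1 and k=5.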

The module supports distributed training via PaddlePaddle Fleet with NCCL collective communication, automatic mixed precision (AMP) with dynamic loss scaling, automatic sparsity (ASP) with configurable mask algorithms, and a benchmark mode with warmup steps. The run function takes an nvidia.dali.plugin.paddle.DALIGenericIterator directly as its data source, showing how DALI can stand in for a framework-native data loader as a high-performance input pipeline.
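Dynamic loss scaling, as a general technique, can be sketched in a few lines. The class below is an illustrative stand-in and not PaddlePaddle's AMP implementation; `DynamicLossScaler`, its parameter names, and the default values are assumptions made for this example.

```python
import math


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling (not Paddle's code)."""

    def __init__(self, init_scale=2.0 ** 15, incr_every_n_steps=1000,
                 decr_every_n_nan_or_inf=2, incr_ratio=2.0, decr_ratio=0.5):
        self.scale = init_scale
        self.incr_every_n_steps = incr_every_n_steps
        self.decr_every_n_nan_or_inf = decr_every_n_nan_or_inf
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self._good_steps = 0
        self._bad_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied, False if skipped."""
        overflow = any(math.isnan(g) or math.isinf(g) for g in grads)
        if overflow:
            # Overflowed gradients: skip the step and possibly shrink the scale.
            self._good_steps = 0
            self._bad_steps += 1
            if self._bad_steps >= self.decr_every_n_nan_or_inf:
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self._bad_steps = 0
            return False
        # Finite gradients: grow the scale after enough consecutive good steps.
        self._bad_steps = 0
        self._good_steps += 1
        if self._good_steps >= self.incr_every_n_steps:
            self.scale *= self.incr_ratio
            self._good_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=1024.0, incr_every_n_steps=2,
                           decr_every_n_nan_or_inf=1)
applied = [
    scaler.update([0.1, -0.2]),          # finite: first good step
    scaler.update([0.3, 0.0]),           # finite: second good step, scale doubles
    scaler.update([float("inf"), 0.1]),  # overflow: step skipped, scale halves
]
```

The point of the scheme is that the loss is multiplied by `scale` before the backward pass so small FP16 gradients do not underflow, while overflow steps are skipped rather than applied.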

Usage

Use this module as the main training program when running the PaddlePaddle ResNet-50 DALI example. It is called from the top-level training script after configuring arguments via the config module and setting up the DALI pipeline.

Code Reference

Source Location

Signature

def create_feeds(image_shape): ...

def create_fetchs(out, feeds, class_num, label_smoothing=0, mode=Mode.TRAIN): ...

def create_strategy(args, is_train=True): ...

def dist_optimizer(args, optimizer): ...

def build(args, main_prog, startup_prog, step_each_epoch, is_train=True): ...

def compile_prog(args, program, loss_name=None, is_train=True): ...

def run(args, dataloader, exe, program, fetchs, epoch,
        mode=Mode.TRAIN, lr_scheduler=None): ...

def log_info(step, metrics, mode): ...

Import

from program import create_feeds, create_fetchs, build, compile_prog, run

I/O Contract

Inputs (build function)

| Name | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Parsed command-line arguments containing model, optimizer, and training configuration. |
| main_prog | paddle.static.Program | Yes | The main program to build the computation graph in. |
| startup_prog | paddle.static.Program | Yes | The startup program for parameter initialization. |
| step_each_epoch | int | Yes | Number of training steps per epoch, used for learning rate scheduling. |
| is_train | bool | No | Whether to build for training (True) or evaluation (False). Default: True. |
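How step_each_epoch is obtained is not shown on this page. One plausible way, assuming it is derived from the dataset size and the global batch size, is sketched below; `steps_per_epoch` and its parameters are hypothetical names. The ImageNet-1k training set has 1,281,167 images, which at a global batch size of 256 gives the value 5005 used in the usage example.

```python
import math


def steps_per_epoch(num_samples, batch_size_per_gpu, num_gpus=1, drop_last=False):
    """Hypothetical helper: number of optimizer steps needed to see every sample once."""
    global_batch = batch_size_per_gpu * num_gpus
    if drop_last:
        # The final partial batch is discarded.
        return num_samples // global_batch
    # The final partial batch counts as one more step.
    return math.ceil(num_samples / global_batch)


imagenet_train_images = 1_281_167  # ImageNet-1k training set size

steps = steps_per_epoch(imagenet_train_images, batch_size_per_gpu=256)
steps_8gpu = steps_per_epoch(imagenet_train_images, batch_size_per_gpu=256, num_gpus=8)
```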

Outputs (build function)

| Name | Type | Description |
|---|---|---|
| fetchs | dict | Dictionary mapping metric names (loss, top1, top5) to (variable, AverageMeter) tuples. |
| lr_scheduler | paddle.optimizer.lr.LRScheduler | Learning rate scheduler instance (None if is_train=False). |
| feeds | dict | Dictionary mapping feed names ('data', 'label') to static data placeholders. |
| optimizer | Optimizer | Distributed optimizer with AMP/ASP configuration (None if is_train=False). |
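The (variable, AverageMeter) layout of fetchs can be made concrete with a small stand-in. The example's real AverageMeter is not shown on this page, so the class below is a simplified assumption, and the string placeholders take the place of actual static-graph variables.

```python
class AverageMeter:
    """Simplified stand-in for the example's meter: tracks a running weighted average."""

    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # `n` is the number of samples that `val` was averaged over.
        self.val = val
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


# Stand-in for the dict returned by build(): metric name -> (variable, meter).
fetchs = {
    "loss": ("loss_var_placeholder", AverageMeter("loss")),
    "top1": ("top1_var_placeholder", AverageMeter("top1")),
}

# Made-up per-batch results, as if fetched from the executor for two batches.
batch_results = [
    {"loss": 2.0, "top1": 0.25},
    {"loss": 1.0, "top1": 0.75},
]
for result in batch_results:
    for name, (_, meter) in fetchs.items():
        meter.update(result[name], n=256)

loss_avg = fetchs["loss"][1].avg
top1_avg = fetchs["top1"][1].avg
```

Keeping the variable and its meter together lets a loop build the fetch list from the first tuple element and update the matching meter from the second.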

Inputs (run function)

| Name | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Parsed command-line arguments. |
| dataloader | DALIGenericIterator | Yes | NVIDIA DALI data loader iterator producing batches. |
| exe | paddle.static.Executor | Yes | PaddlePaddle static executor to run the program. |
| program | paddle.static.Program | Yes | Compiled program to execute. |
| fetchs | dict | Yes | Fetch variables and meters from the build step. |
| epoch | int | Yes | Current epoch number. |
| mode | Mode | No | Training or evaluation mode. Default: Mode.TRAIN. |
| lr_scheduler | LRScheduler | No | Learning rate scheduler to step per iteration. Default: None. |
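The shape of the loop that run performs over its dataloader can be sketched with a stand-in iterator. Everything here is an assumption for illustration: `FakeDALIIterator` is hypothetical, and the batch layout (a list holding one dict per device, keyed by 'data' and 'label') is assumed rather than taken from this page.

```python
class FakeDALIIterator:
    """Hypothetical stand-in for a DALI iterator; yields a list with one dict per device."""

    def __init__(self, num_batches, batch_size):
        self.num_batches = num_batches
        self.batch_size = batch_size
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= self.num_batches:
            raise StopIteration
        self._index += 1
        data = [[0.0] * 4 for _ in range(self.batch_size)]  # dummy "images"
        label = [0] * self.batch_size                       # dummy labels
        return [{"data": data, "label": label}]

    def reset(self):
        # Rewind so the next epoch starts from the first batch.
        self._index = 0


def run_epoch(dataloader, step_fn):
    """Feed every batch of one epoch to step_fn, then rewind the iterator."""
    steps = 0
    samples = 0
    for batch in dataloader:
        feed = {"data": batch[0]["data"], "label": batch[0]["label"]}
        step_fn(feed)  # in the real program this would be an executor call
        steps += 1
        samples += len(feed["label"])
    dataloader.reset()
    return steps, samples


seen = []
loader = FakeDALIIterator(num_batches=3, batch_size=8)
steps, samples = run_epoch(loader, seen.append)
```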

Outputs (run function)

| Name | Type | Description |
|---|---|---|
| epoch_data | dict | Dictionary of epoch-level metrics including loss, epoch_time, ips, top1, top5 (eval only). |
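A throughput figure such as ips (read here as images per second) and an average latency can be derived from per-batch timings while discarding warmup steps. The sketch below is an assumption about how such numbers are typically computed, not the module's code; `summarize_throughput` and the timings are hypothetical.

```python
def summarize_throughput(batch_times, batch_size, warmup_steps=0):
    """Hypothetical helper: throughput and mean latency, ignoring warmup steps."""
    measured = batch_times[warmup_steps:]
    if not measured:
        raise ValueError("no steps left to measure after warmup")
    total_time = sum(measured)
    return {
        "ips": batch_size * len(measured) / total_time,  # images per second
        "avg_latency": total_time / len(measured),       # seconds per batch
    }


# Made-up batch times in seconds; the first two slow steps are treated as warmup.
batch_times = [1.0, 0.5, 0.25, 0.25, 0.25, 0.25]
stats = summarize_throughput(batch_times, batch_size=256, warmup_steps=2)
```

Excluding warmup steps keeps one-time costs such as memory allocation and kernel autotuning out of the reported numbers.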

Usage Examples

Building and running a training program

import paddle
from program import build, compile_prog, run
from utils.mode import Mode

# Assumes `args` (parsed via the config module) and `dali_dataloader`
# (a DALIGenericIterator) have already been created.
paddle.enable_static()
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()

# Build training program
fetchs, lr_scheduler, feeds, optimizer = build(
    args, main_prog, startup_prog, step_each_epoch=5005, is_train=True
)

# Compile with operator fusion. fetchs['loss'] is a (variable, AverageMeter)
# tuple, so the loss variable's actual name is passed instead of a literal.
compiled_prog = compile_prog(
    args, main_prog, loss_name=fetchs['loss'][0].name, is_train=True
)

# Initialize parameters, then execute one training epoch
exe = paddle.static.Executor(paddle.CUDAPlace(0))
exe.run(startup_prog)
metrics = run(args, dali_dataloader, exe, compiled_prog, fetchs,
              epoch=0, mode=Mode.TRAIN, lr_scheduler=lr_scheduler)
