
Implementation:NVIDIA DALI Model Fit TensorFlow

From Leeroopedia


Knowledge Sources
Domains Object_Detection, GPU_Computing
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete training-script pattern for EfficientDet object detection that pairs DALI data pipelines with the TensorFlow Keras model.fit() loop, provided by the DALI EfficientDet example.

Description

The train.py run_training function implements the complete training workflow for EfficientDet with DALI-accelerated data loading. It ties together model configuration, data pipeline creation, distribution strategy, and the Keras training loop.

The function proceeds through these stages:

  1. Configuration: Loads the EfficientDet configuration (hparams_config.get_efficientdet_config), overrides it with user-specified hyperparameters, and parses the image size.
  2. Reproducibility: If a seed is provided, sets PYTHONHASHSEED, tf.random.set_seed, np.random.seed, random.seed, and enables TF_DETERMINISTIC_OPS and TF_CUDNN_DETERMINISTIC.
  3. GPU setup: Enables memory growth on all physical GPUs and sets soft device placement. For multi-GPU training, creates a tf.distribute.MirroredStrategy with the specified GPU list.
  4. Dataset creation: Calls utils.get_dataset(), which instantiates EfficientDetPipeline (for DALI modes) or InputReader (for native TensorFlow mode). In multi-GPU DALI mode, datasets are created per-replica via strategy.distribute_datasets_from_function.
  5. Model and optimizer: Within strategy.scope(), constructs EfficientDetNet and compiles it with an optimizer returned by optimizers.get_optimizer, which is aware of the global batch size and total training steps.
  6. Callbacks: Configures ModelCheckpoint (saving weights per epoch) and TensorBoard (logging metrics) based on user arguments.
  7. Training: Calls model.fit() with the DALI-backed training dataset, specifying epochs, steps_per_epoch, initial_epoch (for checkpoint resumption), callbacks, and optional validation settings.
  8. Post-training: Optionally evaluates the model and saves final weights to the specified output file.
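Stages 2 and 3 are not shown in the code blocks below, so here is a minimal sketch of what they could look like. The helper names (setup_determinism, setup_gpus) are illustrative, not the exact functions in train.py; only the environment variables and TensorFlow calls are taken from the description above.

```python
# Hedged sketch of the reproducibility and GPU-setup stages (2-3 above).
# setup_determinism / setup_gpus are hypothetical names, not train.py APIs.
import os
import random

import numpy as np
import tensorflow as tf


def setup_determinism(seed):
    """Seed every RNG listed in the description and enable deterministic ops."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)


def setup_gpus(multi_gpu=None):
    """Enable memory growth on all GPUs and pick a distribution strategy."""
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    tf.config.set_soft_device_placement(True)
    if multi_gpu:  # e.g. [0, 1, 2, 3]
        devices = [f"/gpu:{i}" for i in multi_gpu]
        return tf.distribute.MirroredStrategy(devices=devices)
    return tf.distribute.get_strategy()  # default (single-device) strategy
```

Note that memory growth must be set before any GPU is initialized, which is why this stage runs early in run_training.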

Usage

Run train.py from the command line with the desired arguments, or call run_training(args_dict) programmatically.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/tensorflow/efficientdet/train.py

Signature

def run_training(args):
    ...

# The core training call:
model.fit(
    train_dataset,
    epochs=args.epochs,
    steps_per_epoch=args.train_steps,
    initial_epoch=initial_epoch,
    callbacks=callbacks,
    validation_data=eval_dataset if args.eval_during_training else None,
    validation_steps=args.eval_steps,
    validation_freq=args.eval_freq,
)

Import

import tensorflow as tf
import hparams_config
import utils
from model import efficientdet_net
from model.utils import optimizers

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| args | dict | Yes | Training arguments dictionary containing all configuration parameters. |
| args.model_name | str | Yes | EfficientDet variant name (e.g., "efficientdet-d1"). |
| args.epochs | int | Yes | Total number of training epochs. Default: 300. |
| args.batch_size | int | Yes | Per-replica batch size. Default: 64. |
| args.train_steps | int | Yes | Number of training steps per epoch. Default: 2000. |
| args.input_type | InputType | Yes | Data format: InputType.tfrecord or InputType.coco. |
| args.pipeline_type | PipelineType | Yes | Pipeline backend: dali_gpu, dali_cpu, tensorflow, or synthetic. |
| args.multi_gpu | list[int] or None | No | List of GPU device indices for multi-GPU training. None for single GPU. |
| args.seed | int or None | No | Random seed for reproducibility. None disables seeding. |
| args.start_weights | str or None | No | Path to pre-trained weights file for checkpoint resumption. |
| args.output_filename | str | No | Path for saving final model weights. Default: "output.h5". |
| args.ckpt_dir | str or None | No | Directory for per-epoch checkpoint saving. |
| args.log_dir | str or None | No | Directory for TensorBoard logs. |
| args.eval_during_training | bool | No | Whether to run validation during training. Default: False. |
| args.eval_after_training | bool | No | Whether to run evaluation after training completes. Default: False. |
| args.eval_steps | int | No | Number of evaluation steps. Default: 5000. |
| args.eval_freq | int | No | Run evaluation every N epochs. Default: 1. |
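The start_weights and initial_epoch arguments together support resuming from a checkpoint. A minimal sketch of how that pairing works; maybe_resume is a hypothetical helper, not a function from train.py:

```python
# Hedged sketch of checkpoint resumption via args.start_weights and
# args.initial_epoch; maybe_resume is an illustrative helper name.
def maybe_resume(model, start_weights, initial_epoch):
    """Load pre-trained weights if a path was given, then return the
    epoch to resume from.  Passing initial_epoch to model.fit() keeps
    epoch numbering, LR schedules, and checkpoint names consistent."""
    if start_weights:
        model.load_weights(start_weights)
    return initial_epoch

# Usage inside the training setup (sketch):
# start = maybe_resume(model, args["start_weights"], args["initial_epoch"])
# model.fit(train_dataset, initial_epoch=start, ...)
```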

Outputs

| Name | Type | Description |
|------|------|-------------|
| Saved weights | .h5 file | Final model weights saved to args.output_filename. |
| Checkpoints | .h5 files | Per-epoch checkpoints saved to args.ckpt_dir (if specified). |
| TensorBoard logs | event files | Training and evaluation metrics (if args.log_dir specified). |
| Evaluation metrics | stdout | Printed evaluation results (if args.eval_after_training is True). |
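The checkpoint and log outputs above come from Keras callbacks. A minimal sketch of how they could be configured, assuming the described behavior (weights-only checkpoints, one per epoch); the file-name pattern and build_callbacks helper are assumptions, not the exact source:

```python
# Hedged sketch of the callbacks stage; build_callbacks and the
# "epoch-NNN.h5" naming are illustrative, not taken from train.py.
import tensorflow as tf


def build_callbacks(ckpt_dir=None, log_dir=None):
    """Return the callback list for model.fit(); empty if neither
    a checkpoint directory nor a log directory was requested."""
    callbacks = []
    if ckpt_dir:
        # Weights-only checkpoint, written once per epoch.
        callbacks.append(tf.keras.callbacks.ModelCheckpoint(
            filepath=ckpt_dir + "/epoch-{epoch:03d}.h5",
            save_weights_only=True,
        ))
    if log_dir:
        # Scalar metrics become TensorBoard event files.
        callbacks.append(tf.keras.callbacks.TensorBoard(log_dir=log_dir))
    return callbacks
```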

Usage Examples

Command-Line Training with DALI GPU Pipeline

# From command line:
# python train.py \
#     --input_type coco \
#     --pipeline_type dali_gpu \
#     --images_path /data/coco/train2017 \
#     --annotations_path /data/coco/annotations/instances_train2017.json \
#     --model_name efficientdet-d1 \
#     --batch_size 16 \
#     --epochs 300 \
#     --train_steps 2000 \
#     --output_filename efficientdet-d1-final.h5 \
#     --ckpt_dir /checkpoints \
#     --log_dir /logs \
#     --eval_during_training \
#     --eval_freq 5

Programmatic Multi-GPU Training

from train import run_training

args = {
    "model_name": "efficientdet-d1",
    "epochs": 300,
    "batch_size": 16,
    "train_steps": 2000,
    "input_type": "coco",
    "pipeline_type": "dali_gpu",
    "images_path": "/data/coco/train2017",
    "annotations_path": "/data/coco/annotations/instances_train2017.json",
    "multi_gpu": [0, 1, 2, 3],
    "seed": 42,
    "output_filename": "efficientdet-d1-final.h5",
    "ckpt_dir": "/checkpoints",
    "log_dir": "/logs",
    "eval_during_training": True,
    "eval_after_training": True,
    "eval_steps": 5000,
    "eval_freq": 5,
    "hparams": "",
    "start_weights": None,
    "initial_epoch": 0,
    "train_file_pattern": None,
    "eval_file_pattern": None,
}

run_training(args)

Training Loop Structure

# Simplified view of the training integration pattern:
strategy = tf.distribute.MirroredStrategy(devices)

train_dataset = utils.get_dataset(args, total_batch_size, True, params, strategy)
eval_dataset = utils.get_dataset(args, num_devices, False, params, strategy)

with strategy.scope():
    model = efficientdet_net.EfficientDetNet(params=params)
    model.compile(optimizer=optimizers.get_optimizer(
        params, args.epochs, global_batch_size, args.train_steps
    ))

    model.fit(
        train_dataset,
        epochs=args.epochs,
        steps_per_epoch=args.train_steps,
        callbacks=[
            tf.keras.callbacks.ModelCheckpoint(...),
            tf.keras.callbacks.TensorBoard(...),
        ],
        validation_data=eval_dataset,
    )

model.save_weights("output.h5")
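The simplified view above omits the per-replica dataset creation that the Description mentions for multi-GPU DALI mode. The sketch below shows the strategy.distribute_datasets_from_function pattern with a synthetic tf.data stand-in for the DALI pipeline; only the distribution mechanism is the point, the dataset body is an assumption:

```python
# Hedged sketch of per-replica dataset creation via
# strategy.distribute_datasets_from_function; the dataset itself is a
# synthetic stand-in, not EfficientDetPipeline.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()


def dataset_fn(input_context):
    # Each replica builds its own input pipeline, sharded by replica id,
    # and batches with its share of the global batch size.
    per_replica_batch = input_context.get_per_replica_batch_size(64)
    ds = tf.data.Dataset.from_tensor_slices(tf.zeros([256, 4]))
    ds = ds.shard(input_context.num_input_pipelines,
                  input_context.input_pipeline_id)
    return ds.batch(per_replica_batch)


dist_dataset = strategy.distribute_datasets_from_function(dataset_fn)
```

In the DALI case, dataset_fn would construct a DALIDataset pinned to the replica's GPU instead of the tf.data pipeline shown here.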

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
