
Implementation:NVIDIA DALI Model Fit TensorFlow

From Leeroopedia


Knowledge Sources
Domains Object_Detection, GPU_Computing
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete training-script pattern for EfficientDet object detection that pairs DALI data pipelines with the TensorFlow Keras model.fit() loop, provided by the DALI EfficientDet example.

Description

The train.py run_training function implements the complete training workflow for EfficientDet with DALI-accelerated data loading. It ties together model configuration, data pipeline creation, distribution strategy, and the Keras training loop.

The function proceeds through these stages:

  1. Configuration: Loads the EfficientDet configuration (hparams_config.get_efficientdet_config), overrides it with user-specified hyperparameters, and parses the image size.
  2. Reproducibility: If a seed is provided, sets PYTHONHASHSEED, tf.random.set_seed, np.random.seed, random.seed, and enables TF_DETERMINISTIC_OPS and TF_CUDNN_DETERMINISTIC.
  3. GPU setup: Enables memory growth on all physical GPUs and sets soft device placement. For multi-GPU training, creates a tf.distribute.MirroredStrategy with the specified GPU list.
  4. Dataset creation: Calls utils.get_dataset(), which instantiates EfficientDetPipeline (for DALI modes) or InputReader (for native TensorFlow mode). In multi-GPU DALI mode, datasets are created per-replica via strategy.distribute_datasets_from_function.
  5. Model and optimizer: Within strategy.scope(), constructs EfficientDetNet and compiles it with an optimizer returned by optimizers.get_optimizer, which is aware of the global batch size and total training steps.
  6. Callbacks: Configures ModelCheckpoint (saving weights per epoch) and TensorBoard (logging metrics) based on user arguments.
  7. Training: Calls model.fit() with the DALI-backed training dataset, specifying epochs, steps_per_epoch, initial_epoch (for checkpoint resumption), callbacks, and optional validation settings.
  8. Post-training: Optionally evaluates the model and saves final weights to the specified output file.
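Stages 2 and 3 are not shown in the code blocks below, so here is a minimal sketch of what they could look like. The helper names (setup_determinism, setup_gpus) are illustrative, not the exact functions in train.py; only the environment variables and TensorFlow calls are taken from the description above.

```python
# Hedged sketch of the reproducibility and GPU-setup stages (2-3 above).
# setup_determinism / setup_gpus are hypothetical names, not train.py APIs.
import os
import random

import numpy as np
import tensorflow as tf


def setup_determinism(seed):
    """Seed every RNG listed in the description and enable deterministic ops."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)


def setup_gpus(multi_gpu=None):
    """Enable memory growth on all GPUs and pick a distribution strategy."""
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    tf.config.set_soft_device_placement(True)
    if multi_gpu:  # e.g. [0, 1, 2, 3]
        devices = [f"/gpu:{i}" for i in multi_gpu]
        return tf.distribute.MirroredStrategy(devices=devices)
    return tf.distribute.get_strategy()  # default (single-device) strategy
```

Note that memory growth must be set before any GPU is initialized, which is why this stage runs early in run_training.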

Usage

Run train.py from the command line with the desired arguments, or call run_training(args_dict) programmatically.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/tensorflow/efficientdet/train.py

Signature

def run_training(args):
    ...

# The core training call:
model.fit(
    train_dataset,
    epochs=args.epochs,
    steps_per_epoch=args.train_steps,
    initial_epoch=initial_epoch,
    callbacks=callbacks,
    validation_data=eval_dataset if args.eval_during_training else None,
    validation_steps=args.eval_steps,
    validation_freq=args.eval_freq,
)

Import

import tensorflow as tf
import hparams_config
import utils
from model import efficientdet_net
from model.utils import optimizers

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| args | dict | Yes | Training arguments dictionary containing all configuration parameters. |
| args.model_name | str | Yes | EfficientDet variant name (e.g., "efficientdet-d1"). |
| args.epochs | int | Yes | Total number of training epochs. Default: 300. |
| args.batch_size | int | Yes | Per-replica batch size. Default: 64. |
| args.train_steps | int | Yes | Number of training steps per epoch. Default: 2000. |
| args.input_type | InputType | Yes | Data format: InputType.tfrecord or InputType.coco. |
| args.pipeline_type | PipelineType | Yes | Pipeline backend: dali_gpu, dali_cpu, tensorflow, or synthetic. |
| args.multi_gpu | list[int] or None | No | List of GPU device indices for multi-GPU training. None for single GPU. |
| args.seed | int or None | No | Random seed for reproducibility. None disables seeding. |
| args.start_weights | str or None | No | Path to pre-trained weights file for checkpoint resumption. |
| args.output_filename | str | No | Path for saving final model weights. Default: "output.h5". |
| args.ckpt_dir | str or None | No | Directory for per-epoch checkpoint saving. |
| args.log_dir | str or None | No | Directory for TensorBoard logs. |
| args.eval_during_training | bool | No | Whether to run validation during training. Default: False. |
| args.eval_after_training | bool | No | Whether to run evaluation after training completes. Default: False. |
| args.eval_steps | int | No | Number of evaluation steps. Default: 5000. |
| args.eval_freq | int | No | Run evaluation every N epochs. Default: 1. |
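The start_weights and initial_epoch arguments together support resuming from a checkpoint. A minimal sketch of how that pairing works; maybe_resume is a hypothetical helper, not a function from train.py:

```python
# Hedged sketch of checkpoint resumption via args.start_weights and
# args.initial_epoch; maybe_resume is an illustrative helper name.
def maybe_resume(model, start_weights, initial_epoch):
    """Load pre-trained weights if a path was given, then return the
    epoch to resume from.  Passing initial_epoch to model.fit() keeps
    epoch numbering, LR schedules, and checkpoint names consistent."""
    if start_weights:
        model.load_weights(start_weights)
    return initial_epoch

# Usage inside the training setup (sketch):
# start = maybe_resume(model, args["start_weights"], args["initial_epoch"])
# model.fit(train_dataset, initial_epoch=start, ...)
```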

Outputs

| Name | Type | Description |
|------|------|-------------|
| Saved weights | .h5 file | Final model weights saved to args.output_filename. |
| Checkpoints | .h5 files | Per-epoch checkpoints saved to args.ckpt_dir (if specified). |
| TensorBoard logs | event files | Training and evaluation metrics (if args.log_dir specified). |
| Evaluation metrics | stdout | Printed evaluation results (if args.eval_after_training is True). |
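The checkpoint and log outputs above come from Keras callbacks. A minimal sketch of how they could be configured, assuming the described behavior (weights-only checkpoints, one per epoch); the file-name pattern and build_callbacks helper are assumptions, not the exact source:

```python
# Hedged sketch of the callbacks stage; build_callbacks and the
# "epoch-NNN.h5" naming are illustrative, not taken from train.py.
import tensorflow as tf


def build_callbacks(ckpt_dir=None, log_dir=None):
    """Return the callback list for model.fit(); empty if neither
    a checkpoint directory nor a log directory was requested."""
    callbacks = []
    if ckpt_dir:
        # Weights-only checkpoint, written once per epoch.
        callbacks.append(tf.keras.callbacks.ModelCheckpoint(
            filepath=ckpt_dir + "/epoch-{epoch:03d}.h5",
            save_weights_only=True,
        ))
    if log_dir:
        # Scalar metrics become TensorBoard event files.
        callbacks.append(tf.keras.callbacks.TensorBoard(log_dir=log_dir))
    return callbacks
```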

Usage Examples

Command-Line Training with DALI GPU Pipeline

# From command line:
# python train.py \
#     --input_type coco \
#     --pipeline_type dali_gpu \
#     --images_path /data/coco/train2017 \
#     --annotations_path /data/coco/annotations/instances_train2017.json \
#     --model_name efficientdet-d1 \
#     --batch_size 16 \
#     --epochs 300 \
#     --train_steps 2000 \
#     --output_filename efficientdet-d1-final.h5 \
#     --ckpt_dir /checkpoints \
#     --log_dir /logs \
#     --eval_during_training \
#     --eval_freq 5

Programmatic Multi-GPU Training

from train import run_training

args = {
    "model_name": "efficientdet-d1",
    "epochs": 300,
    "batch_size": 16,
    "train_steps": 2000,
    "input_type": "coco",
    "pipeline_type": "dali_gpu",
    "images_path": "/data/coco/train2017",
    "annotations_path": "/data/coco/annotations/instances_train2017.json",
    "multi_gpu": [0, 1, 2, 3],
    "seed": 42,
    "output_filename": "efficientdet-d1-final.h5",
    "ckpt_dir": "/checkpoints",
    "log_dir": "/logs",
    "eval_during_training": True,
    "eval_after_training": True,
    "eval_steps": 5000,
    "eval_freq": 5,
    "hparams": "",
    "start_weights": None,
    "initial_epoch": 0,
    "train_file_pattern": None,
    "eval_file_pattern": None,
}

run_training(args)

Training Loop Structure

# Simplified view of the training integration pattern:
strategy = tf.distribute.MirroredStrategy(devices)

train_dataset = utils.get_dataset(args, total_batch_size, True, params, strategy)
eval_dataset = utils.get_dataset(args, num_devices, False, params, strategy)

with strategy.scope():
    model = efficientdet_net.EfficientDetNet(params=params)
    model.compile(optimizer=optimizers.get_optimizer(
        params, args.epochs, global_batch_size, args.train_steps
    ))

    model.fit(
        train_dataset,
        epochs=args.epochs,
        steps_per_epoch=args.train_steps,
        callbacks=[
            tf.keras.callbacks.ModelCheckpoint(...),
            tf.keras.callbacks.TensorBoard(...),
        ],
        validation_data=eval_dataset,
    )

model.save_weights("output.h5")
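The simplified view above omits the per-replica dataset creation that the Description mentions for multi-GPU DALI mode. The sketch below shows the strategy.distribute_datasets_from_function pattern with a synthetic tf.data stand-in for the DALI pipeline; only the distribution mechanism is the point, the dataset body is an assumption:

```python
# Hedged sketch of per-replica dataset creation via
# strategy.distribute_datasets_from_function; the dataset itself is a
# synthetic stand-in, not EfficientDetPipeline.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()


def dataset_fn(input_context):
    # Each replica builds its own input pipeline, sharded by replica id,
    # and batches with its share of the global batch size.
    per_replica_batch = input_context.get_per_replica_batch_size(64)
    ds = tf.data.Dataset.from_tensor_slices(tf.zeros([256, 4]))
    ds = ds.shard(input_context.num_input_pipelines,
                  input_context.input_pipeline_id)
    return ds.batch(per_replica_batch)


dist_dataset = strategy.distribute_datasets_from_function(dataset_fn)
```

In the DALI case, dataset_fn would construct a DALIDataset pinned to the replica's GPU instead of the tf.data pipeline shown here.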

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
