Implementation:NVIDIA DALI Model Fit TensorFlow
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete training-script pattern for EfficientDet object detection that feeds DALI data pipelines into TensorFlow Keras model.fit(), as provided by the DALI EfficientDet example.
Description
The train.py run_training function implements the complete training workflow for EfficientDet with DALI-accelerated data loading. It ties together model configuration, data pipeline creation, distribution strategy, and the Keras training loop.
The function proceeds through these stages:
- Configuration: Loads the EfficientDet configuration (hparams_config.get_efficientdet_config), overrides with user-specified hyperparameters, and parses the image size.
- Reproducibility: If a seed is provided, sets PYTHONHASHSEED, tf.random.set_seed, np.random.seed, random.seed, and enables TF_DETERMINISTIC_OPS and TF_CUDNN_DETERMINISTIC.
- GPU setup: Enables memory growth on all physical GPUs and sets soft device placement. For multi-GPU training, creates a tf.distribute.MirroredStrategy with the specified GPU list.
- Dataset creation: Calls utils.get_dataset() which instantiates EfficientDetPipeline (for DALI modes) or InputReader (for native TensorFlow mode). In multi-GPU DALI mode, datasets are created per-replica via strategy.distribute_datasets_from_function.
- Model and optimizer: Within strategy.scope(), constructs EfficientDetNet and compiles it with an optimizer returned by optimizers.get_optimizer, which is aware of the global batch size and total training steps.
- Callbacks: Configures ModelCheckpoint (saving weights per epoch) and TensorBoard (logging metrics) based on user arguments.
- Training: Calls model.fit() with the DALI-backed training dataset, specifying epochs, steps_per_epoch, initial_epoch (for checkpoint resumption), callbacks, and optional validation settings.
- Post-training: Optionally evaluates the model and saves final weights to the specified output file.
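The reproducibility stage above can be sketched as follows. This is a minimal illustration, not the script's literal code: the helper name set_seed is assumed, and the NumPy/TensorFlow calls are shown as comments because the sketch sticks to the standard library.

```python
import os
import random

def set_seed(seed):
    """Illustrative sketch of the seeding stage described above."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Opt in to deterministic TF/cuDNN kernels.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"
    random.seed(seed)
    # The real script also seeds the other RNGs it uses:
    # np.random.seed(seed)
    # tf.random.set_seed(seed)

set_seed(42)
```

Setting the environment variables before any TF kernels run is what makes the determinism flags take effect.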
Usage
Run train.py from the command line with the desired arguments, or call run_training(args) programmatically, where args exposes the options as attributes (for example an argparse.Namespace).
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/tensorflow/efficientdet/train.py
Signature
def run_training(args):
    ...
    # The core training call:
    model.fit(
        train_dataset,
        epochs=args.epochs,
        steps_per_epoch=args.train_steps,
        initial_epoch=initial_epoch,
        callbacks=callbacks,
        validation_data=eval_dataset if args.eval_during_training else None,
        validation_steps=args.eval_steps,
        validation_freq=args.eval_freq,
    )
Import
import tensorflow as tf
import hparams_config
import utils
from model import efficientdet_net
from model.utils import optimizers
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Training arguments object; run_training reads the options as attributes (args.epochs, args.eval_steps, ...), so an argparse.Namespace or types.SimpleNamespace works. |
| args.model_name | str | Yes | EfficientDet variant name (e.g., "efficientdet-d1"). |
| args.epochs | int | No | Total number of training epochs. Default: 300. |
| args.batch_size | int | No | Per-replica batch size. Default: 64. |
| args.train_steps | int | No | Number of training steps per epoch. Default: 2000. |
| args.input_type | InputType | Yes | Data format: InputType.tfrecord or InputType.coco. |
| args.pipeline_type | PipelineType | Yes | Pipeline backend: dali_gpu, dali_cpu, tensorflow, or synthetic. |
| args.multi_gpu | list[int] or None | No | List of GPU device indices for multi-GPU training. None for single GPU. |
| args.seed | int or None | No | Random seed for reproducibility. None disables seeding. |
| args.start_weights | str or None | No | Path to pre-trained weights file for checkpoint resumption. |
| args.output_filename | str | No | Path for saving final model weights. Default: "output.h5". |
| args.ckpt_dir | str or None | No | Directory for per-epoch checkpoint saving. |
| args.log_dir | str or None | No | Directory for TensorBoard logs. |
| args.eval_during_training | bool | No | Whether to run validation during training. Default: False. |
| args.eval_after_training | bool | No | Whether to run evaluation after training completes. Default: False. |
| args.eval_steps | int | No | Number of evaluation steps. Default: 5000. |
| args.eval_freq | int | No | Run evaluation every N epochs. Default: 1. |
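The defaults in the table can be mirrored with a minimal argparse sketch. The flag names and types below are assumed from the table and may differ from train.py's actual parser:

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the documented defaults.
    p = argparse.ArgumentParser(description="EfficientDet training (sketch)")
    p.add_argument("--model_name", default="efficientdet-d1")
    p.add_argument("--epochs", type=int, default=300)
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--train_steps", type=int, default=2000)
    p.add_argument("--eval_steps", type=int, default=5000)
    p.add_argument("--eval_freq", type=int, default=1)
    p.add_argument("--output_filename", default="output.h5")
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--multi_gpu", type=int, nargs="*", default=None)
    p.add_argument("--eval_during_training", action="store_true")
    return p

# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
```

Because parse_args returns a Namespace, the result can be handed straight to run_training-style code that reads attributes.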
Outputs
| Name | Type | Description |
|---|---|---|
| Saved weights | .h5 file | Final model weights saved to args.output_filename. |
| Checkpoints | .h5 files | Per-epoch checkpoints saved to args.ckpt_dir (if specified). |
| TensorBoard logs | event files | Training and evaluation metrics (if args.log_dir specified). |
| Evaluation metrics | stdout | Printed evaluation results (if args.eval_after_training is True). |
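The per-epoch checkpoint files come from Keras's ModelCheckpoint filepath templating, which substitutes {epoch} each time it saves. The exact filename pattern below is an assumption for illustration, not the script's literal choice:

```python
import os

ckpt_dir = "/checkpoints"  # corresponds to args.ckpt_dir
# ModelCheckpoint fills in `{epoch}` when it writes each file.
filepath_template = os.path.join(ckpt_dir, "efficientdet.{epoch:03d}.h5")

# The path ModelCheckpoint would write at the end of epoch 5:
example_path = filepath_template.format(epoch=5)
```

Saving weights-only .h5 files per epoch keeps checkpoints small and directly loadable via model.load_weights for resumption.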
Usage Examples
Command-Line Training with DALI GPU Pipeline
python train.py \
    --input_type coco \
    --pipeline_type dali_gpu \
    --images_path /data/coco/train2017 \
    --annotations_path /data/coco/annotations/instances_train2017.json \
    --model_name efficientdet-d1 \
    --batch_size 16 \
    --epochs 300 \
    --train_steps 2000 \
    --output_filename efficientdet-d1-final.h5 \
    --ckpt_dir /checkpoints \
    --log_dir /logs \
    --eval_during_training \
    --eval_freq 5
Programmatic Multi-GPU Training
from types import SimpleNamespace

from train import run_training

# run_training reads the options as attributes (args.epochs, args.eval_steps, ...),
# so wrap them in a namespace rather than passing a plain dict.
args = SimpleNamespace(
    model_name="efficientdet-d1",
    epochs=300,
    batch_size=16,
    train_steps=2000,
    input_type="coco",
    pipeline_type="dali_gpu",
    images_path="/data/coco/train2017",
    annotations_path="/data/coco/annotations/instances_train2017.json",
    multi_gpu=[0, 1, 2, 3],
    seed=42,
    output_filename="efficientdet-d1-final.h5",
    ckpt_dir="/checkpoints",
    log_dir="/logs",
    eval_during_training=True,
    eval_after_training=True,
    eval_steps=5000,
    eval_freq=5,
    hparams="",
    start_weights=None,
    initial_epoch=0,
    train_file_pattern=None,
    eval_file_pattern=None,
)
run_training(args)
Training Loop Structure
# Simplified view of the training integration pattern:
strategy = tf.distribute.MirroredStrategy(devices)
train_dataset = utils.get_dataset(args, total_batch_size, True, params, strategy)
eval_dataset = utils.get_dataset(args, num_devices, False, params, strategy)

with strategy.scope():
    model = efficientdet_net.EfficientDetNet(params=params)
    model.compile(optimizer=optimizers.get_optimizer(
        params, args.epochs, global_batch_size, args.train_steps
    ))

model.fit(
    train_dataset,
    epochs=args.epochs,
    steps_per_epoch=args.train_steps,
    callbacks=[
        tf.keras.callbacks.ModelCheckpoint(...),
        tf.keras.callbacks.TensorBoard(...),
    ],
    validation_data=eval_dataset,
)
model.save_weights("output.h5")
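As noted above, get_optimizer scales its schedule by the global batch size and the total number of training steps. Under MirroredStrategy the global batch is the per-replica batch times the replica count; the arithmetic sketch below uses the example values from this page (variable names are illustrative):

```python
per_replica_batch = 16          # args.batch_size
num_replicas = 4                # len(args.multi_gpu)

# Global batch seen by the optimizer under MirroredStrategy.
global_batch_size = per_replica_batch * num_replicas

# Total optimizer steps over the whole run, as passed to get_optimizer.
total_steps = 300 * 2000        # args.epochs * args.train_steps
```

Keeping batch_size per-replica (as the I/O contract states) means changing the GPU count changes the effective global batch, which is why the optimizer needs both numbers.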