Implementation:Microsoft Onnxruntime TrainingRunner Parameters
| Field | Value |
|---|---|
| Implementation Name | TrainingRunner_Parameters |
| Overview | Configuration struct encapsulating all training hyperparameters, model paths, optimizer selection, and distributed parallelism settings. |
| Type | API Doc |
| Language | C++ |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
Configuration struct encapsulating all training hyperparameters, model paths, optimizer selection, and distributed parallelism settings. The TrainingRunner::Parameters struct is the single configuration object that drives the entire distributed training pipeline.
API
struct TrainingRunner::Parameters {
  // Model and data paths
  std::string model_name;
  PathString model_path;
  PathString train_data_dir;
  PathString test_data_dir;
  PathString output_dir;

  // Training hyperparameters
  size_t batch_size;
  size_t num_train_steps;
  LearningRateParameters lr_params;
  int gradient_accumulation_steps = 1;

  // Optimizer
  std::string training_optimizer_name = "SGDOptimizer";

  // Mixed precision
  bool use_mixed_precision = false;
  bool use_bfloat16 = false;
  float loss_scale = 1.0f;

  // Distributed parallelism
  int data_parallel_size = 1;
  int horizontal_parallel_size = 1;
  int pipeline_parallel_size = 1;
  bool use_nccl = false;

  // Checkpointing
  PathString checkpoints_dir;
  size_t checkpoint_period = 0;
  size_t max_num_checkpoints = 1;

  // TensorBoard
  PathString log_dir;
  VectorString scalar_names;
  VectorString histogram_names;

  // ... additional fields
};
Source Code Reference
- Repository: microsoft/onnxruntime
- Primary Source: orttraining/orttraining/models/runner/training_runner.h:L23-206
Key Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model_path | PathString | (required) | Path to the ONNX model file |
| train_data_dir | PathString | (required) | Directory containing training data in .pb format |
| test_data_dir | PathString | | Directory containing evaluation data |
| batch_size | size_t | (required) | Number of samples per training batch |
| num_train_steps | size_t | (required) | Total number of training steps |
| lr_params | LearningRateParameters | | Learning rate configuration with feed name and schedule |
| training_optimizer_name | string | "SGDOptimizer" | Optimizer: "Adam", "Lamb", or "SGDOptimizer" |
| gradient_accumulation_steps | int | 1 | Micro-batches per weight update |
| use_mixed_precision | bool | false | Enable FP16/BF16 mixed precision training |
| use_bfloat16 | bool | false | Use BF16 instead of FP16 for mixed precision |
| loss_scale | float | 1.0f | Static loss scale (0.0 enables dynamic scaling) |
| data_parallel_size | int | 1 | Number of data-parallel replicas |
| horizontal_parallel_size | int | 1 | Tensor model parallelism size |
| pipeline_parallel_size | int | 1 | Pipeline parallelism stages (1 = disabled) |
| use_nccl | bool | false | Enable NCCL for GPU collective communication |
| checkpoints_dir | PathString | (empty) | Directory for checkpoint files (empty = no checkpointing) |
| checkpoint_period | size_t | 0 | Steps between checkpoints (0 = no saving) |
| max_num_checkpoints | size_t | 1 | Maximum retained checkpoint files |
| log_dir | PathString | (empty) | TensorBoard log directory (empty = no TensorBoard) |
| gpu_mem_limit_in_gb | float | -1.0f | GPU memory limit in GB (-1.0 = use all available) |
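Several of these fields interact. With gradient accumulation and data parallelism, the number of samples contributing to a single weight update is commonly reasoned about as batch_size × gradient_accumulation_steps × data_parallel_size; the helper below is a sketch of that arithmetic, not part of the onnxruntime API.
#include <cstddef>

// Illustrative only: effective samples per optimizer step under the common
// interpretation of these three fields (not an onnxruntime API).
size_t EffectiveGlobalBatchSize(size_t batch_size,
                                int gradient_accumulation_steps,
                                int data_parallel_size) {
  return batch_size * static_cast<size_t>(gradient_accumulation_steps) *
         static_cast<size_t>(data_parallel_size);
}

// Example: batch_size = 32, 4 accumulation steps, 4 replicas -> 512 samples per update.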
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | Configuration values | Various | Model paths, hyperparameters, parallelism settings, checkpoint config |
| Output | Parameters struct | TrainingRunner::Parameters | Fully configured struct passed to TrainingRunner constructor |
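As the contract notes, the completed struct is handed to the TrainingRunner constructor. The sketch below is hedged: the constructor overloads, Initialize/Run methods, and data-loader wiring are assumptions to be checked against training_runner.h for the revision in use.
// Hedged sketch -- verify the exact overloads in training_runner.h.
// TrainingRunner::Parameters params;          // populate as in the Usage Examples below
// TrainingRunner runner(params, env);         // 'env' is assumed to be an onnxruntime Environment
// ORT_RETURN_IF_ERROR(runner.Initialize());   // assumed: builds the training session and graph
// ORT_RETURN_IF_ERROR(runner.Run(train_loader, test_loader));  // assumed: caller-supplied data loaders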
Usage Examples
Basic Configuration
TrainingRunner::Parameters params;
params.model_name = "gpt2";
params.model_path = ORT_TSTR("model.onnx");
params.train_data_dir = ORT_TSTR("/data/train/");
params.test_data_dir = ORT_TSTR("/data/test/");
params.output_dir = ORT_TSTR("/output/");
params.batch_size = 32;
params.num_train_steps = 10000;
params.training_optimizer_name = "Adam";
params.gradient_accumulation_steps = 4;
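The Key Fields table describes lr_params as the learning-rate configuration with feed name and schedule, but its members are not listed on this page. The assignments below use hypothetical field names to illustrate the idea; check LearningRateParameters in training_runner.h for the real ones.
// Hypothetical field names -- verify against LearningRateParameters in training_runner.h.
params.lr_params.initial_lr = 5e-5f;           // assumed: base learning rate
params.lr_params.warmup_ratio = 0.1f;          // assumed: fraction of steps spent in warmup
params.lr_params.feed_name = "Learning_Rate";  // assumed: graph input fed with the per-step LR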
Distributed Configuration with NCCL
params.data_parallel_size = 4;
params.horizontal_parallel_size = 1;
params.pipeline_parallel_size = 1;
params.use_nccl = true;
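A common consistency check for these three sizes is that their product equals the number of ranks the job is launched with; the computation below is illustrative and is not performed by the struct itself.
// Illustrative: the product of the three parallelism sizes should match the
// launch world size (e.g. mpirun -n 4 for the configuration above).
int expected_world_size = params.data_parallel_size *
                          params.horizontal_parallel_size *
                          params.pipeline_parallel_size;  // 4 * 1 * 1 = 4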
Mixed Precision with Checkpointing
params.use_mixed_precision = true;
params.loss_scale = 0.0f; // dynamic loss scaling
params.checkpoints_dir = ORT_TSTR("/checkpoints/");
params.checkpoint_period = 1000;
params.max_num_checkpoints = 5;
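Per the Key Fields table, a non-zero loss_scale keeps scaling static and use_bfloat16 switches the mixed-precision type from FP16 to BF16; an alternative sketch:
// Alternative: static loss scaling with a fixed, non-zero scale.
params.use_mixed_precision = true;
params.loss_scale = 1024.0f;    // non-zero => static scale; 0.0f enables dynamic scaling
// params.use_bfloat16 = true;  // would use BF16 instead of FP16 for mixed precision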
TensorBoard Logging
params.log_dir = ORT_TSTR("/logs/tensorboard/");
params.scalar_names = {"loss", "learning_rate"};
params.histogram_names = {"weights", "gradients"};
Key Details
- num_train_steps must be a multiple of gradient_accumulation_steps (enforced by an assertion in the constructor); see the validation sketch after this list.
- DeepSpeed ZeRO partitioning (deepspeed_zero.stage != 0) requires use_nccl = true.
- The weights_to_train and weights_not_to_train sets are mutually exclusive.
- EnableTensorboard() returns true only when log_dir is set, is_perf_test is false, and the current rank is 0.
- UseCuda() checks whether a CUDA execution provider has been registered in the providers map.
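A minimal pre-flight check that mirrors the constraints above (purely illustrative; the library enforces some of these itself, e.g. via the constructor assertion). deepspeed_zero, weights_to_train, and weights_not_to_train are among the additional fields elided from the struct listing earlier on this page.
#include <stdexcept>

// Illustrative validation mirroring the Key Details above (not an onnxruntime API).
void ValidateParams(const TrainingRunner::Parameters& params) {
  if (params.num_train_steps % static_cast<size_t>(params.gradient_accumulation_steps) != 0)
    throw std::invalid_argument("num_train_steps must be a multiple of gradient_accumulation_steps");
  if (params.deepspeed_zero.stage != 0 && !params.use_nccl)
    throw std::invalid_argument("DeepSpeed ZeRO partitioning requires use_nccl = true");
  if (!params.weights_to_train.empty() && !params.weights_not_to_train.empty())
    throw std::invalid_argument("weights_to_train and weights_not_to_train are mutually exclusive");
}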