Implementation:Microsoft Onnxruntime TrainingRunner Parameters
| Field | Value |
|---|---|
| Implementation Name | TrainingRunner_Parameters |
| Overview | Configuration struct encapsulating all training hyperparameters, model paths, optimizer selection, and distributed parallelism settings. |
| Type | API Doc |
| Language | C++ |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
Configuration struct encapsulating all training hyperparameters, model paths, optimizer selection, and distributed parallelism settings. The TrainingRunner::Parameters struct is the single configuration object that drives the entire distributed training pipeline.
API
struct TrainingRunner::Parameters {
  // Model and data paths
  std::string model_name;
  PathString model_path;
  PathString train_data_dir;
  PathString test_data_dir;
  PathString output_dir;

  // Training hyperparameters
  size_t batch_size;
  size_t num_train_steps;
  LearningRateParameters lr_params;
  int gradient_accumulation_steps = 1;

  // Optimizer
  std::string training_optimizer_name = "SGDOptimizer";

  // Mixed precision
  bool use_mixed_precision = false;
  bool use_bfloat16 = false;
  float loss_scale = 1.0f;

  // Distributed parallelism
  int data_parallel_size = 1;
  int horizontal_parallel_size = 1;
  int pipeline_parallel_size = 1;
  bool use_nccl = false;

  // Checkpointing
  PathString checkpoints_dir;
  size_t checkpoint_period = 0;
  size_t max_num_checkpoints = 1;

  // TensorBoard
  PathString log_dir;
  VectorString scalar_names;
  VectorString histogram_names;

  // ... additional fields
};
Source Code Reference
- Repository: microsoft/onnxruntime
- Primary Source: orttraining/orttraining/models/runner/training_runner.h:L23-206
Key Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model_path | PathString | (required) | Path to the ONNX model file |
| train_data_dir | PathString | (required) | Directory containing training data in .pb format |
| test_data_dir | PathString | | Directory containing evaluation data |
| batch_size | size_t | (required) | Number of samples per training batch |
| num_train_steps | size_t | (required) | Total number of training steps |
| lr_params | LearningRateParameters | | Learning rate configuration with feed name and schedule |
| training_optimizer_name | string | "SGDOptimizer" | Optimizer: "Adam", "Lamb", or "SGDOptimizer" |
| gradient_accumulation_steps | int | 1 | Micro-batches per weight update |
| use_mixed_precision | bool | false | Enable FP16/BF16 mixed precision training |
| use_bfloat16 | bool | false | Use BF16 instead of FP16 for mixed precision |
| loss_scale | float | 1.0f | Static loss scale (0.0 enables dynamic scaling) |
| data_parallel_size | int | 1 | Number of data-parallel replicas |
| horizontal_parallel_size | int | 1 | Tensor model parallelism size |
| pipeline_parallel_size | int | 1 | Pipeline parallelism stages (1 = disabled) |
| use_nccl | bool | false | Enable NCCL for GPU collective communication |
| checkpoints_dir | PathString | (empty) | Directory for checkpoint files (empty = no checkpointing) |
| checkpoint_period | size_t | 0 | Steps between checkpoints (0 = no saving) |
| max_num_checkpoints | size_t | 1 | Maximum retained checkpoint files |
| log_dir | PathString | (empty) | TensorBoard log directory (empty = no TensorBoard) |
| gpu_mem_limit_in_gb | float | -1.0f | GPU memory limit in GB (-1.0 = use all available) |
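Several of these fields interact. With gradient accumulation and data parallelism, the number of samples contributing to a single weight update is commonly reasoned about as batch_size × gradient_accumulation_steps × data_parallel_size; the helper below is a sketch of that arithmetic, not part of the onnxruntime API.
#include <cstddef>

// Illustrative only: effective samples per optimizer step under the common
// interpretation of these three fields (not an onnxruntime API).
size_t EffectiveGlobalBatchSize(size_t batch_size,
                                int gradient_accumulation_steps,
                                int data_parallel_size) {
  return batch_size * static_cast<size_t>(gradient_accumulation_steps) *
         static_cast<size_t>(data_parallel_size);
}

// Example: batch_size = 32, 4 accumulation steps, 4 replicas -> 512 samples per update.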
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | Configuration values | Various | Model paths, hyperparameters, parallelism settings, checkpoint config |
| Output | Parameters struct | TrainingRunner::Parameters | Fully configured struct passed to TrainingRunner constructor |
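As the contract notes, the completed struct is handed to the TrainingRunner constructor. The sketch below is hedged: the constructor overloads, Initialize/Run methods, and data-loader wiring are assumptions to be checked against training_runner.h for the revision in use.
// Hedged sketch -- verify the exact overloads in training_runner.h.
// TrainingRunner::Parameters params;          // populate as in the Usage Examples below
// TrainingRunner runner(params, env);         // 'env' is assumed to be an onnxruntime Environment
// ORT_RETURN_IF_ERROR(runner.Initialize());   // assumed: builds the training session and graph
// ORT_RETURN_IF_ERROR(runner.Run(train_loader, test_loader));  // assumed: caller-supplied data loaders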
Usage Examples
Basic Configuration
TrainingRunner::Parameters params;
params.model_name = "gpt2";
params.model_path = ORT_TSTR("model.onnx");
params.train_data_dir = ORT_TSTR("/data/train/");
params.test_data_dir = ORT_TSTR("/data/test/");
params.output_dir = ORT_TSTR("/output/");
params.batch_size = 32;
params.num_train_steps = 10000;
params.training_optimizer_name = "Adam";
params.gradient_accumulation_steps = 4;
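The Key Fields table describes lr_params as the learning-rate configuration with feed name and schedule, but its members are not listed on this page. The assignments below use hypothetical field names to illustrate the idea; check LearningRateParameters in training_runner.h for the real ones.
// Hypothetical field names -- verify against LearningRateParameters in training_runner.h.
params.lr_params.initial_lr = 5e-5f;           // assumed: base learning rate
params.lr_params.warmup_ratio = 0.1f;          // assumed: fraction of steps spent in warmup
params.lr_params.feed_name = "Learning_Rate";  // assumed: graph input fed with the per-step LR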
Distributed Configuration with NCCL
params.data_parallel_size = 4;
params.horizontal_parallel_size = 1;
params.pipeline_parallel_size = 1;
params.use_nccl = true;
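A common consistency check for these three sizes is that their product equals the number of ranks the job is launched with; the computation below is illustrative and is not performed by the struct itself.
// Illustrative: the product of the three parallelism sizes should match the
// launch world size (e.g. mpirun -n 4 for the configuration above).
int expected_world_size = params.data_parallel_size *
                          params.horizontal_parallel_size *
                          params.pipeline_parallel_size;  // 4 * 1 * 1 = 4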
Mixed Precision with Checkpointing
params.use_mixed_precision = true;
params.loss_scale = 0.0f; // dynamic loss scaling
params.checkpoints_dir = ORT_TSTR("/checkpoints/");
params.checkpoint_period = 1000;
params.max_num_checkpoints = 5;
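Per the Key Fields table, a non-zero loss_scale keeps scaling static and use_bfloat16 switches the mixed-precision type from FP16 to BF16; an alternative sketch:
// Alternative: static loss scaling with a fixed, non-zero scale.
params.use_mixed_precision = true;
params.loss_scale = 1024.0f;    // non-zero => static scale; 0.0f enables dynamic scaling
// params.use_bfloat16 = true;  // would use BF16 instead of FP16 for mixed precision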
TensorBoard Logging
params.log_dir = ORT_TSTR("/logs/tensorboard/");
params.scalar_names = {"loss", "learning_rate"};
params.histogram_names = {"weights", "gradients"};
Key Details
- num_train_steps must be a multiple of gradient_accumulation_steps (enforced by an assertion in the constructor); see the validation sketch after this list.
- DeepSpeed ZeRO partitioning (deepspeed_zero.stage != 0) requires use_nccl = true.
- The weights_to_train and weights_not_to_train sets are mutually exclusive.
- EnableTensorboard() returns true only when log_dir is set, is_perf_test is false, and the current rank is 0.
- UseCuda() checks whether a CUDA execution provider has been registered in the providers map.
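A minimal pre-flight check that mirrors the constraints above (purely illustrative; the library enforces some of these itself, e.g. via the constructor assertion). deepspeed_zero, weights_to_train, and weights_not_to_train are among the additional fields elided from the struct listing earlier on this page.
#include <stdexcept>

// Illustrative validation mirroring the Key Details above (not an onnxruntime API).
void ValidateParams(const TrainingRunner::Parameters& params) {
  if (params.num_train_steps % static_cast<size_t>(params.gradient_accumulation_steps) != 0)
    throw std::invalid_argument("num_train_steps must be a multiple of gradient_accumulation_steps");
  if (params.deepspeed_zero.stage != 0 && !params.use_nccl)
    throw std::invalid_argument("DeepSpeed ZeRO partitioning requires use_nccl = true");
  if (!params.weights_to_train.empty() && !params.weights_not_to_train.empty())
    throw std::invalid_argument("weights_to_train and weights_not_to_train are mutually exclusive");
}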