Implementation:Microsoft Onnxruntime TrainingRunner Run
| Field | Value |
|---|---|
| Implementation Name | TrainingRunner_Run |
| Overview | Execution of the distributed training loop with data iteration, forward/backward passes, gradient synchronization, and periodic evaluation. |
| Type | API Doc |
| Language | C++ |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
TrainingRunner::Run() is the main entry point for the distributed training loop. It orchestrates the full pipeline from start to completion: iterating over training data, running forward/backward passes, synchronizing gradients across ranks, and performing periodic evaluation and checkpointing.
API
common::Status Run(IDataLoader* training_data_loader,
                   IDataLoader* test_data_loader,
                   const MapStringToString& mapped_dimensions = {});
Source Code Reference
- Repository: microsoft/onnxruntime
- Declaration: orttraining/orttraining/models/runner/training_runner.h:L213-214
- Implementation: orttraining/orttraining/models/runner/training_runner.cc:L94+
Key Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| training_data_loader | IDataLoader* | Yes | Data loader providing training batches, partitioned by MPI rank |
| test_data_loader | IDataLoader* | No (nullable) | Data loader for periodic evaluation; null skips evaluation |
| mapped_dimensions | MapStringToString | No | Symbolic dimension mappings (e.g., batch size) for the session |
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | training_data_loader | IDataLoader* | Partitioned training data batches |
| Input | test_data_loader | IDataLoader* | Evaluation data batches (can be null) |
| Input | mapped_dimensions | MapStringToString | Symbolic dimension resolution map |
| Output | Status | common::Status | OK on success, error on failure |
| Side Effect | Model parameters | (internal) | Trained model weights updated in the session |
| Side Effect | Checkpoints | (filesystem) | Checkpoint files saved at configured intervals |
| Side Effect | TensorBoard logs | (filesystem) | Summary events written if TensorBoard is configured |
Usage Examples
Basic Training Execution
#include "orttraining/orttraining/models/runner/training_runner.h"
#include "orttraining/orttraining/models/runner/data_loader.h"
// After initialization...
auto training_data_loader = std::make_unique<DataLoader>(
    params.input_name_map, params.train_data_dir,
    /*max_num_files_preload=*/2, world_rank, world_size);
auto test_data_loader = std::make_unique<DataLoader>(
    params.input_name_map, params.test_data_dir,
    /*max_num_files_preload=*/2, world_rank, world_size);
// Run training
ORT_THROW_IF_ERROR(runner->Run(
training_data_loader.get(),
test_data_loader.get()));
Training with Dimension Mapping
MapStringToString mapped_dimensions;
mapped_dimensions["batch_size"] = std::to_string(params.batch_size);
ORT_THROW_IF_ERROR(runner->Run(
training_data_loader.get(),
test_data_loader.get(),
mapped_dimensions));
Training Without Evaluation
// Pass nullptr to skip evaluation
ORT_THROW_IF_ERROR(runner->Run(
training_data_loader.get(),
nullptr));
Training Loop Structure
The Run() method delegates to TrainingLoop(), which iterates for up to num_train_steps steps:
Per-Step Operations
- PrepareFeedNamesAndFeeds(): Constructs the feed dictionary from the current data batch, learning rate, loss scale, and other session inputs.
- PrepareFetchNamesAndFetches(): Sets up the output names to fetch based on the session mode.
- Execution (one of):
- RunWithUpdate(): Full forward/backward pass with parameter update (ModelUpdateStep mode).
- RunWithoutUpdate(): Forward/backward pass accumulating gradients (GradientAccumulateStep mode).
- Loss logging: Displays training loss at display_loss_steps intervals.
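The per-step flow above can be sketched as a standalone function. Note this is an illustrative simplification: `Feeds`, `StepParams`, and the two execution hooks are hypothetical stand-ins, not the runner's actual types (real feeds are OrtValue tensors keyed by input name).

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the feed dictionary built by PrepareFeedNamesAndFeeds().
using Feeds = std::map<std::string, std::vector<float>>;

struct StepParams {
  int gradient_accumulation_steps = 1;
  int display_loss_steps = 100;
};

// Hypothetical execution hooks mirroring RunWithUpdate / RunWithoutUpdate.
double RunWithUpdate(const Feeds&) { return 0.0; }     // fwd + bwd + optimizer
double RunWithoutUpdate(const Feeds&) { return 0.0; }  // fwd + bwd, accumulate

// One training step: build feeds, pick the execution path, note logging cadence.
double TrainStep(int step, const Feeds& batch, float lr, float loss_scale,
                 const StepParams& p, bool& log_loss) {
  Feeds feeds = batch;                  // start from the current data batch
  feeds["learning_rate"] = {lr};        // extra session inputs, as described above
  feeds["loss_scale"] = {loss_scale};
  bool update = ((step + 1) % p.gradient_accumulation_steps == 0);
  double loss = update ? RunWithUpdate(feeds) : RunWithoutUpdate(feeds);
  log_loss = (step % p.display_loss_steps == 0);  // display_loss_steps cadence
  return loss;
}
```

The key branch is the accumulation boundary: weights are updated only when the step count reaches a multiple of gradient_accumulation_steps; in between, gradients accumulate.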
Periodic Operations
- Evaluation: At evaluation_period intervals, runs Evaluate() on the test data loader.
- Checkpointing: At checkpoint_period intervals, calls SaveCheckpoint().
- TensorBoard: Scalar, histogram, and norm summaries are written, keyed by training step, when TensorBoard is configured.
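The periodic cadence can be expressed as two small predicates. The parameter names follow the evaluation_period and checkpoint_period intervals mentioned above, but the guard treating a non-positive period as "disabled" is an illustrative convention, not necessarily the runner's behavior.

```cpp
// Periodic-operation predicates keyed off the configured intervals.
// A non-positive period disables the operation (illustrative convention).
inline bool ShouldEvaluate(long step, long evaluation_period) {
  return evaluation_period > 0 && step % evaluation_period == 0;
}

inline bool ShouldSaveCheckpoint(long step, long checkpoint_period) {
  return checkpoint_period > 0 && step % checkpoint_period == 0;
}
```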
Session Modes
| Mode | Description | When Used |
|---|---|---|
| ModelUpdateStep | Forward + backward + weight update | Every gradient_accumulation_steps steps |
| GradientAccumulateStep | Forward + backward only (accumulate gradients) | Between weight update steps |
| EvaluateStep | Forward pass only | During periodic evaluation |
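The table above maps to a simple selection rule. The sketch below is hypothetical: it assumes evaluation is signalled out-of-band (by the periodic evaluation path) while the two training modes are chosen by the accumulation boundary.

```cpp
enum class SessionMode { ModelUpdateStep, GradientAccumulateStep, EvaluateStep };

// Pick the session mode for a step. Evaluation is treated as an out-of-band
// signal; otherwise the mode depends on the gradient-accumulation boundary.
inline SessionMode ChooseMode(long step, int gradient_accumulation_steps,
                              bool evaluating) {
  if (evaluating) return SessionMode::EvaluateStep;
  return ((step + 1) % gradient_accumulation_steps == 0)
             ? SessionMode::ModelUpdateStep
             : SessionMode::GradientAccumulateStep;
}
```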
Key Details
- Gradient synchronization (NCCL AllReduce) occurs automatically during the weight update step.
- When pipeline parallelism is enabled, the PipelineScheduler manages micro-batch scheduling across stages.
- The CheckWorkerException() method propagates exceptions from pipeline worker threads.
- Loss scaling (for mixed precision) is managed by the LossScaler which dynamically adjusts the scale.
- The training loop tracks step_, round_, weight_update_step_count_, and training_data_set_index_ for checkpoint/resume support.
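The dynamic loss-scale adjustment mentioned above can be sketched as follows. This is a simplified model in the spirit of the LossScaler, not its actual implementation: halve the scale when gradients overflow, double it after a window of stable steps. The window and floor values here are illustrative.

```cpp
#include <algorithm>

// Simplified dynamic loss scaler: back off on overflow, grow after stability.
class DynamicLossScaler {
 public:
  explicit DynamicLossScaler(float initial_scale = 65536.0f,
                             int up_scale_window = 2000)
      : scale_(initial_scale), up_scale_window_(up_scale_window) {}

  float Scale() const { return scale_; }

  // Call once per step with whether the backward pass produced finite gradients.
  void Update(bool all_finite) {
    if (!all_finite) {
      scale_ = std::max(scale_ / 2.0f, 1.0f);  // halve on inf/nan, floor at 1
      stable_steps_ = 0;
    } else if (++stable_steps_ >= up_scale_window_) {
      scale_ *= 2.0f;                          // double after a stable window
      stable_steps_ = 0;
    }
  }

 private:
  float scale_;
  int stable_steps_ = 0;
  int up_scale_window_;
};
```

The loop would multiply the loss by Scale() before backward, unscale gradients before the optimizer step, and skip the update on overflow steps.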
Related Pages
- Principle:Microsoft_Onnxruntime_Distributed_Training_Loop
- Implementation:Microsoft_Onnxruntime_TrainingRunner_Initialize
- Implementation:Microsoft_Onnxruntime_DataLoader_Init
- Implementation:Microsoft_Onnxruntime_Checkpoint_Save_Load
- Implementation:Microsoft_Onnxruntime_Summary_Ops
- Environment:Microsoft_Onnxruntime_Distributed_Training_Environment