Implementation: Microsoft Onnxruntime TrainingRunner Initialize
| Field | Value |
|---|---|
| Implementation Name | TrainingRunner_Initialize |
| Overview | Construction and initialization of the distributed training runtime including model loading, session configuration, and checkpoint restoration. |
| Type | API Doc |
| Language | C++ |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
Construction and initialization of the distributed training runtime including model loading, session configuration, execution provider registration, and checkpoint restoration. The TrainingRunner constructor validates parameters, and Initialize() performs the complete multi-step setup sequence.
API
TrainingRunner(Parameters params, const Environment& env);
TrainingRunner(Parameters params, const Environment& env, SessionOptions session_options);
common::Status Initialize();
Source Code Reference
- Repository: microsoft/onnxruntime
- Constructor: orttraining/orttraining/models/runner/training_runner.cc:L74-92
- Initialize(): orttraining/orttraining/models/runner/training_runner.cc:L94-299
- Main entry point (GPT-2): orttraining/orttraining/models/gpt2/main.cc:L472-495
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | params | TrainingRunner::Parameters | Complete training configuration struct |
| Input | env | const Environment& | ONNX Runtime environment with logging and thread management |
| Input | session_options | SessionOptions (optional) | Additional session configuration (profiling, etc.) |
| Output | runner | TrainingRunner | Fully configured TrainingRunner instance, ready for Run() |
| Output | Status | common::Status | OK on success, error status on failure |
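For callers that prefer to inspect the returned Status rather than throw, the sketch below shows one way to consume Initialize()'s return value; the error-handling style is an illustrative assumption, not a convention taken from the repository.
#include <iostream>
// Construct the runner, then check the Status returned by Initialize().
TrainingRunner runner(params, *env);
common::Status status = runner.Initialize();
if (!status.IsOK()) {
  std::cerr << "Initialize failed: " << status.ErrorMessage() << std::endl;
  // handle or propagate the error
}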
Usage Examples
Basic Initialization
#include "orttraining/orttraining/models/runner/training_runner.h"
// Set up environment
std::unique_ptr<Environment> env;
ORT_THROW_IF_ERROR(Environment::Create(nullptr, env));
// Configure parameters
TrainingRunner::Parameters params;
params.model_path = ORT_TSTR("model.onnx");
params.training_optimizer_name = "Adam";
params.batch_size = 32;
params.num_train_steps = 10000;
params.gradient_accumulation_steps = 1;
// Create and initialize
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());
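Once Initialize() returns OK, the runner is fully configured and ready for the Run() entry point noted in the I/O contract above.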
Initialization with Custom Session Options
SessionOptions session_options;
session_options.enable_profiling = true;
session_options.profile_file_prefix = ORT_TSTR("training_profile");
auto runner = std::make_unique<TrainingRunner>(params, *env, session_options);
ORT_THROW_IF_ERROR(runner->Initialize());
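With enable_profiling set, ONNX Runtime writes the profile as a JSON trace whose file name begins with the configured profile_file_prefix, viewable in a chrome://tracing-compatible viewer.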
Full GPT-2 Main Entry Point
int main(int argc, char* argv[]) {
  GPT2Parameters params;
  OrtParameters ort_params{logging::Severity::kWARNING, -1};
  RETURN_IF_FAIL(ParseArguments(argc, argv, params, ort_params));

  // Setup logger
  std::string default_logger_id{"Default"};
  logging::LoggingManager default_logging_manager{
      std::make_unique<logging::CLogSink>(),
      ort_params.log_severity, false,
      logging::LoggingManager::InstanceType::Default,
      &default_logger_id, ort_params.vlog_level};

  setup_training_params(params);

  // Setup environment
  std::unique_ptr<Environment> env;
  RETURN_IF_FAIL(Environment::Create(nullptr, env));

  // Start training
  RETURN_IF_FAIL(RunTraining(params, *env));

#if defined(USE_MPI)
#ifdef _WIN32
  MPIContext::shutdown_mpi();
#endif
#endif
  return 0;
}
Initialization Sequence
The Initialize() method performs the following steps in order:
- Model loading: Loads the ONNX model (or pre-partitioned pipeline stage based on MPI rank).
- Training configuration: Sets up TrainingSession::TrainingConfiguration with:
  - Distributed config (world rank, world size, parallelism dimensions)
  - Mixed precision settings (FP16/BF16, loss scaling)
  - Loss function configuration (for the last pipeline stage)
  - Optimizer configuration (Adam/Lamb/SGD, learning rate, NCCL, ZeRO)
  - TensorBoard configuration (if enabled on rank 0)
  - GIST compression configuration (if enabled)
  - Pipeline configuration (if pipeline_parallel_size > 1; see the sketch after this list)
  - Graph transformer configuration (GELU approximation, recompute settings)
- Session configuration: Calls session_.ConfigureForTraining().
- Loss scaler setup: Creates dynamic or static loss scaler for mixed precision.
- Pipeline setup: Initializes PipelineScheduler and PipelineWorkerPool if using pipeline parallelism.
- Graph output override: Exposes optimizer and pipeline outputs as graph outputs.
- Execution provider registration: Registers CUDA and other execution providers.
- Profiler start: Enables profiling if configured.
- Session initialization: Calls session_.Initialize() to finalize the graph.
- Checkpoint loading: Loads the latest or specified checkpoint if checkpoints_dir is set.
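As a rough sketch of the pipeline- and checkpoint-related steps above, the snippet below shows how a caller might request internal partitioning versus supplying pre-partitioned stages. It only uses parameter names already mentioned on this page; the exact field types and usage are assumptions, not confirmed signatures.
TrainingRunner::Parameters params;
params.model_path = ORT_TSTR("model.onnx");
params.training_optimizer_name = "Adam";
// Option A: let ORT partition the model internally across 4 pipeline stages.
params.pipeline_parallel_size = 4;
// Option B (assumed field usage): supply externally partitioned stages instead;
// each rank then loads the stage matching its MPI rank.
// params.pipeline_stage_paths = {ORT_TSTR("stage_0.onnx"), ORT_TSTR("stage_1.onnx")};
// If set, Initialize() restores the latest checkpoint found in this directory.
params.checkpoints_dir = ORT_TSTR("checkpoints");
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());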
Key Details
- The constructor enforces that model_path and training_optimizer_name are non-empty, and that NCCL is enabled when DeepSpeed ZeRO is used.
- num_train_steps must be a multiple of gradient_accumulation_steps (see the sketch after this list).
- Pipeline partition can be done externally (via pipeline_stage_paths) or internally by ORT.
- Loss function configuration is only applied to the last pipeline stage.
- Only rank 0 logs TensorBoard events.
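A small arithmetic sketch of the accumulation constraint above; the explicit pre-flight check is illustrative only, since the runner performs its own validation.
params.batch_size = 32;
params.gradient_accumulation_steps = 4;  // effective batch size = 32 * 4 = 128
params.num_train_steps = 10000;          // 10000 % 4 == 0, so the constraint holds
// Illustrative pre-flight check mirroring the runner's validation:
if (params.num_train_steps % params.gradient_accumulation_steps != 0) {
  std::cerr << "num_train_steps must be a multiple of gradient_accumulation_steps\n";
  return -1;
}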