Implementation:Microsoft Onnxruntime TrainingRunner Initialize

From Leeroopedia


Field Value
Implementation Name TrainingRunner_Initialize
Overview Construction and initialization of the distributed training runtime including model loading, session configuration, and checkpoint restoration.
Type API Doc
Language C++
Domains Distributed_Training, Training_Infrastructure
Source Repository microsoft/onnxruntime
Last Updated 2026-02-10

Overview

Construction and initialization of the distributed training runtime including model loading, session configuration, execution provider registration, and checkpoint restoration. The TrainingRunner constructor validates parameters, and Initialize() performs the complete multi-step setup sequence.

API

TrainingRunner(Parameters params, const Environment& env);
TrainingRunner(Parameters params, const Environment& env, SessionOptions session_options);
common::Status Initialize();

Source Code Reference

I/O Contract

Direction Name Type Description
Input params TrainingRunner::Parameters Complete training configuration struct
Input env const Environment& ONNX Runtime environment with logging and thread management
Input session_options SessionOptions (optional) Additional session configuration (profiling, etc.)
Output TrainingRunner initialized object Fully configured TrainingRunner ready for Run()
Output Status common::Status OK on success, error status on failure

Usage Examples

Basic Initialization

#include "orttraining/orttraining/models/runner/training_runner.h"

// Set up environment
std::unique_ptr<Environment> env;
ORT_THROW_IF_ERROR(Environment::Create(nullptr, env));

// Configure parameters
TrainingRunner::Parameters params;
params.model_path = ORT_TSTR("model.onnx");
params.training_optimizer_name = "Adam";
params.batch_size = 32;
params.num_train_steps = 10000;
params.gradient_accumulation_steps = 1;

// Create and initialize
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());

Initialization with Custom Session Options

SessionOptions session_options;
session_options.enable_profiling = true;
session_options.profile_file_prefix = ORT_TSTR("training_profile");

auto runner = std::make_unique<TrainingRunner>(params, *env, session_options);
ORT_THROW_IF_ERROR(runner->Initialize());

Full GPT-2 Main Entry Point

int main(int argc, char* argv[]) {
    GPT2Parameters params;
    OrtParameters ort_params{logging::Severity::kWARNING, -1};
    RETURN_IF_FAIL(ParseArguments(argc, argv, params, ort_params));

    // Setup logger
    std::string default_logger_id{"Default"};
    logging::LoggingManager default_logging_manager{
        std::make_unique<logging::CLogSink>(),
        ort_params.log_severity, false,
        logging::LoggingManager::InstanceType::Default,
        &default_logger_id, ort_params.vlog_level};

    setup_training_params(params);

    // Setup environment
    std::unique_ptr<Environment> env;
    RETURN_IF_FAIL(Environment::Create(nullptr, env));

    // Start training
    RETURN_IF_FAIL(RunTraining(params, *env));

#if defined(USE_MPI)
#ifdef _WIN32
    MPIContext::shutdown_mpi();
#endif
#endif
    return 0;
}
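
The RETURN_IF_FAIL calls above implement a fail-fast pattern: each step's status is checked and any failure is propagated out of main() immediately. As a minimal standalone sketch of that pattern (the macro and status type here are illustrative stand-ins, not the actual onnxruntime helpers):

```cpp
// Stand-in status type: code 0 means success.
struct SimpleStatus { int code = 0; };

// Evaluate an expression once; propagate any failure to the caller.
// Illustrative only -- not the real RETURN_IF_FAIL macro.
#define RETURN_IF_FAIL_SKETCH(expr)          \
    do {                                     \
        SimpleStatus _st = (expr);           \
        if (_st.code != 0) return _st.code;  \
    } while (0)

SimpleStatus StepOk()   { return {0}; }
SimpleStatus StepFail() { return {42}; }

// Mirrors the shape of main(): run steps in order, stop at the first failure.
int RunAll(bool inject_failure) {
    RETURN_IF_FAIL_SKETCH(StepOk());
    if (inject_failure) RETURN_IF_FAIL_SKETCH(StepFail());
    RETURN_IF_FAIL_SKETCH(StepOk());
    return 0;
}
```

The do/while(0) wrapper makes the macro behave like a single statement, so it composes safely with if/else blocks.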

Initialization Sequence

The Initialize() method performs the following steps in order:

  1. Model loading: Loads the ONNX model (or pre-partitioned pipeline stage based on MPI rank).
  2. Training configuration: Sets up TrainingSession::TrainingConfiguration with:
    • Distributed config (world rank, world size, parallelism dimensions)
    • Mixed precision settings (FP16/BF16, loss scaling)
    • Loss function configuration (for the last pipeline stage)
    • Optimizer configuration (Adam/Lamb/SGD, learning rate, NCCL, ZeRO)
    • TensorBoard configuration (if enabled on rank 0)
    • GIST compression configuration (if enabled)
    • Pipeline configuration (if pipeline_parallel_size > 1)
    • Graph transformer configuration (GELU approximation, recompute settings)
  3. Session configuration: Calls session_.ConfigureForTraining().
  4. Loss scaler setup: Creates dynamic or static loss scaler for mixed precision.
  5. Pipeline setup: Initializes PipelineScheduler and PipelineWorkerPool if using pipeline parallelism.
  6. Graph output override: Exposes optimizer and pipeline outputs as graph outputs.
  7. Execution provider registration: Registers CUDA and other execution providers.
  8. Profiler start: Enables profiling if configured.
  9. Session initialization: Calls session_.Initialize() to finalize the graph.
  10. Checkpoint loading: Loads the latest or specified checkpoint if checkpoints_dir is set.
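
The sequence above is strictly ordered and fail-fast: each step returns a status, and the first failure aborts initialization. A simplified standalone sketch of that control flow (the step names and Status stand-in below are illustrative, not the actual onnxruntime implementation):

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for common::Status: an empty string means OK.
using Status = std::string;

// Run each step in order, recording successes; return early on the
// first failure, mirroring how Initialize() propagates step errors.
Status RunInitSequence(std::vector<std::string>& trace) {
    const std::vector<std::pair<std::string, std::function<Status()>>> steps = {
        {"load_model",                   [] { return Status{}; }},
        {"build_training_config",        [] { return Status{}; }},
        {"configure_session",            [] { return Status{}; }},
        {"setup_loss_scaler",            [] { return Status{}; }},
        {"setup_pipeline",               [] { return Status{}; }},
        {"override_graph_outputs",       [] { return Status{}; }},
        {"register_execution_providers", [] { return Status{}; }},
        {"start_profiler",               [] { return Status{}; }},
        {"initialize_session",           [] { return Status{}; }},
        {"load_checkpoint",              [] { return Status{}; }},
    };
    for (const auto& [name, step] : steps) {
        Status s = step();
        if (!s.empty()) return s;  // fail fast: abort on the first error
        trace.push_back(name);
    }
    return Status{};
}
```

Because later steps depend on earlier ones (e.g. checkpoint loading requires a finalized session), reordering or skipping steps is not supported.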

Key Details

  • The constructor enforces that model_path and training_optimizer_name are non-empty, and that NCCL is enabled when DeepSpeed ZeRO is used.
  • num_train_steps must be a multiple of gradient_accumulation_steps.
  • Pipeline partition can be done externally (via pipeline_stage_paths) or internally by ORT.
  • Loss function configuration is only applied to the last pipeline stage.
  • Only rank 0 logs TensorBoard events.
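
The constructor-time checks in the first two bullets can be sketched as a standalone validation function. This is a simplified illustration: the field names mirror the Parameters fields used in the examples above, but the ZeRO/NCCL flags, check order, and error strings are assumptions, not the actual onnxruntime code:

```cpp
#include <string>

// Minimal stand-ins for the fields the constructor validates (illustrative).
struct RunnerParams {
    std::string model_path;
    std::string training_optimizer_name;
    size_t num_train_steps = 0;
    size_t gradient_accumulation_steps = 1;
    bool use_deepspeed_zero = false;  // assumed flag name
    bool use_nccl = false;            // assumed flag name
};

// Returns an empty string on success, an error message otherwise,
// mirroring the fail-fast parameter checks described above.
std::string ValidateParams(const RunnerParams& p) {
    if (p.model_path.empty())
        return "model_path must not be empty";
    if (p.training_optimizer_name.empty())
        return "training_optimizer_name must not be empty";
    if (p.use_deepspeed_zero && !p.use_nccl)
        return "DeepSpeed ZeRO requires NCCL";
    if (p.gradient_accumulation_steps == 0 ||
        p.num_train_steps % p.gradient_accumulation_steps != 0)
        return "num_train_steps must be a multiple of gradient_accumulation_steps";
    return "";
}
```

Failing these checks at construction time surfaces configuration mistakes before any model loading or session setup work is done.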
