
Principle:Microsoft Onnxruntime Training Component Assembly

From Leeroopedia


Overview

Assembly of the core training components (Module, Optimizer, LR Scheduler) from artifacts and checkpoint state.

Metadata

Field Value
Principle Name Training_Component_Assembly
Category API Doc
Domain On_Device_Training, Training_Infrastructure
Repository microsoft/onnxruntime
Source Reference orttraining/orttraining/training_api/module.cc:L280-285 (Module), orttraining/orttraining/training_api/optimizer.cc:L183-188 (Optimizer), orttraining/orttraining/training_api/lr_scheduler.h:L74-81 (LinearLRScheduler)
Last Updated 2026-02-10

Description

Training component assembly creates the Module (forward/backward execution), Optimizer (parameter updates), and optional LR Scheduler from the training artifacts and loaded checkpoint state. These components are wired together to form the complete training pipeline.

The three core components are:

  • Module -- Manages the training and evaluation ONNX sessions. It loads the training model graph and optionally the evaluation model graph, and holds a reference to the checkpoint state for parameter access. During initialization, if parameter devices do not match the target device (extracted from node placement), tensors are re-created on the correct device.
  • Optimizer -- Loads the optimizer ONNX model and initializes or restores optimizer states (momentum buffers). It references the same checkpoint state as the Module, ensuring parameter updates are correctly applied. The optimizer automatically detects the algorithm type (AdamW or SGD) from the loaded graph.
  • LinearLRScheduler -- Computes a linearly decaying learning rate with optional warmup. It wraps the Optimizer and adjusts its learning rate at each step based on the current step count relative to the warmup and total step counts.

The Module does not own the parameters; it holds a weak reference to the CheckpointState. Similarly, the Optimizer does not own parameters but constructs tensor sequence inputs referencing them. This shared-state architecture ensures that parameter updates made by the Optimizer are immediately visible to the Module.
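The shared-state design described above can be sketched in plain Python. This is a conceptual illustration only, with made-up class and attribute names; it is not the actual C++ implementation or the Python API, and real parameters are ONNX tensors rather than scalars:

```python
# Conceptual sketch of the shared-state pattern: Module and Optimizer both
# hold a reference to one CheckpointState instead of copying parameters.

class CheckpointState:
    """Owns the parameter values (plain floats here, for illustration)."""
    def __init__(self, params):
        self.params = params  # e.g. {"weight": 0.5}

class Module:
    """Reads parameters for the forward pass; does not own them."""
    def __init__(self, state):
        self.state = state  # non-owning reference to the shared state

    def forward(self, x):
        return self.state.params["weight"] * x

class Optimizer:
    """Writes parameter updates through the same shared state."""
    def __init__(self, state):
        self.state = state

    def step(self, grad, lr=0.1):
        self.state.params["weight"] -= lr * grad

state = CheckpointState({"weight": 0.5})
module, optimizer = Module(state), Optimizer(state)

optimizer.step(grad=1.0)    # Optimizer updates the shared weight...
print(module.forward(2.0))  # ...and the Module immediately sees it (~0.8)
```

Because both components dereference the same state object, no copy or synchronization step is needed between an optimizer step and the next forward pass.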

Theoretical Basis

The training pipeline follows the standard deep learning pattern: Module computes forward pass and gradients, Optimizer updates parameters using gradients, and LR Scheduler adjusts learning rate over time.

  • Module as Computation Engine -- The Module encapsulates both the forward graph (for computing loss) and the backward graph (for computing gradients via automatic differentiation). The training session runs both forward and backward in a single TrainStep call.
  • Optimizer as Parameter Updater -- The optimizer implements a parameter update rule. For AdamW, this involves maintaining exponentially decaying averages of past gradients (first moment) and past squared gradients (second moment), then using these to compute adaptive learning rates per parameter.
  • Learning Rate Scheduling -- The LinearLRScheduler computes a multiplicative factor based on the current step: during warmup, the factor linearly increases from 0 to 1; after warmup, it linearly decreases from 1 to 0 over the remaining steps. The actual learning rate is factor * initial_lr.
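The warmup/decay rule above can be expressed as a small function. This is a plain-Python sketch of the stated formula, not the actual C++ LinearLRScheduler in lr_scheduler.h:

```python
# Sketch of the linear warmup/decay factor: the factor ramps 0 -> 1 over the
# warmup steps, then decays 1 -> 0 over the remaining steps.

def linear_lr_factor(step, warmup_steps, total_steps):
    """Multiplicative factor applied to the initial learning rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    # Linear decay over the post-warmup span, clamped at 0.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

initial_lr = 1e-3
# Halfway through a 100-step warmup -> factor 0.5
print(linear_lr_factor(50, 100, 1000) * initial_lr)   # 0.0005
# Halfway through the post-warmup span -> factor 0.5
print(linear_lr_factor(550, 100, 1000) * initial_lr)  # 0.0005
```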

Usage

Assembly follows checkpoint loading and precedes the training loop:

from onnxruntime.training.api import (
    CheckpointState,
    LinearLRScheduler,
    Module,
    Optimizer,
)

# Load checkpoint state (the parameters are owned here, not by the Module)
state = CheckpointState.load_checkpoint("output_artifacts/checkpoint")

# Create the module with training and eval models
module = Module(
    "output_artifacts/training_model.onnx",
    state,
    "output_artifacts/eval_model.onnx",
    device="cpu",
)

# Create the optimizer; it shares the same checkpoint state via the module
optimizer = Optimizer("output_artifacts/optimizer_model.onnx", module)

# Optionally wrap the optimizer in a linear LR scheduler (step counts shown
# are illustrative; check the parameter names against your installed version)
scheduler = LinearLRScheduler(
    optimizer, warmup_step_count=100, total_step_count=1000, initial_lr=1e-3
)

Implemented By

Implementation:Microsoft_Onnxruntime_Module_Optimizer_Scheduler_Init
