Principle:Microsoft Onnxruntime Training Artifact Generation
Overview
Generation of required training artifacts (training model, eval model, optimizer model, checkpoint) from a forward-only ONNX model.
Metadata
| Field | Value |
|---|---|
| Principle Name | Training_Artifact_Generation |
| Category | API Doc |
| Domain | On_Device_Training, Model_Optimization |
| Repository | microsoft/onnxruntime |
| Source Reference | docs/python/on_device_training/training_artifacts.rst:L23-40 |
| Last Updated | 2026-02-10 |
Description
The artifact generation step takes a forward-only ONNX model and produces all files needed for on-device training: a training model with backward graph, an evaluation model, an optimizer model, and an initial checkpoint. This automates the complex process of adding loss computation, gradient calculation, and optimizer logic to the forward graph.
The four generated artifacts serve distinct roles:
- training_model.onnx -- Contains the forward graph augmented with a loss function and the corresponding backward graph for gradient computation. This is the model used during the TrainStep call.
- eval_model.onnx -- Contains the forward graph augmented with the loss function but without the backward graph. Used during EvalStep to compute metrics without modifying parameters.
- optimizer_model.onnx -- Encodes the optimizer's parameter update rules (e.g., AdamW momentum updates). Executed during OptimizerStep to update parameters using computed gradients.
- checkpoint -- A flatbuffers-encoded file containing initial parameter values and optimizer state. Serves as the starting point for training.
The requires_grad and frozen_params lists control which parameters participate in gradient computation. Parameters listed in requires_grad will have gradients computed and will be updated by the optimizer. Parameters listed in frozen_params are included in the model but their values remain fixed during training.
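Because requires_grad and frozen_params are plain lists of parameter names, a common pattern is to split the names by prefix, for example to fine-tune only a classifier head. The parameter names below are hypothetical placeholders; in practice they come from the exported model's graph initializers:

```python
# Hypothetical parameter names for illustration; real names come from the
# ONNX model's graph initializers (model.graph.initializer).
all_params = [
    "backbone.conv1.weight",
    "backbone.conv1.bias",
    "classifier.weight",
    "classifier.bias",
]

# Fine-tune only the classifier head: its parameters receive gradients,
# while the backbone stays frozen.
requires_grad = [name for name in all_params if name.startswith("classifier.")]
frozen_params = [name for name in all_params if not name.startswith("classifier.")]
```

The two lists should together cover the trainable initializers of the model, with no parameter appearing in both.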
Theoretical Basis
Automatic differentiation is applied to the forward ONNX graph to create a backward graph. The loss function is appended to the forward graph, and an optimizer model encodes the parameter update rules.
The artifact generation process applies the following transformations:
- Loss Function Insertion -- The specified loss function (e.g., CrossEntropyLoss, MSELoss) is appended to the forward graph's output. This converts the model's raw predictions into a scalar loss value.
- Automatic Differentiation -- The backward graph is derived by applying the chain rule to each operation in the forward graph, in reverse topological order. Each ONNX operator has a corresponding gradient implementation.
- Optimizer Graph Construction -- A separate ONNX graph is created that encodes the optimizer's update rule (e.g., for AdamW: maintaining first and second moment estimates, applying weight decay, and computing parameter updates).
- Checkpoint Initialization -- All parameters are serialized along with zero-initialized optimizer states (e.g., zero-initialized momentum buffers for AdamW).
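As a concrete illustration of what the optimizer graph encodes, the following is a plain-Python sketch of one AdamW update for a single scalar parameter. The hyperparameter defaults are illustrative only; the generated optimizer_model.onnx expresses the same arithmetic as ONNX operators over whole tensors:

```python
import math

def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative sketch).

    m and v are the first and second moment estimates that the checkpoint
    stores (zero-initialized before the first step); step is 1-based.
    """
    # Update biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    # Decoupled weight decay, then the Adam-style update.
    param = param - lr * weight_decay * param
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

The zero-initialized m and v returned here correspond to the optimizer states serialized into the initial checkpoint.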
Usage
The standard usage requires a forward-only ONNX model (typically exported from PyTorch) and specification of which parameters need gradients:
```python
from onnxruntime.training import artifacts
import onnx

# Load the forward-only ONNX model
model = onnx.load("model.onnx")

# Generate all training artifacts
artifacts.generate_artifacts(
    model,
    requires_grad=["parameters", "needing", "gradients"],
    frozen_params=["parameters", "not", "needing", "gradients"],
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="output_artifacts/",
)
```
Custom loss functions can be created by extending onnxruntime.training.onnxblock.Block and passing the instance as the loss parameter instead of a LossType enum value.
Implemented By
Implementation:Microsoft_Onnxruntime_Generate_Artifacts
Related Pages
- PyTorch Model Export -- The preceding step that produces the forward-only ONNX model
- Checkpoint Loading -- Loading the generated checkpoint for training
- Training Component Assembly -- Assembling the generated artifacts into training components