Principle:Microsoft Onnxruntime Trained Model Export

Field	Value
Principle Name	Trained_Model_Export
Overview	Extraction of trained parameters and export of an optimized ONNX inference model from the distributed training pipeline.
Category	API Doc
Domains	Distributed_Training, Training_Infrastructure
Source Repository	microsoft/onnxruntime
Last Updated	2026-02-10

Overview

This principle covers extracting the trained parameters from the distributed training pipeline and exporting an optimized ONNX inference model in a format suitable for deployment.

Description

After distributed training completes, the trained model must be exported for deployment. The export is performed through the Module::ExportModelForInferencing API, which transforms the evaluation graph by embedding the trained weights and removing training-specific nodes, producing a clean ONNX inference model.

The export process involves:

Loading the evaluation model: The eval model (as opposed to the training model) is loaded from either a file path or an in-memory buffer.
Embedding trained parameters: The current trained weights from the CheckpointState are embedded as initializers in the inference model, replacing the references to external parameter storage.
Graph pruning: Training-specific nodes (gradient computation, optimizer updates, loss computation) that do not contribute to the requested outputs are removed from the graph.
Output selection: Only the specified graph_output_names are retained as model outputs, allowing selection of which inference outputs are needed.
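
The four steps above can be illustrated on a toy graph. The following is a minimal sketch in plain Python, not the ONNX Runtime implementation: the Node type, the export_for_inference helper, and the tensor names are all hypothetical. It assumes that trained parameters enter the eval graph as named inputs and that pruning amounts to keeping only the nodes reachable backwards from the requested outputs.

```python
from collections import namedtuple

# Toy node (hypothetical): an op name, the tensors it reads, the tensors it writes.
Node = namedtuple("Node", ["op", "inputs", "outputs"])


def export_for_inference(nodes, graph_inputs, trained_params, output_names):
    """Return (kept_nodes, remaining_inputs, initializers) for a toy graph."""
    # Map each tensor name to the index of the node that produces it.
    producer = {out: i for i, node in enumerate(nodes) for out in node.outputs}

    # Output selection + pruning: walk backwards from the requested outputs
    # and remember every node and tensor that they depend on.
    needed, seen, stack = set(), set(), list(output_names)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in producer:
            idx = producer[name]
            needed.add(idx)
            stack.extend(nodes[idx].inputs)
    kept = [node for i, node in enumerate(nodes) if i in needed]

    # Weight embedding: parameters stop being graph inputs and become
    # initializers (constants stored inside the model).
    initializers = {
        name: trained_params[name]
        for name in graph_inputs
        if name in trained_params and name in seen
    }
    remaining_inputs = [
        name for name in graph_inputs
        if name not in trained_params and name in seen
    ]
    return kept, remaining_inputs, initializers


# A small eval graph: a linear layer followed by a loss node.
eval_nodes = [
    Node("MatMul", ("x", "w"), ("h",)),
    Node("Add", ("h", "b"), ("logits",)),
    Node("SoftmaxCrossEntropyLoss", ("logits", "labels"), ("loss",)),
]
eval_inputs = ["x", "labels", "w", "b"]
trained = {"w": [[1.0, 2.0]], "b": [0.5]}  # illustrative values only

kept, inputs, inits = export_for_inference(
    eval_nodes, eval_inputs, trained, output_names=["logits"]
)
```

Requesting only logits drops the loss node and the labels input, while w and b move from the input list into the embedded initializers.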

Distributed Training Considerations

In the distributed training context, the export step has specific considerations:

Rank 0 export: Typically, only rank 0 (the primary process) performs the export, after it has gathered the final model state. This avoids redundant I/O across all processes.
State gathering: When pipeline parallelism is used, parameters from all pipeline stages must be gathered to rank 0 before export.
Consistent state: The export should occur after the final gradient synchronization to ensure the exported model reflects the fully trained state.

The underlying ExportModelForInferencing API is shared with the on-device training workflow, but in the distributed context the orchestration around it differs to account for multi-process coordination.
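
The orchestration described above can be sketched without any communication library. In this simplified illustration, the per-stage parameter shards are handed over as a plain list; in a real run they would arrive at rank 0 through a collective or point-to-point transfer that every rank takes part in. The helper names gather_parameters and maybe_export, and the parameter names, are hypothetical.

```python
def gather_parameters(stage_shards):
    """Merge per-stage parameter dicts into one full state, as rank 0 would."""
    full_state = {}
    for shard in stage_shards:
        for name, value in shard.items():
            if name in full_state:
                raise ValueError(f"parameter {name!r} appears in more than one stage")
            full_state[name] = value
    return full_state


def maybe_export(rank, stage_shards, expected_names, export_fn):
    """Run export_fn on the gathered state on rank 0; do nothing elsewhere."""
    if rank != 0:
        return None  # other ranks skip the export to avoid redundant I/O
    state = gather_parameters(stage_shards)
    missing = set(expected_names) - set(state)
    if missing:
        # Exporting with a partial state would silently produce a wrong model.
        raise RuntimeError(f"missing parameters before export: {sorted(missing)}")
    return export_fn(state)


# Two pipeline stages, each owning one layer (illustrative values only).
shards = [{"layer0.w": 1.0}, {"layer1.w": 2.0}]
expected = ["layer0.w", "layer1.w"]

# Stand-in for the real export call: just report which parameters it received.
on_rank0 = maybe_export(0, shards, expected, export_fn=lambda s: sorted(s))
on_rank1 = maybe_export(1, shards, expected, export_fn=lambda s: sorted(s))
```

Checking for missing or duplicated parameters before the export is a design choice of this sketch; it makes an incomplete gather fail loudly instead of producing a model with stale weights.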

Theoretical Basis

The separation between training and inference models is a fundamental concept in deep learning deployment:

Training graphs contain backward pass operations, optimizer nodes, loss computation, gradient accumulation, and communication operators that are unnecessary for inference.
Inference graphs need only the forward pass with trained weights embedded, producing a smaller, faster, and more portable model.
Weight embedding: During training, weights are mutable and stored separately. For inference, weights are frozen and embedded in the model file for self-contained deployment.

The export transformation preserves the mathematical equivalence of the forward pass while eliminating training-specific overhead. The resulting model file is typically 30-50% smaller than the training model and faster to load, although the exact reduction depends on how much of the training graph consists of backward-pass and optimizer nodes.

Usage

Model export is the final step in the distributed training pipeline:

Complete the distributed training loop via TrainingRunner::Run().
On rank 0, load the final checkpoint state.
Call Module::ExportModelForInferencing() with the desired output path and output names.
The resulting ONNX file is ready for deployment with ONNX Runtime's inference API.
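
As a rough illustration of steps 2 and 3, the sketch below uses the Python binding onnxruntime.training.api (CheckpointState.load_checkpoint and Module.export_model_for_inferencing), which wraps the same export call. It rests on several assumptions: the page describes the C++ API and does not say that the pipeline is driven from Python, the rank is assumed to be supplied by the launcher, and the export_inference_model helper and all file paths are hypothetical.

```python
def export_inference_model(rank, checkpoint_path, train_model_path,
                           eval_model_path, output_path, output_names):
    """Export the inference model on rank 0; return True if an export ran."""
    if rank != 0:
        return False  # only the primary process writes the model file

    # Imported lazily so that non-exporting ranks do not need the package.
    # Requires the onnxruntime-training package.
    from onnxruntime.training.api import CheckpointState, Module

    # Step 2: load the final checkpoint state.
    state = CheckpointState.load_checkpoint(checkpoint_path)

    # The eval model must be supplied, because the export starts from it.
    module = Module(train_model_path, state, eval_model_uri=eval_model_path)

    # Step 3: write the inference model, keeping only the named outputs.
    module.export_model_for_inferencing(output_path, output_names)
    return True
```

A call on the primary process might look like export_inference_model(0, "checkpoint", "training_model.onnx", "eval_model.onnx", "inference.onnx", ["output"]), where every path and the output name are placeholders.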
