Principle:Microsoft Onnxruntime Trained Model Export
| Field | Value |
|---|---|
| Principle Name | Trained_Model_Export |
| Overview | Extraction of trained parameters and export of an optimized ONNX inference model from the distributed training pipeline. |
| Category | API Doc |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
This principle covers extracting the trained parameters and exporting an optimized ONNX inference model at the end of the distributed training pipeline. Once training completes, the model must be converted into a form suitable for deployment and inference.
Description
The export uses the Module::ExportModelForInferencing API, which transforms the evaluation graph by embedding the trained weights and removing training-specific nodes, producing a clean ONNX inference model.
The export process involves:
- Loading the evaluation model: The eval model (as opposed to the training model) is loaded from either a file path or an in-memory buffer.
- Embedding trained parameters: The current trained weights from the CheckpointState are embedded as initializers in the inference model, replacing the references to external parameter storage.
- Graph pruning: Training-specific nodes (gradient computation, optimizer updates, loss computation) are removed from the graph.
- Output selection: Only the specified graph_output_names are retained as model outputs, so the caller chooses which inference outputs the exported model exposes.
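The steps above can be illustrated on a toy graph. This is a minimal sketch, not the ONNX Runtime implementation: `Node`, `Graph`, and `export_for_inference` are hypothetical names, the graph is a plain Python structure rather than an ONNX model, and pruning is assumed to mean keeping only the nodes that the requested outputs depend on.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class Graph:
    nodes: list[Node]  # assumed to be listed in topological order
    outputs: list[str]
    initializers: dict[str, list[float]] = field(default_factory=dict)


def export_for_inference(eval_graph, trained_params, graph_output_names):
    """Toy export: prune to the requested outputs and embed the trained weights."""
    producer = {out: node for node in eval_graph.nodes for out in node.outputs}
    unknown = [name for name in graph_output_names if name not in producer]
    if unknown:
        raise ValueError(f"unknown graph outputs: {unknown}")

    # Walk backwards from the requested outputs to find every node they need.
    needed_nodes, seen_values = set(), set()
    stack = list(graph_output_names)
    while stack:
        value = stack.pop()
        if value in seen_values:
            continue
        seen_values.add(value)
        node = producer.get(value)
        if node is None:  # a graph input or a parameter, not produced by a node
            continue
        needed_nodes.add(node.name)
        stack.extend(node.inputs)

    kept = [node for node in eval_graph.nodes if node.name in needed_nodes]
    used_inputs = {value for node in kept for value in node.inputs}
    # Embed only the parameters that the surviving nodes actually read.
    initializers = {
        name: list(values)
        for name, values in trained_params.items()
        if name in used_inputs
    }
    return Graph(kept, list(graph_output_names), initializers)


# Hypothetical eval graph: a linear layer followed by a loss node.
eval_graph = Graph(
    nodes=[
        Node("matmul", ["x", "fc.weight"], ["hidden"]),
        Node("add", ["hidden", "fc.bias"], ["logits"]),
        Node("loss", ["logits", "labels"], ["loss_value"]),
    ],
    outputs=["logits", "loss_value"],
)
trained_params = {"fc.weight": [0.5, -1.0], "fc.bias": [0.1]}
inference_graph = export_for_inference(eval_graph, trained_params, ["logits"])
```

Requesting only `logits` drops the `loss` node, because nothing on the path to `logits` depends on it, while both parameters survive as embedded initializers.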
Distributed Training Considerations
In the distributed training context, the export step has specific considerations:
- Rank 0 export: Only rank 0 (the primary process) typically performs the export after gathering the final model state. This avoids redundant I/O across all processes.
- State gathering: If using pipeline parallelism, parameters from all pipeline stages must be gathered to rank 0 before export.
- Consistent state: The export should occur after the final gradient synchronization to ensure the exported model reflects the fully trained state.
The underlying ExportModelForInferencing API is shared with the on-device training workflow, but in the distributed context the orchestration around it differs to account for multi-process coordination.
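The state-gathering and consistency requirements can be sketched in a single process. The function below is hypothetical and simulates what rank 0 would hold after a gather; a real run would use a collective operation across processes. It assumes each pipeline stage reports the training step it last synchronized at together with the parameters it owns.

```python
def gather_final_state(stage_reports):
    """Merge (step, params) reports from all pipeline stages into one dict.

    Raises ValueError if the stages disagree on the training step or if two
    stages claim the same parameter name.
    """
    if not stage_reports:
        raise ValueError("no pipeline stages reported any state")
    steps = {step for step, _ in stage_reports}
    if len(steps) != 1:
        raise ValueError(f"stages are at different steps: {sorted(steps)}")

    merged = {}
    for stage, (_, params) in enumerate(stage_reports):
        overlap = merged.keys() & params.keys()
        if overlap:
            raise ValueError(f"stage {stage} repeats parameters: {sorted(overlap)}")
        merged.update(params)
    return merged


# Two stages that both finished step 100: safe to export.
state = gather_final_state([
    (100, {"layer0.weight": [1.0]}),
    (100, {"layer1.weight": [2.0]}),
])

# One stage is a step behind: the merged state would be inconsistent.
try:
    gather_final_state([
        (100, {"layer0.weight": [1.0]}),
        (99, {"layer1.weight": [2.0]}),
    ])
    stale_detected = False
except ValueError:
    stale_detected = True
```

Rejecting a mixed-step state before export is one way to enforce the "consistent state" requirement; other orchestration schemes may rely on a barrier instead.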
Theoretical Basis
The separation between training and inference models is a fundamental concept in deep learning deployment:
- Training graphs contain backward pass operations, optimizer nodes, loss computation, gradient accumulation, and communication operators that are unnecessary for inference.
- Inference graphs need only the forward pass with trained weights embedded, producing a smaller, faster, and more portable model.
- Weight embedding: During training, weights are mutable and stored separately. For inference, weights are frozen and embedded in the model file for self-contained deployment.
The export transformation preserves the mathematical equivalence of the forward pass while eliminating training-specific overhead. The resulting file is typically smaller than the training model (on the order of 30-50%, depending on the model and on how much training-only state the graph carries) and faster to load.
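A small numeric sketch can make the mutable-versus-frozen distinction concrete. It uses a hypothetical one-parameter linear model in plain Python, not ONNX Runtime: the training step mutates the weights in place, while the exported callable carries a frozen snapshot and only runs the forward pass.

```python
def forward(params, x):
    """Forward pass shared by training and inference: y = w * x + b."""
    return params["w"] * x + params["b"]


def training_step(params, x, target, lr=0.1):
    """One SGD step on squared error; mutates params in place."""
    error = forward(params, x) - target
    params["w"] -= lr * 2 * error * x
    params["b"] -= lr * 2 * error
    return error ** 2


def freeze_for_inference(params):
    """Snapshot the weights and return a forward-only callable."""
    frozen = dict(params)  # copy, so later training cannot change it
    return lambda x: forward(frozen, x)


params = {"w": 0.0, "b": 0.0}
for _ in range(50):
    training_step(params, x=2.0, target=5.0)

predict = freeze_for_inference(params)
prediction_at_export = predict(2.0)
matches_training_forward = prediction_at_export == forward(params, 2.0)

# Training continues after export; the frozen model must not move.
training_step(params, x=2.0, target=-3.0)
unchanged_after_more_training = predict(2.0) == prediction_at_export
training_params_moved = forward(params, 2.0) != prediction_at_export
```

At export time the frozen callable and the training forward pass agree exactly, which is the equivalence the export is meant to preserve; afterwards only the training parameters change.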
Usage
Model export is the final step in the distributed training pipeline:
- Complete the distributed training loop via TrainingRunner::Run().
- On rank 0, load the final checkpoint state.
- Call Module::ExportModelForInferencing() with the desired output path and output names.
- The resulting ONNX file is ready for deployment with ONNX Runtime's inference API.
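The sequence above might be orchestrated roughly as follows. This is a sketch under stated assumptions: `run_export_step`, `load_checkpoint`, `make_module`, and `RecordingModule` are hypothetical, and the method name `export_model_for_inferencing` is assumed to mirror a Python binding of Module::ExportModelForInferencing. The module is injected so the flow can be exercised without ONNX Runtime installed.

```python
def run_export_step(rank, load_checkpoint, make_module, output_path, output_names):
    """Export on rank 0 only; every other rank returns None without doing I/O."""
    if rank != 0:
        return None
    state = load_checkpoint()      # final checkpoint state
    module = make_module(state)    # wraps the eval model plus trained state
    module.export_model_for_inferencing(output_path, list(output_names))
    return output_path


class RecordingModule:
    """Stand-in for a real module: records the export call instead of writing a file."""

    def __init__(self, state):
        self.state = state
        self.calls = []

    def export_model_for_inferencing(self, path, names):
        self.calls.append((path, names))


created = []


def make_module(state):
    module = RecordingModule(state)
    created.append(module)
    return module


# Simulate a four-process job; only rank 0 should build a module and export.
results = [
    run_export_step(
        rank,
        load_checkpoint=lambda: {"fc.weight": [0.5]},
        make_module=make_module,
        output_path="inference_model.onnx",
        output_names=["logits"],
    )
    for rank in range(4)
]
```

In a real pipeline the two injected callables would be replaced by the actual checkpoint loader and module constructor, and the exported file would then be opened with the inference API.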