Principle:Microsoft Onnxruntime Trained Model Export

From Leeroopedia


Field               Value
Principle Name      Trained_Model_Export
Overview            Extraction of trained parameters and export of an optimized ONNX inference model from the distributed training pipeline.
Category            API Doc
Domains             Distributed_Training, Training_Infrastructure
Source Repository   microsoft/onnxruntime
Last Updated        2026-02-10

Overview

Extraction of trained parameters and export of an optimized ONNX inference model from the distributed training pipeline. After distributed training completes, the trained model must be exported in a format suitable for deployment and inference.

Description

Export goes through the Module::ExportModelForInferencing API, which transforms the evaluation graph by embedding the trained weights and removing training-specific nodes, producing a clean ONNX inference model.

The export process involves:

  1. Loading the evaluation model: The eval model (as opposed to the training model) is loaded from either a file path or an in-memory buffer.
  2. Embedding trained parameters: The current trained weights from the CheckpointState are embedded as initializers in the inference model, replacing the references to external parameter storage.
  3. Graph pruning: Training-specific nodes (gradient computation, optimizer updates, loss computation) are removed from the graph.
  4. Output selection: Only the outputs named in graph_output_names are retained as model outputs, so the caller chooses which inference outputs the exported model exposes.
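
The four steps above can be illustrated on a toy graph. This is a minimal sketch in plain Python, not ONNX Runtime's implementation: the Node and Graph classes, the export_for_inference function, and the tensor names are all hypothetical, and the "eval model" is built in code rather than loaded from a file or buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list
    outputs: list


@dataclass
class Graph:
    nodes: list      # Node objects in topological order
    inputs: list     # tensor names fed by the caller
    outputs: list    # tensor names exposed to the caller
    initializers: dict = field(default_factory=dict)  # embedded constants


def export_for_inference(eval_graph, trained_params, graph_output_names):
    """Return a new Graph that keeps only what the requested outputs need."""
    # Step 2: parameters that were graph inputs become embedded initializers.
    initializers = {n: trained_params[n] for n in eval_graph.inputs if n in trained_params}
    inputs = [n for n in eval_graph.inputs if n not in trained_params]

    # Steps 3 and 4: walk backwards from the requested outputs and keep only
    # the nodes that contribute to them.
    needed = set(graph_output_names)
    kept = []
    for node in reversed(eval_graph.nodes):
        if needed.intersection(node.outputs):
            kept.append(node)
            needed.update(node.inputs)
    kept.reverse()

    return Graph(
        nodes=kept,
        inputs=[n for n in inputs if n in needed],
        outputs=list(graph_output_names),
        initializers={n: v for n, v in initializers.items() if n in needed},
    )


# Step 1 stand-in: a toy eval graph with a forward pass plus a loss node.
eval_graph = Graph(
    nodes=[
        Node("matmul", ["x", "w"], ["h"]),
        Node("add", ["h", "b"], ["logits"]),
        Node("loss", ["logits", "labels"], ["loss_value"]),
    ],
    inputs=["x", "labels", "w", "b"],
    outputs=["loss_value"],
)
trained_params = {"w": [[0.5, -1.0]], "b": [0.1, 0.2]}

inference_graph = export_for_inference(eval_graph, trained_params, ["logits"])
print([n.name for n in inference_graph.nodes])   # ['matmul', 'add']
print(inference_graph.inputs)                    # ['x']
print(sorted(inference_graph.initializers))      # ['b', 'w']
```

Requesting only "logits" drops the loss node and the "labels" input, while "w" and "b" move from graph inputs to embedded initializers.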

Distributed Training Considerations

In distributed training, the export step needs additional coordination across processes:

  • Rank 0 export: Only rank 0 (the primary process) typically performs the export after gathering the final model state. This avoids redundant I/O across all processes.
  • State gathering: If using pipeline parallelism, parameters from all pipeline stages must be gathered to rank 0 before export.
  • Consistent state: The export should occur after the final gradient synchronization to ensure the exported model reflects the fully trained state.
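
The rank 0 and state-gathering points above might be sketched as follows. This is a single-process simulation under stated assumptions: gather_to_rank0 and maybe_export are hypothetical helpers, the per-stage dictionaries stand in for parameters that a real pipeline would move with a collective communication call, and export_fn is a placeholder for the actual export.

```python
def gather_to_rank0(per_stage_params):
    """Merge per-stage parameter dicts into one full state.

    per_stage_params is a list indexed by pipeline stage; each entry maps a
    parameter name to its value. It stands in for a real collective gather.
    """
    full_state = {}
    for stage, params in enumerate(per_stage_params):
        for name, value in params.items():
            if name in full_state:
                raise ValueError(
                    f"parameter {name!r} is owned by more than one stage (stage {stage})"
                )
            full_state[name] = value
    return full_state


def maybe_export(rank, per_stage_params, export_fn):
    """Run export_fn on rank 0 only; every other rank skips the file write."""
    if rank != 0:
        return None
    return export_fn(gather_to_rank0(per_stage_params))


# Two pipeline stages, each owning one parameter (hypothetical names).
stages = [{"layer0.w": 1.0}, {"layer1.w": 2.0}]
results = [maybe_export(rank, stages, lambda state: sorted(state)) for rank in range(2)]
print(results)  # [['layer0.w', 'layer1.w'], None]
```

In a real multi-process run every rank would take part in the gather itself; only the write at the end is restricted to rank 0.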

The underlying ExportModelForInferencing API is shared with the on-device training workflow, but in the distributed context the orchestration around it differs to account for multi-process coordination.

Theoretical Basis

The separation between training and inference models is a fundamental concept in deep learning deployment:

  • Training graphs contain backward pass operations, optimizer nodes, loss computation, gradient accumulation, and communication operators that are unnecessary for inference.
  • Inference graphs need only the forward pass with trained weights embedded, producing a smaller, faster, and more portable model.
  • Weight embedding: During training, weights are mutable and stored separately. For inference, weights are frozen and embedded in the model file for self-contained deployment.

The export transformation preserves the forward pass exactly while removing all training-specific nodes. The resulting model file is typically smaller than the training model (on the order of 30-50%, depending on the architecture and how much of the training graph is backward-pass and optimizer logic) and faster to load.
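
The equivalence of the forward pass can be shown on a one-parameter-pair model. This is an illustrative sketch only: the scalar linear model, the squared-error loss, the learning rate, and all function names are assumptions chosen for brevity, not anything taken from ONNX Runtime.

```python
def forward(x, w, b):
    """Forward pass shared by training and inference: y = w * x + b."""
    return w * x + b


def training_step(x, target, w, b, lr):
    """One SGD step on squared error; returns updated weights and the loss."""
    y = forward(x, w, b)
    loss = (y - target) ** 2
    grad_y = 2.0 * (y - target)      # backward pass (training only)
    w_new = w - lr * grad_y * x      # optimizer update (training only)
    b_new = b - lr * grad_y
    return w_new, b_new, loss


def make_inference_fn(w, b):
    """Freeze the trained weights into a forward-only callable."""
    return lambda x: forward(x, w, b)


# Train on a single sample, then freeze the weights.
w, b = 0.0, 0.0
for _ in range(200):
    w, b, loss = training_step(x=2.0, target=5.0, w=w, b=b, lr=0.05)

infer = make_inference_fn(w, b)

# The frozen model computes exactly what the training-time forward pass did.
assert infer(2.0) == forward(2.0, w, b)
```

The inference callable carries no loss, gradient, or update logic, yet returns the same value as the forward pass used during training, because the weights it closes over are the final trained ones.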

Usage

Model export is the final step in the distributed training pipeline:

  1. Complete the distributed training loop via TrainingRunner::Run().
  2. On rank 0, load the final checkpoint state.
  3. Call Module::ExportModelForInferencing() with the desired output path and output names.
  4. The resulting ONNX file is ready for deployment with ONNX Runtime's inference API.
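
Steps 2 and 3 might look like this through ONNX Runtime's Python training bindings. This is a hedged sketch, not the page's own code: it assumes the onnxruntime-training package is installed and that its Module.export_model_for_inferencing method corresponds to the C++ Module::ExportModelForInferencing named above. The export_trained_model wrapper, its parameters, and the file names in the usage note are hypothetical.

```python
def export_trained_model(rank, train_model, eval_model, checkpoint, output_path, output_names):
    """Export an inference model on rank 0; return True if an export ran."""
    if rank != 0:
        return False

    # Imported lazily so that non-exporting ranks do not need the package.
    from onnxruntime.training.api import CheckpointState, Module

    # Step 2: load the final checkpoint state.
    state = CheckpointState.load_checkpoint(checkpoint)

    # Step 3: build the module with the eval model, then export. The eval
    # model is the graph that the export transforms.
    module = Module(train_model, state, eval_model)
    module.export_model_for_inferencing(output_path, output_names)
    return True
```

A call such as export_trained_model(rank, "training_model.onnx", "eval_model.onnx", "checkpoint", "inference_model.onnx", ["output"]) would then write the inference model on rank 0 only; the paths and the output name here are placeholders to be replaced with the artifacts of the actual run.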
