Implementation:Microsoft Onnxruntime ExportModelForInferencing Distributed
| Field | Value |
|---|---|
| Implementation Name | ExportModelForInferencing_Distributed |
| Overview | Export of a trained ONNX model for inference deployment from the distributed training pipeline. |
| Type | API Doc |
| Language | C++ |
| Domains | Distributed_Training, Training_Infrastructure |
| Source Repository | microsoft/onnxruntime |
| Last Updated | 2026-02-10 |
Overview
Module::ExportModelForInferencing exports a trained ONNX model for inference deployment at the end of the distributed training pipeline. It loads the evaluation graph, embeds the trained weights, prunes the nodes that do not contribute to the requested outputs, and writes the result as an ONNX model suitable for inference.
API
Status Module::ExportModelForInferencing(
    const std::string& inference_model_path,
    gsl::span<const std::string> graph_output_names) const;
Source Code Reference
- Repository: microsoft/onnxruntime
- Primary Source: orttraining/orttraining/training_api/module.cc:L660-661
Key Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inference_model_path | const std::string& | Yes | File system path where the exported ONNX inference model will be saved |
| graph_output_names | gsl::span<const std::string> | Yes | Names of graph outputs to retain in the inference model |
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | Module state | (internal) | Trained model parameters from CheckpointState |
| Input | inference_model_path | const std::string& | Destination path for the exported ONNX model |
| Input | graph_output_names | gsl::span<const std::string> | Subset of model outputs to include in the inference model |
| Output | ONNX model file | .onnx file | Self-contained inference model with embedded trained weights |
| Output | Status | common::Status | OK on success, error on failure |
Usage Examples
Basic Export After Distributed Training
#include "orttraining/training_api/module.h"
// Assumes `module` points to an initialized training Module with loaded parameters.
// Every rank finishes training; only rank 0 writes the exported model.
if (MPIContext::GetInstance().GetWorldRank() == 0) {
  std::vector<std::string> output_names = {"logits", "probabilities"};
  ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
      "/models/trained_model.onnx",
      gsl::make_span(output_names)));
}
Export with Single Output
// Export with a single output for classification
std::vector<std::string> output_names = {"predictions"};
ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
    "/deploy/classifier.onnx",
    gsl::make_span(output_names)));
End-to-End Distributed Training and Export
// 1. Configure and initialize training runner
TrainingRunner::Parameters params;
// ... configure params ...
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());
// 2. Run distributed training
ORT_THROW_IF_ERROR(runner->Run(
    training_data_loader.get(),
    test_data_loader.get()));
// 3. Only rank 0 exports the final model
if (MPIContext::GetInstance().GetWorldRank() == 0) {
  ORT_THROW_IF_ERROR(runner->EndTraining(test_data_loader.get()));
  // Export using the Module API with final checkpoint state
}
Export Process
The ExportModelForInferencing method performs the following steps:
- State validation: Checks that the module does not have a nominal state (parameters must be loaded).
- Eval model availability: Verifies that an evaluation model path or buffer was provided during module construction.
- Model loading: Loads the evaluation model proto from either file or buffer.
- Model cloning: Creates an onnxruntime::Model instance from the loaded proto.
- Weight embedding: Replaces initializer references with the current trained parameter values from the CheckpointState.
- Graph pruning: Removes nodes not needed for the specified output names.
- Model saving: Writes the resulting inference model to the specified path.
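The graph pruning step can be pictured as a backward walk from the requested outputs. The following is a minimal sketch under stated assumptions, not the ONNX Runtime implementation: `ToyNode` and `PruneToOutputs` are hypothetical names invented for illustration, and the graph is reduced to a flat list of nodes in topological order.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy node: one operator with named inputs and outputs.
struct ToyNode {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Keeps only the nodes that contribute to the requested outputs.
// Illustrative only; the real export operates on onnxruntime::Graph.
std::vector<ToyNode> PruneToOutputs(const std::vector<ToyNode>& nodes,
                                    const std::vector<std::string>& wanted) {
  // Map each value name to the index of the node that produces it.
  std::unordered_map<std::string, std::size_t> producer;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    for (const auto& out : nodes[i].outputs) producer[out] = i;
  }

  // Walk backward from the requested outputs to their producers.
  std::unordered_set<std::size_t> keep;
  std::vector<std::string> pending(wanted.begin(), wanted.end());
  while (!pending.empty()) {
    std::string value = pending.back();
    pending.pop_back();
    auto it = producer.find(value);
    if (it == producer.end()) continue;             // graph input or weight
    if (!keep.insert(it->second).second) continue;  // node already visited
    for (const auto& in : nodes[it->second].inputs) pending.push_back(in);
  }

  // Emit the surviving nodes in their original order.
  std::vector<ToyNode> pruned;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (keep.count(i) != 0) pruned.push_back(nodes[i]);
  }
  return pruned;
}
```

In this toy model, requesting only "logits" from a graph that also computes a loss drops the loss node, which mirrors how graph_output_names selects what the exported model retains.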
Distributed Training Considerations
Export after distributed training differs from on-device model export in the orchestration around the call:
- Rank 0 export only: In distributed training, typically only rank 0 performs the export after the final gradient synchronization. This avoids redundant I/O from all processes writing the same file.
- State gathering: If pipeline parallelism was used, parameters from all pipeline stages must be gathered to rank 0 before export. The checkpoint system handles this gathering.
- Consistent state: The export should occur after TrainingRunner::EndTraining(), which performs a final evaluation and ensures the model state is consistent.
- Shared API: The underlying Module::ExportModelForInferencing implementation is identical to the on-device training version; the difference lies in the distributed orchestration around it.
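The rank 0 export rule above can be isolated in a small guard. This is a sketch only: `ToyStatus` and `ExportOnRankZero` are hypothetical names, `ToyStatus` is a stand-in rather than common::Status, and the rank is passed in as a plain integer instead of being read from MPIContext so that the snippet stays self-contained.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a status type; not common::Status.
struct ToyStatus {
  bool ok;
  std::string message;
};

// Runs export_fn on rank 0 only. Every other rank reports success
// without invoking the export, so the file is written exactly once.
ToyStatus ExportOnRankZero(int world_rank,
                           const std::function<ToyStatus()>& export_fn) {
  if (world_rank != 0) {
    return {true, "skipped: not rank 0"};
  }
  return export_fn();
}
```

In a real pipeline, the callable would wrap the Module::ExportModelForInferencing call and the rank would come from the MPI context, as in the usage examples.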
Key Details
- This method is only available in non-minimal builds (guarded by #if !defined(ORT_MINIMAL_BUILD)).
- The method returns an error if the module has a nominal state (no parameters loaded).
- The method returns an error if no eval model was provided during module construction.
- The graph_output_names parameter allows selecting which model outputs to include, enabling export of models with different output configurations from the same trained model.
- The exported model is a standard ONNX model file that can be loaded by any ONNX Runtime inference session.