Implementation:Microsoft Onnxruntime ExportModelForInferencing Distributed

Field	Value
Implementation Name	ExportModelForInferencing_Distributed
Overview	Export of a trained ONNX model for inference deployment from the distributed training pipeline.
Type	API Doc
Language	C++
Domains	Distributed_Training, Training_Infrastructure
Source Repository	microsoft/onnxruntime
Last Updated	2026-02-10

Overview

Export of a trained ONNX model for inference deployment from the distributed training pipeline. Module::ExportModelForInferencing transforms the evaluation graph by embedding trained weights and removing training-specific nodes, producing a clean ONNX model suitable for inference.

API

Status Module::ExportModelForInferencing(
    const std::string& inference_model_path,
    gsl::span<const std::string> graph_output_names) const;

Source Code Reference

Repository: microsoft/onnxruntime
Primary Source: orttraining/orttraining/training_api/module.cc:L660-661

Key Parameters

Parameter	Type	Required	Description
inference_model_path	const std::string&	Yes	File system path where the exported ONNX inference model will be saved
graph_output_names	gsl::span<const std::string>	Yes	Names of graph outputs to retain in the inference model

I/O Contract

Direction	Name	Type	Description
Input	Module state	(internal)	Trained model parameters from CheckpointState
Input	inference_model_path	const std::string&	Destination path for the exported ONNX model
Input	graph_output_names	gsl::span<const std::string>	Subset of model outputs to include in the inference model
Output	ONNX model file	.onnx file	Self-contained inference model with embedded trained weights
Output	Status	common::Status	OK on success, error on failure

Usage Examples

Basic Export After Distributed Training

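#include <string>
#include <vector>

#include "orttraining/core/framework/communication/mpi/mpi_context.h"
#include "orttraining/training_api/module.h"

// After training completes, only rank 0 writes the exported model.
if (MPIContext::GetInstance().GetWorldRank() == 0) {
    // Graph outputs to retain in the inference model.
    std::vector<std::string> output_names = {"logits", "probabilities"};

    ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
        "/models/trained_model.onnx",
        gsl::make_span(output_names)));
}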

Export with Single Output

// Export with a single output for classification
std::vector<std::string> output_names = {"predictions"};

ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
    "/deploy/classifier.onnx",
    gsl::make_span(output_names)));

End-to-End Distributed Training and Export

// 1. Configure and initialize training runner
TrainingRunner::Parameters params;
// ... configure params ...
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());

// 2. Run distributed training
ORT_THROW_IF_ERROR(runner->Run(
    training_data_loader.get(),
    test_data_loader.get()));

// 3. Only rank 0 exports the final model
if (MPIContext::GetInstance().GetWorldRank() == 0) {
    ORT_THROW_IF_ERROR(runner->EndTraining(test_data_loader.get()));
    // Export using the Module API with final checkpoint state
}

Export Process

The ExportModelForInferencing method performs the following steps:

State validation: Checks that the module does not have a nominal state (parameters must be loaded).
Eval model availability: Verifies that an evaluation model path or buffer was provided during module construction.
Model loading: Loads the evaluation model proto from either file or buffer.
Model cloning: Creates an onnxruntime::Model instance from the loaded proto.
Weight embedding: Replaces initializer references with the current trained parameter values from the CheckpointState.
Graph pruning: Removes nodes not needed for the specified output names.
Model saving: Writes the resulting inference model to the specified path.
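
The graph pruning step can be illustrated with a minimal, self-contained sketch. It is not the ONNX Runtime implementation: ToyNode and NodesToKeep are hypothetical names introduced here, and the sketch assumes that pruning amounts to keeping only the nodes reachable backwards from the requested outputs.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for a graph node; not an ONNX Runtime type.
struct ToyNode {
  std::string name;
  std::vector<std::string> inputs;   // names of values this node consumes
  std::vector<std::string> outputs;  // names of values this node produces
};

// Returns the names of the nodes needed to compute `requested_outputs`,
// found by walking backwards from each requested output to its producer.
std::unordered_set<std::string> NodesToKeep(
    const std::vector<ToyNode>& nodes,
    const std::vector<std::string>& requested_outputs) {
  // Map each value name to the node that produces it.
  std::unordered_map<std::string, const ToyNode*> producer;
  for (const ToyNode& node : nodes) {
    for (const std::string& out : node.outputs) {
      producer[out] = &node;
    }
  }

  std::unordered_set<std::string> keep;
  std::queue<std::string> pending;
  for (const std::string& out : requested_outputs) {
    pending.push(out);
  }

  while (!pending.empty()) {
    const std::string value = pending.front();
    pending.pop();
    auto it = producer.find(value);
    if (it == producer.end()) {
      continue;  // graph input or initializer: nothing produces it
    }
    const ToyNode* node = it->second;
    assert(node != nullptr);
    if (!keep.insert(node->name).second) {
      continue;  // node already visited
    }
    for (const std::string& in : node->inputs) {
      pending.push(in);
    }
  }
  return keep;
}
```

In this toy graph model, a loss node that consumes the same hidden value as a softmax node is dropped when only the softmax output is requested, which mirrors how training-only nodes fall away when graph_output_names lists inference outputs.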

Distributed Training Considerations

The key differences from on-device model export:

Rank 0 export only: In distributed training, typically only rank 0 performs the export after the final gradient synchronization. This avoids redundant I/O from all processes writing the same file.
State gathering: If pipeline parallelism was used, parameters from all pipeline stages must be gathered to rank 0 before export. The checkpoint system handles this gathering.
Consistent state: The export should occur after TrainingRunner::EndTraining(), which performs a final evaluation and ensures the model state is consistent.
Shared API: The underlying Module::ExportModelForInferencing implementation is identical to the on-device training version; the difference lies in the distributed orchestration around it.
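
The state gathering idea can be sketched conceptually as a merge of per-stage parameter sets. This is only an illustration under stated assumptions: StageParams and MergeStageParams are hypothetical names, parameters are modeled as flat float vectors, and the sketch does not reflect how the ONNX Runtime checkpoint system actually transfers or stores data.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in: parameter name -> flattened weight values.
using StageParams = std::map<std::string, std::vector<float>>;

// Merges the parameters owned by each pipeline stage into a single map,
// which is the complete view rank 0 would need before exporting.
// Throws std::invalid_argument if two stages claim the same parameter name.
StageParams MergeStageParams(const std::vector<StageParams>& stages) {
  StageParams merged;
  for (const StageParams& stage : stages) {
    for (const auto& entry : stage) {
      const bool inserted = merged.emplace(entry.first, entry.second).second;
      if (!inserted) {
        throw std::invalid_argument(
            "duplicate parameter across stages: " + entry.first);
      }
    }
    // Every parameter of this stage is now present in the merged map.
    assert(merged.size() >= stage.size());
  }
  return merged;
}
```

Rejecting duplicate names is a design choice of this sketch: it assumes each parameter belongs to exactly one pipeline stage, so a collision signals an inconsistent partition rather than something to overwrite silently.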

Key Details

This method is only available in non-minimal builds (guarded by #if !defined(ORT_MINIMAL_BUILD)).
The method returns an error if the module has a nominal state (no parameters loaded).
The method returns an error if no eval model was provided during module construction.
The graph_output_names parameter allows selecting which model outputs to include, enabling export of models with different output configurations from the same trained model.
The exported model is a standard ONNX model file that can be loaded by any ONNX Runtime inference session.
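
The two error conditions listed above can be pictured as precondition checks that run before any export work starts. The following is a minimal sketch, not the library's code: ToyStatus, ToyModuleState, and CheckExportPreconditions are hypothetical names, and the error messages are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal stand-ins; not the ONNX Runtime Status or Module types.
struct ToyStatus {
  bool ok;
  std::string message;
};

struct ToyModuleState {
  bool has_nominal_state;  // true when no parameters have been loaded
  bool has_eval_model;     // true when an eval model path or buffer was supplied
};

// Mirrors the documented failure cases: a nominal state or a missing
// eval model each produce an error before any export work is attempted.
ToyStatus CheckExportPreconditions(const ToyModuleState& state) {
  if (state.has_nominal_state) {
    return {false, "cannot export: module has a nominal state"};
  }
  if (!state.has_eval_model) {
    return {false, "cannot export: no eval model was provided"};
  }
  return {true, ""};
}
```

A caller of the real API would see the equivalent failures through the returned Status, for example via ORT_THROW_IF_ERROR as in the usage examples.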
