Implementation:Microsoft Onnxruntime ExportModelForInferencing
Overview
Transforms the trained evaluation model into a standalone ONNX inference model by embedding trained weights, selecting output nodes, and pruning training-specific graph elements.
Metadata
| Field | Value |
|---|---|
| Implementation Name | ExportModelForInferencing |
| Type | API Doc |
| Language | C++ |
| API | Module::ExportModelForInferencing(const std::string& inference_model_path, gsl::span<const std::string> graph_output_names) const -> Status |
| Domain | On_Device_Training, Model_Optimization |
| Repository | microsoft/onnxruntime |
| Source Reference | orttraining/orttraining/training_api/module.cc:L660-661 (definition), orttraining/orttraining/training_api/module.h:L145-146 (declaration) |
| Last Updated | 2026-02-10 |
Description
The ExportModelForInferencing method performs several transformations to convert a training evaluation model into an inference-ready ONNX model:
- Model Loading -- The eval model is loaded from the stored path or byte buffer into an ONNX_NAMESPACE::ModelProto.
- Model Cloning -- The eval model is cloned into an onnxruntime::Model for graph manipulation.
- Output Transformation -- TransformModelOutputsForInference prunes the graph to retain only the specified output nodes, removing any nodes that do not contribute to those outputs.
- Input Transformation -- TransformModelInputsForInference converts parameter graph inputs to constant initializers by embedding the current parameter values from the CheckpointState.
- Serialization -- The transformed model is saved to the specified path using ONNX protobuf serialization, with external data support for large models.
This method is only available in non-minimal builds (#if !defined(ORT_MINIMAL_BUILD)) because it requires the full ONNX protobuf library for model manipulation.
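The output and input transformations above can be illustrated with a simplified, hypothetical sketch. This is not the ORT implementation: the graph is modeled as plain Python dicts rather than ONNX protobufs, and all names (`transform_outputs_for_inference`, `transform_inputs_for_inference`, the toy node list) are illustrative only.

```python
# Hypothetical sketch of the two graph transformations, on a toy
# dict-based graph representation (not real ONNX protobufs).

def transform_outputs_for_inference(nodes, graph_output_names):
    """Keep only nodes that (transitively) produce the requested outputs."""
    needed = set(graph_output_names)
    kept = []
    # Walk the topologically ordered node list backwards so each kept
    # consumer marks its producers as needed.
    for node in reversed(nodes):
        if needed & set(node["outputs"]):
            kept.append(node)
            needed |= set(node["inputs"])
    return list(reversed(kept))

def transform_inputs_for_inference(graph_inputs, named_parameters):
    """Move trained parameters from graph inputs to constant initializers."""
    initializers = {n: v for n, v in named_parameters.items() if n in graph_inputs}
    remaining_inputs = [n for n in graph_inputs if n not in named_parameters]
    return remaining_inputs, initializers

# Toy eval graph: a forward path plus a loss node that inference doesn't need.
nodes = [
    {"op": "MatMul", "inputs": ["X", "W"], "outputs": ["h"]},
    {"op": "Softmax", "inputs": ["h"], "outputs": ["probabilities"]},
    {"op": "CrossEntropyLoss", "inputs": ["h", "labels"], "outputs": ["loss"]},
]
pruned = transform_outputs_for_inference(nodes, ["probabilities"])
inputs, inits = transform_inputs_for_inference(["X", "W", "labels"],
                                               {"W": [[1.0, 0.0], [0.0, 1.0]]})
print([n["op"] for n in pruned])  # ['MatMul', 'Softmax'] -- loss node removed
print(inputs, sorted(inits))      # ['X', 'labels'] ['W'] -- W is now a constant
```

In the real implementation the same two steps operate on the cloned onnxruntime::Model's main graph, with parameter values copied from the CheckpointState via the session's data transfer manager.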
API Signature
#if !defined(ORT_MINIMAL_BUILD)
Status ExportModelForInferencing(const std::string& inference_model_path,
gsl::span<const std::string> graph_output_names) const;
#endif
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| inference_model_path | const std::string& | File system path where the inference ONNX model will be saved |
| graph_output_names | gsl::span<const std::string> | Names of the graph outputs to retain in the inference model |
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | Eval model (from Module initialization) | The evaluation ONNX model loaded during Module construction |
| Input | CheckpointState named_parameters | Current parameter values to embed as constant initializers |
| Input | Output name list | Names specifying which graph outputs to retain |
| Output | .onnx file | Self-contained inference ONNX model with embedded weights |
Code Reference
From orttraining/orttraining/training_api/module.cc:
Status Module::ExportModelForInferencing(const std::string& inference_model_path,
                                         gsl::span<const std::string> graph_output_names) const {
  ORT_RETURN_IF(state_->module_checkpoint_state.is_nominal_state,
                "Cannot export the model with a nominal state. Please load the model parameters first.");
  ORT_RETURN_IF(!eval_sess_ || (!eval_model_path_.has_value() && !eval_model_buffer_.has_value()),
                "Eval model was not provided. Cannot export a model for inferencing.");

  ONNX_NAMESPACE::ModelProto eval_model;
  if (eval_model_path_.has_value()) {
    ORT_THROW_IF_ERROR(Model::Load(ToPathString(eval_model_path_.value()), eval_model));
  } else if (eval_model_buffer_.has_value()) {
    int eval_model_buffer_size = static_cast<int>(eval_model_buffer_.value().size());
    const void* eval_model_buffer_ptr = static_cast<const void*>(eval_model_buffer_.value().data());
    ORT_THROW_IF_ERROR(Model::LoadFromBytes(eval_model_buffer_size, eval_model_buffer_ptr, eval_model));
  }

  // Clone the eval model into an inference onnxruntime::Model
  std::shared_ptr<Model> inference_model;
  ORT_RETURN_IF_ERROR(Model::Load(eval_model, inference_model, nullptr,
                                  logging::LoggingManager::DefaultLogger()));

  // Transform outputs: prune nodes not contributing to specified outputs
  ORT_THROW_IF_ERROR(TransformModelOutputsForInference(inference_model->MainGraph(),
                                                       graph_output_names));

  // Transform inputs: embed parameters as constant initializers
  ORT_RETURN_IF_ERROR(TransformModelInputsForInference(
      inference_model->MainGraph(),
      state_->module_checkpoint_state.named_parameters,
      eval_sess_->GetDataTransferManager()));

  // ... save model ...
}
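The two guard clauses and the path-versus-buffer model-source selection at the top of the method can be mirrored in a small hypothetical Python sketch. The names (`ExportError`, `resolve_eval_model_source`) are illustrative only and are not part of any real ORT API.

```python
# Hypothetical Python mirror of the precondition checks in
# Module::ExportModelForInferencing; not part of the real ORT API.

class ExportError(RuntimeError):
    pass

def resolve_eval_model_source(is_nominal_state, eval_model_path, eval_model_buffer):
    """Return ("path", value) or ("buffer", value), mirroring the C++ guards."""
    if is_nominal_state:
        # A nominal checkpoint describes parameter shapes only; there are no
        # trained values to embed, so export must be rejected.
        raise ExportError("Cannot export the model with a nominal state. "
                          "Please load the model parameters first.")
    if eval_model_path is None and eval_model_buffer is None:
        # Without an eval model there is no graph to transform.
        raise ExportError("Eval model was not provided. "
                          "Cannot export a model for inferencing.")
    # Prefer the file path when both sources are available, as in the C++ code.
    if eval_model_path is not None:
        return ("path", eval_model_path)
    return ("buffer", eval_model_buffer)
```

This ordering matters: the nominal-state check runs first so callers get the more actionable "load the parameters" message before any model-source diagnostics.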
From orttraining/orttraining/training_api/module.h:
#if !defined(ORT_MINIMAL_BUILD)
Status ExportModelForInferencing(const std::string& inference_model_path,
gsl::span<const std::string> graph_output_names) const;
#endif
Usage Example
C++
#include "orttraining/training_api/module.h"
// After training is complete
std::vector<std::string> output_names = {"logits", "probabilities"};
Status status = module.ExportModelForInferencing("inference_model.onnx", output_names);
if (!status.IsOK()) {
std::cerr << "Export failed: " << status.ErrorMessage() << std::endl;
}
Python (via wrapper)
from onnxruntime.training.api import CheckpointState, Module
state = CheckpointState.load_checkpoint("checkpoints/final")
module = Module("training_model.onnx", state, "eval_model.onnx", device="cpu")
# Export for inference deployment
module.export_model_for_inferencing(
"inference_model.onnx",
["logits", "probabilities"],
)
Implements
Principle:Microsoft_Onnxruntime_Inference_Model_Export
Related Pages
- Module TrainStep -- Produces the trained parameters used for export
- SaveCheckpoint -- Alternative for persisting training state
- Torch Onnx Export -- The initial export step at the start of the pipeline