
Implementation:Microsoft Onnxruntime ExportModelForInferencing



Overview

Transforms the trained evaluation model into a standalone ONNX inference model by embedding trained weights, selecting output nodes, and pruning training-specific graph elements.

Metadata

Field Value
Implementation Name ExportModelForInferencing
Type API Doc
Language C++
API Module::ExportModelForInferencing(const std::string& inference_model_path, gsl::span<const std::string> graph_output_names) const -> Status
Domain On_Device_Training, Model_Optimization
Repository microsoft/onnxruntime
Source Reference orttraining/orttraining/training_api/module.cc:L660-661 (definition), orttraining/orttraining/training_api/module.h:L145-146 (declaration)
Last Updated 2026-02-10

Description

The ExportModelForInferencing method performs several transformations to convert a training evaluation model into an inference-ready ONNX model:

  1. Model Loading -- The eval model is loaded from the stored path or byte buffer into an ONNX_NAMESPACE::ModelProto.
  2. Model Cloning -- The eval model is cloned into an onnxruntime::Model for graph manipulation.
  3. Output Transformation -- TransformModelOutputsForInference prunes the graph to retain only the specified graph outputs, removing any node that does not contribute to them (a reverse reachability walk; see the sketch after this list).
  4. Input Transformation -- TransformModelInputsForInference converts parameter graph inputs to constant initializers by embedding the current parameter values from the CheckpointState.
  5. Serialization -- The transformed model is saved to the specified path using ONNX protobuf serialization, with external data support for large models.

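The pruning in step 3 can be understood as a reverse reachability walk from the requested outputs: every node that (transitively) produces one of them is kept, everything else is dropped. The following is a simplified, self-contained C++ sketch of that idea using toy graph types; it illustrates the concept only and is not the actual TransformModelOutputsForInference implementation.

#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy node: an op with named input/output tensors (illustration only).
struct Node {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Return the indices of nodes that contribute to the requested graph
// outputs, walking backwards from each output through tensor producers.
std::unordered_set<size_t> NodesToKeep(const std::vector<Node>& nodes,
                                       const std::vector<std::string>& graph_outputs) {
  // Map each tensor name to the index of the node that produces it.
  std::unordered_map<std::string, size_t> producer;
  for (size_t i = 0; i < nodes.size(); ++i)
    for (const auto& out : nodes[i].outputs) producer[out] = i;

  std::unordered_set<size_t> keep;
  std::vector<std::string> frontier(graph_outputs.begin(), graph_outputs.end());
  while (!frontier.empty()) {
    const std::string tensor = frontier.back();
    frontier.pop_back();
    auto it = producer.find(tensor);
    // Skip graph inputs/initializers (no producer) and already-kept nodes.
    if (it == producer.end() || !keep.insert(it->second).second) continue;
    // A newly kept node also requires the producers of its own inputs.
    for (const auto& in : nodes[it->second].inputs) frontier.push_back(in);
  }
  return keep;  // Any node not in this set can safely be pruned.
}
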
This method is only available in non-minimal builds (#if !defined(ORT_MINIMAL_BUILD)) because it requires the full ONNX protobuf library for model manipulation.
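
Code that must also compile against a minimal build can guard the call on the same macro. A minimal sketch, assuming an initialized Module named module (the wrapper function and its fallback status message are illustrative, not part of the API):

#include <string>

#include "orttraining/training_api/module.h"

onnxruntime::common::Status ExportIfSupported(
    onnxruntime::training::api::Module& module,
    const std::string& path,
    gsl::span<const std::string> output_names) {
#if !defined(ORT_MINIMAL_BUILD)
  // Full build: the export API is available.
  return module.ExportModelForInferencing(path, output_names);
#else
  // Minimal build: the method is compiled out, so report unsupported.
  return onnxruntime::common::Status(
      onnxruntime::common::StatusCategory::ONNXRUNTIME,
      onnxruntime::common::NOT_IMPLEMENTED,
      "Export requires a non-minimal build.");
#endif
}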

API Signature

#if !defined(ORT_MINIMAL_BUILD)
Status ExportModelForInferencing(const std::string& inference_model_path,
                                  gsl::span<const std::string> graph_output_names) const;
#endif

Key Parameters

Parameter Type Description
inference_model_path const std::string& File system path where the inference ONNX model will be saved
graph_output_names gsl::span<const std::string> Names of the graph outputs to retain in the inference model
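
Note that gsl::span<const std::string> binds to any contiguous sequence of std::string, so callers typically pass an existing container rather than constructing a span explicitly. A minimal standalone sketch using the Microsoft GSL header:

#include <array>
#include <string>
#include <vector>

#include <gsl/span>

// Both containers convert to gsl::span<const std::string>,
// so either can be passed as graph_output_names directly.
std::vector<std::string> from_vector = {"logits"};
std::array<std::string, 2> from_array = {"logits", "probabilities"};

gsl::span<const std::string> s1{from_vector};
gsl::span<const std::string> s2{from_array};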

I/O Contract

Direction Type Description
Input Eval model (from Module initialization) The evaluation ONNX model loaded during Module construction
Input CheckpointState named_parameters Current parameter values to embed as constant initializers
Input Output name list Names specifying which graph outputs to retain
Output .onnx file Self-contained inference ONNX model with embedded weights

Code Reference

From orttraining/orttraining/training_api/module.cc:

Status Module::ExportModelForInferencing(const std::string& inference_model_path,
                                         gsl::span<const std::string> graph_output_names) const {
  ORT_RETURN_IF(state_->module_checkpoint_state.is_nominal_state,
                "Cannot export the model with a nominal state. Please load the model parameters first.");
  ORT_RETURN_IF(!eval_sess_ || (!eval_model_path_.has_value() && !eval_model_buffer_.has_value()),
                "Eval model was not provided. Cannot export a model for inferencing.");

  ONNX_NAMESPACE::ModelProto eval_model;
  if (eval_model_path_.has_value()) {
    ORT_THROW_IF_ERROR(Model::Load(ToPathString(eval_model_path_.value()), eval_model));
  } else if (eval_model_buffer_.has_value()) {
    int eval_model_buffer_size = static_cast<int>(eval_model_buffer_.value().size());
    const void* eval_model_buffer_ptr = static_cast<const void*>(eval_model_buffer_.value().data());
    ORT_THROW_IF_ERROR(Model::LoadFromBytes(eval_model_buffer_size, eval_model_buffer_ptr, eval_model));
  }

  // Clone the eval model into an inference onnxruntime::Model
  std::shared_ptr<Model> inference_model;
  ORT_RETURN_IF_ERROR(Model::Load(eval_model, inference_model, nullptr,
                                   logging::LoggingManager::DefaultLogger()));

  // Transform outputs: prune nodes not contributing to specified outputs
  ORT_THROW_IF_ERROR(TransformModelOutputsForInference(inference_model->MainGraph(),
                                                        graph_output_names));

  // Transform inputs: embed parameters as constant initializers
  ORT_RETURN_IF_ERROR(TransformModelInputsForInference(
      inference_model->MainGraph(),
      state_->module_checkpoint_state.named_parameters,
      eval_sess_->GetDataTransferManager()));
  // ... save model ...
}

From orttraining/orttraining/training_api/module.h:

#if !defined(ORT_MINIMAL_BUILD)
  Status ExportModelForInferencing(const std::string& inference_model_path,
                                   gsl::span<const std::string> graph_output_names) const;
#endif

Usage Example

C++

#include "orttraining/training_api/module.h"

// After training is complete
std::vector<std::string> output_names = {"logits", "probabilities"};
Status status = module.ExportModelForInferencing("inference_model.onnx", output_names);
if (!status.IsOK()) {
    std::cerr << "Export failed: " << status.ErrorMessage() << std::endl;
}
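
Because the exported file embeds the trained parameters as initializers, it can then be loaded by a plain ONNX Runtime inference session with no training artifacts present. A minimal verification sketch using the C++ inference API (the model path and the two output names match the example above):

#include <onnxruntime_cxx_api.h>

int main() {
  // Loading succeeds only if the exported model is self-contained.
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "export-check");
  Ort::SessionOptions options;
  // Note: on Windows the path argument must be a wide (ORTCHAR_T) string.
  Ort::Session session(env, "inference_model.onnx", options);

  // The session exposes exactly the outputs retained at export time.
  const size_t n_outputs = session.GetOutputCount();
  return n_outputs == 2 ? 0 : 1;
}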

Python (via wrapper)

from onnxruntime.training.api import CheckpointState, Module

state = CheckpointState.load_checkpoint("checkpoints/final")
module = Module("training_model.onnx", state, "eval_model.onnx", device="cpu")

# Export for inference deployment
module.export_model_for_inferencing(
    "inference_model.onnx",
    ["logits", "probabilities"],
)

Implements

Principle:Microsoft_Onnxruntime_Inference_Model_Export
