Implementation:Microsoft Onnxruntime ExportModelForInferencing Distributed


Field | Value
Implementation Name | ExportModelForInferencing_Distributed
Overview | Export of a trained ONNX model for inference deployment from the distributed training pipeline.
Type | API Doc
Language | C++
Domains | Distributed_Training, Training_Infrastructure
Source Repository | microsoft/onnxruntime
Last Updated | 2026-02-10

Overview

This page covers exporting a trained ONNX model for inference deployment at the end of the distributed training pipeline. Module::ExportModelForInferencing transforms the evaluation graph by embedding the trained weights and removing nodes that are not needed for the requested outputs, producing a clean ONNX model suitable for inference.

API

Status Module::ExportModelForInferencing(
    const std::string& inference_model_path,
    gsl::span<const std::string> graph_output_names) const;

Key Parameters

Parameter | Type | Required | Description
inference_model_path | const std::string& | Yes | File system path where the exported ONNX inference model will be saved
graph_output_names | gsl::span<const std::string> | Yes | Names of graph outputs to retain in the inference model

I/O Contract

Direction | Name | Type | Description
Input | Module state | (internal) | Trained model parameters from CheckpointState
Input | inference_model_path | string | Destination path for the exported ONNX model
Input | graph_output_names | span<string> | Subset of model outputs to include in the inference model
Output | ONNX model file | .onnx file | Self-contained inference model with embedded trained weights
Output | Status | common::Status | OK on success, error on failure

Usage Examples

Basic Export After Distributed Training

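#include "orttraining/core/framework/communication/mpi/mpi_context.h"
#include "orttraining/training_api/module.h"

// Training has finished on every rank; only rank 0 writes the model.
// `module` is assumed to point to an initialized Module with parameters loaded.
if (MPIContext::GetInstance().GetWorldRank() == 0) {
    // Outputs to retain in the exported inference graph.
    std::vector<std::string> output_names = {"logits", "probabilities"};

    ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
        "/models/trained_model.onnx",
        gsl::make_span(output_names)));
}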

Export with Single Output

// Export with a single output for classification
std::vector<std::string> output_names = {"predictions"};

ORT_THROW_IF_ERROR(module->ExportModelForInferencing(
    "/deploy/classifier.onnx",
    gsl::make_span(output_names)));

End-to-End Distributed Training and Export

// 1. Configure and initialize training runner
TrainingRunner::Parameters params;
// ... configure params ...
auto runner = std::make_unique<TrainingRunner>(params, *env);
ORT_THROW_IF_ERROR(runner->Initialize());

// 2. Run distributed training
ORT_THROW_IF_ERROR(runner->Run(
    training_data_loader.get(),
    test_data_loader.get()));

// 3. Only rank 0 exports the final model
if (MPIContext::GetInstance().GetWorldRank() == 0) {
    ORT_THROW_IF_ERROR(runner->EndTraining(test_data_loader.get()));
    // Export using the Module API with final checkpoint state
}

Export Process

The ExportModelForInferencing method performs the following steps:

  1. State validation: Checks that the module does not have a nominal state (parameters must be loaded).
  2. Eval model availability: Verifies that an evaluation model path or buffer was provided during module construction.
  3. Model loading: Loads the evaluation model proto from either file or buffer.
  4. Model cloning: Creates an onnxruntime::Model instance from the loaded proto.
  5. Weight embedding: Replaces initializer references with the current trained parameter values from the CheckpointState.
  6. Graph pruning: Removes nodes not needed for the specified output names.
  7. Model saving: Writes the resulting inference model to the specified path.
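The steps above can be illustrated with a small, self-contained sketch. It does not use ONNX Runtime: ToyNode, ToyModel, ToyModule, ToyExport, and MakeDemoModule are hypothetical names invented for this illustration, and the graph is reduced to named nodes with string-valued inputs and outputs. It shows the precondition checks, the weight embedding, and a backward walk that keeps only nodes contributing to the requested outputs. Saving to disk is left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT ONNX Runtime types.
struct ToyNode {
  std::string name;
  std::vector<std::string> inputs;   // names of values this node consumes
  std::vector<std::string> outputs;  // names of values this node produces
};

struct ToyModel {
  std::vector<ToyNode> nodes;  // assumed to be in topological order
  std::map<std::string, std::vector<float>> initializers;
};

struct ToyModule {
  bool is_nominal_state;  // true means no parameters have been loaded
  bool has_eval_model;    // false means no eval model was supplied
  ToyModel eval_model;
  std::map<std::string, std::vector<float>> trained_parameters;
};

// Returns an empty string on success, or an error message on failure.
std::string ToyExport(const ToyModule& toy_module,
                      const std::vector<std::string>& graph_output_names,
                      ToyModel& exported) {
  // Steps 1-2: reject a nominal state or a missing eval model.
  if (toy_module.is_nominal_state) return "nominal state: load parameters first";
  if (!toy_module.has_eval_model) return "no eval model was provided";

  // Steps 3-4: work on a copy so the module's eval model stays untouched.
  ToyModel model = toy_module.eval_model;

  // Step 5: embed the trained values as constant initializers.
  for (const auto& entry : toy_module.trained_parameters) {
    model.initializers[entry.first] = entry.second;
  }

  // Step 6: walk the nodes backwards, keeping those that feed a needed value.
  std::set<std::string> needed(graph_output_names.begin(), graph_output_names.end());
  std::set<std::string> produced;
  std::vector<ToyNode> kept;
  for (auto it = model.nodes.rbegin(); it != model.nodes.rend(); ++it) {
    bool contributes = false;
    for (const auto& out : it->outputs) {
      if (needed.count(out) > 0) contributes = true;
    }
    if (!contributes) continue;
    needed.insert(it->inputs.begin(), it->inputs.end());
    produced.insert(it->outputs.begin(), it->outputs.end());
    kept.insert(kept.begin(), *it);  // prepend to preserve topological order
  }
  for (const auto& name : graph_output_names) {
    if (produced.count(name) == 0) return "unknown graph output: " + name;
  }
  assert(kept.size() <= model.nodes.size());
  model.nodes = std::move(kept);

  // Step 7 (saving to disk) is omitted; the result is returned in memory.
  exported = std::move(model);
  return "";
}

// Builds a three-node demo graph: MatMul -> Softmax, plus a Loss node.
ToyModule MakeDemoModule() {
  ToyModule toy_module;
  toy_module.is_nominal_state = false;
  toy_module.has_eval_model = true;
  toy_module.eval_model.nodes = {
      {"MatMul", {"input", "fc.weight"}, {"logits"}},
      {"Softmax", {"logits"}, {"probabilities"}},
      {"Loss", {"logits", "labels"}, {"loss"}},
  };
  toy_module.trained_parameters["fc.weight"] = {0.5f, -0.25f};
  return toy_module;
}
```

Calling ToyExport on the demo module with only "logits" requested keeps the MatMul node and drops Softmax and Loss, while requesting "logits" and "probabilities" keeps MatMul and Softmax. The real implementation operates on an onnxruntime::Model and may order or combine these steps differently.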

Distributed Training Considerations

Key differences from on-device model export:

  • Rank 0 export only: In distributed training, typically only rank 0 performs the export after the final gradient synchronization. This avoids redundant I/O from all processes writing the same file.
  • State gathering: If pipeline parallelism was used, parameters from all pipeline stages must be gathered to rank 0 before export. The checkpoint system handles this gathering.
  • Consistent state: The export should occur after TrainingRunner::EndTraining(), which performs a final evaluation and ensures the model state is consistent.
  • Shared API: The underlying Module::ExportModelForInferencing implementation is identical to the on-device training version; the difference lies in the distributed orchestration around it.
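The rank 0 export rule can be sketched as a small guard that wraps the export call. This is a minimal illustration, not ONNX Runtime code: RankExportResult, ExportOnRankZero, and CountExportsAcrossRanks are hypothetical names, the rank is passed in as a plain integer rather than read from MPIContext, and the callback stands in for the actual call to ExportModelForInferencing.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper types, not part of ONNX Runtime.
struct RankExportResult {
  bool attempted;     // true only on rank 0
  std::string error;  // empty when nothing failed on this rank
};

// Runs `export_fn` only on rank 0. `export_fn` is assumed to return an empty
// string on success or an error message on failure.
RankExportResult ExportOnRankZero(int world_rank,
                                  const std::function<std::string()>& export_fn) {
  assert(world_rank >= 0);
  RankExportResult result;
  result.attempted = false;
  if (world_rank != 0) {
    return result;  // other ranks skip the write entirely
  }
  result.attempted = true;
  result.error = export_fn();
  return result;
}

// Simulates `world_size` ranks calling the helper and counts the writes.
int CountExportsAcrossRanks(int world_size) {
  int writes = 0;
  for (int rank = 0; rank < world_size; ++rank) {
    ExportOnRankZero(rank, [&writes]() {
      ++writes;  // stands in for the actual export call
      return std::string();
    });
  }
  return writes;
}
```

With four simulated ranks, the callback runs once. In a real job each process would call the guard with its own rank, and any gathering of parameters to rank 0 would have to complete before the guard is reached.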

Key Details

  • This method is only available in non-minimal builds (guarded by #if !defined(ORT_MINIMAL_BUILD)).
  • The method returns an error if the module has a nominal state (no parameters loaded).
  • The method returns an error if no eval model was provided during module construction.
  • The graph_output_names parameter allows selecting which model outputs to include, enabling export of models with different output configurations from the same trained model.
  • The exported model is a standard ONNX model file that can be loaded by any ONNX Runtime inference session.
