Implementation:Microsoft Onnxruntime Memory Opt Env Config
Overview
Configures ONNX Runtime's memory optimization through environment variables that control activation recomputation strategies for reducing GPU memory consumption during ORTModule training.
Metadata
| Field | Value |
|---|---|
| Implementation Name | Memory_Opt_Env_Config |
| Type | Pattern Doc |
| Language | Python (env var configuration) |
| API | `ORTMODULE_MEMORY_OPT_LEVEL=0\|1\|2`, `ORTMODULE_MEMORY_OPT_CONFIG=path_to_config.json` |
| Domain | Accelerated_Training, PyTorch_Integration |
| Repository | microsoft/onnxruntime |
| Source Reference | docs/Memory_Optimizer.md:L33-34 (level), L81-96 (config) |
| Last Updated | 2026-02-10 |
Description
Memory optimization in ONNX Runtime Training is configured entirely through environment variables. The memory optimizer is implemented as a graph transformer that scans the ONNX execution graph to identify re-computable subgraph candidates and applies recomputation based on the configured optimization level and optional configuration file.
Environment Variables
| Variable | Values | Description |
|---|---|---|
| `ORTMODULE_MEMORY_OPT_LEVEL` | `0` (default), `1`, `2` | Optimization level: 0 = disabled/user-selected, 1 = transformer layerwise recompute, 2 = compromised recompute |
| `ORTMODULE_MEMORY_OPT_CONFIG` | File path | Path to a JSON configuration file specifying which subgraphs to recompute (only used with level 0) |
Configuration File Format
The configuration file is a JSON array of strings, each specifying a subgraph recompute plan:
```json
[
    "<cluster_id>:<strategy>:<count>",
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]
```
Where:
- `cluster_id` -- String representative of the re-computable subgraph (e.g., `BiasGelu+`, `BiasSoftmax+`)
- `strategy` -- 0 = none, 1 = recompute, 2 = compromised recompute
- `count` -- Number of occurrences to apply: a positive integer for a specific count, `-1` for all occurrences
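A small helper can validate entries in this `<cluster_id>:<strategy>:<count>` format before they go into the config file. `parse_plan` is a hypothetical name used for illustration here, not part of ONNX Runtime's API:

```python
def parse_plan(entry: str):
    """Split a memory-opt plan string like 'BiasGelu+:1:1' into parts.

    The documented cluster_id examples never contain ':', but splitting
    from the right keeps strategy and count unambiguous regardless.
    """
    cluster_id, strategy, count = entry.rsplit(":", 2)
    strategy, count = int(strategy), int(count)
    if strategy not in (0, 1, 2):
        raise ValueError(f"strategy must be 0, 1, or 2, got {strategy}")
    if count == 0 or count < -1:
        raise ValueError("count must be a positive integer or -1 (all occurrences)")
    return cluster_id, strategy, count

# Entries from the example config above
print(parse_plan("BiasGelu+:1:1"))   # ('BiasGelu+', 1, 1)
print(parse_plan("Dropout+:1:-1"))   # ('Dropout+', 1, -1)
```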
API Signature
```bash
# Mode 1: Transformer Layerwise Recompute (simple)
export ORTMODULE_MEMORY_OPT_LEVEL=1

# Mode 2: User-Selected Subgraph Recompute (advanced)
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Mode 3: Compromised Recompute (aggressive)
export ORTMODULE_MEMORY_OPT_LEVEL=2
```
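The same variables can also be set from Python via `os.environ` instead of `export`. One assumption worth verifying in your setup: they must be set before ORTModule initializes, since that is when it reads its configuration:

```python
import os

# Must be set before ORTModule initializes its graph transformers.
os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "1"  # transformer layerwise recompute

# For user-selected mode instead:
# os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "0"
# os.environ["ORTMODULE_MEMORY_OPT_CONFIG"] = "mem_opt.json"

# Sanity check: only 0, 1, and 2 are documented levels.
assert os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] in {"0", "1", "2"}
```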
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| `ORTMODULE_MEMORY_OPT_LEVEL` | int (env var) | Controls the aggressiveness of memory optimization |
| `ORTMODULE_MEMORY_OPT_CONFIG` | str (env var, file path) | Path to JSON config file for user-selected subgraph recompute |
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | Environment variables | Optimization level and optional config file path |
| Input (optional) | JSON config file | List of subgraph recompute plans |
| Effect | Memory reduction | Reduces GPU peak memory by recomputing activations during backward pass |
| Trade-off | Increased compute | Additional forward computation for recomputed activations |
Code Reference
From docs/Memory_Optimizer.md:
There are two modes to enable the memory optimizations:
- Transformer layerwise recompute, e.g. aggressively recompute all supported nodes within each transformer layer (usually including attention and mlp sublayers), enabled by `export ORTMODULE_MEMORY_OPT_LEVEL=1`. In this mode, `ORTMODULE_MEMORY_OPT_CONFIG` env values passed by users are not respected.
- Manual selected subgraph recompute, enabled by `export ORTMODULE_MEMORY_OPT_LEVEL=0` and `export ORTMODULE_MEMORY_OPT_CONFIG=<config file path>`.
When enabled, the optimizer logs available plans with their memory savings:
```
Memory Optimizer : ON : Memory Optimization Level: [TRANSFORMER_LAYERWISE_RECOMPUTE]
                        Configs                Freq    Max Saving(Bytes)
 - Plan 1 : ON : Reshape+Where+:1:-1           1       134,217,728
 - Plan 2 : ON : BiasSoftmax+:1:-1             1       134,086,656
 - Plan 3 : ON : Cast+:1:-1                    1       67,043,328
 - Plan 4 : ON : BiasGelu+:1:-1                1       20,951,040
```
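Plan lines in this shape are regular enough to post-process, e.g. to rank plans by savings. The `parse_plan_line` helper below is a hypothetical sketch keyed to the example output above; exact log formatting may differ between ORT versions:

```python
import re

# Matches lines like: " - Plan 1 : ON : Reshape+Where+:1:-1   1   134,217,728"
_PLAN_RE = re.compile(
    r"Plan\s+(\d+)\s*:\s*(ON|OFF)\s*:\s*(\S+)\s+(\d+)\s+([\d,]+)"
)

def parse_plan_line(line: str):
    """Return a dict for a memory-optimizer plan log line, or None."""
    m = _PLAN_RE.search(line)
    if not m:
        return None
    num, state, config, freq, saving = m.groups()
    return {
        "plan": int(num),
        "enabled": state == "ON",
        "config": config,
        "freq": int(freq),
        "saving_bytes": int(saving.replace(",", "")),
    }

print(parse_plan_line(" - Plan 4 : ON : BiasGelu+:1:-1   1   20,951,040"))
```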
Usage Example
Simple Mode (Transformer Layerwise Recompute)
```bash
# Enable transformer layerwise recompute
export ORTMODULE_MEMORY_OPT_LEVEL=1
```

```python
from onnxruntime.training.ortmodule import ORTModule

model = build_model()
model = ORTModule(model)

# Training loop -- memory optimization is automatically applied
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
Advanced Mode (User-Selected Subgraph Recompute)
```bash
# Enable user-selected mode
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"
```

Contents of `mem_opt.json`:

```json
[
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]
```
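The config file can also be generated with the standard `json` module instead of hand-editing; a minimal sketch, using a temp-file path for illustration:

```python
import json
import os
import tempfile

# The two plan strings from the example config above.
plans = ["BiasGelu+:1:1", "Dropout+:1:-1"]

path = os.path.join(tempfile.gettempdir(), "mem_opt.json")
with open(path, "w") as f:
    json.dump(plans, f)

# Round-trip check: the file is a plain JSON array of plan strings.
with open(path) as f:
    assert json.load(f) == plans

# Point ORTModule at the generated file.
os.environ["ORTMODULE_MEMORY_OPT_CONFIG"] = path
```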
Discovery Workflow
```bash
# Step 1: Run with level 0 (disabled) to discover available plans
export ORTMODULE_MEMORY_OPT_LEVEL=0
# Run training for a few steps, check logs for available plans

# Step 2: Select plans and create config
echo '["BiasGelu+:1:-1", "BiasSoftmax+:1:-1"]' > mem_opt.json
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Step 3: Re-run training with optimized memory
```
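Step 2 can be scripted once the Step 1 log has been read: keep only the plans whose reported saving clears a cutoff, then write them out. The pairs come from the example log earlier on this page; the 50 MiB threshold is an arbitrary illustration, not a recommendation:

```python
import json

# (config string, max saving in bytes) pairs copied from the example log.
reported = [
    ("Reshape+Where+:1:-1", 134_217_728),
    ("BiasSoftmax+:1:-1", 134_086_656),
    ("Cast+:1:-1", 67_043_328),
    ("BiasGelu+:1:-1", 20_951_040),
]

THRESHOLD = 50 * 1024 * 1024  # keep plans saving >= 50 MiB (illustrative cutoff)
selected = [cfg for cfg, saving in reported if saving >= THRESHOLD]

print(json.dumps(selected))
# ["Reshape+Where+:1:-1", "BiasSoftmax+:1:-1", "Cast+:1:-1"]
```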
Implements
Principle:Microsoft_Onnxruntime_Memory_Optimization
Related Pages
- ORTModule Wrap -- ORTModule must be active for memory optimization to take effect
- ORTModule Training Execution -- The training loop where memory optimization is applied
- FusedAdam FP16Optimizer -- Complementary optimization for parameter updates
- Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
- Heuristic:Microsoft_Onnxruntime_Memory_Recomputation_Optimization