Implementation:Microsoft Onnxruntime Memory Opt Env Config
Overview
Configures ONNX Runtime's memory optimization through environment variables that control activation recomputation strategies for reducing GPU memory consumption during ORTModule training.
Metadata
| Field | Value |
|---|---|
| Implementation Name | Memory_Opt_Env_Config |
| Type | Pattern Doc |
| Language | Python (env var configuration) |
| API | `ORTMODULE_MEMORY_OPT_LEVEL=0\|1\|2`, `ORTMODULE_MEMORY_OPT_CONFIG=path_to_config.json` |
| Domain | Accelerated_Training, PyTorch_Integration |
| Repository | microsoft/onnxruntime |
| Source Reference | docs/Memory_Optimizer.md:L33-34 (level), L81-96 (config) |
| Last Updated | 2026-02-10 |
Description
Memory optimization in ONNX Runtime Training is configured entirely through environment variables. The memory optimizer is implemented as a graph transformer that scans the ONNX execution graph to identify re-computable subgraph candidates and applies recomputation based on the configured optimization level and optional configuration file.
Environment Variables
| Variable | Values | Description |
|---|---|---|
| `ORTMODULE_MEMORY_OPT_LEVEL` | `0` (default), `1`, `2` | Optimization level: 0 = disabled/user-selected, 1 = transformer layerwise recompute, 2 = compromised recompute |
| `ORTMODULE_MEMORY_OPT_CONFIG` | File path | Path to a JSON configuration file specifying which subgraphs to recompute (only used with level 0) |
Configuration File Format
The configuration file is a JSON array of strings, each specifying a subgraph recompute plan:
```json
[
    "<cluster_id>:<strategy>:<count>",
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]
```
Where:
- `cluster_id` -- String representative of the re-computable subgraph (e.g., `BiasGelu+`, `BiasSoftmax+`)
- `strategy` -- 0 = none, 1 = recompute, 2 = compromised recompute
- `count` -- Number of occurrences to apply: a positive integer for a specific count, `-1` for all occurrences
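A small helper can validate entries in this `<cluster_id>:<strategy>:<count>` format before they go into the config file. `parse_plan` is a hypothetical name used for illustration here, not part of ONNX Runtime's API:

```python
def parse_plan(entry: str):
    """Split a memory-opt plan string like 'BiasGelu+:1:1' into parts.

    The documented cluster_id examples never contain ':', but splitting
    from the right keeps strategy and count unambiguous regardless.
    """
    cluster_id, strategy, count = entry.rsplit(":", 2)
    strategy, count = int(strategy), int(count)
    if strategy not in (0, 1, 2):
        raise ValueError(f"strategy must be 0, 1, or 2, got {strategy}")
    if count == 0 or count < -1:
        raise ValueError("count must be a positive integer or -1 (all occurrences)")
    return cluster_id, strategy, count

# Entries from the example config above
print(parse_plan("BiasGelu+:1:1"))   # ('BiasGelu+', 1, 1)
print(parse_plan("Dropout+:1:-1"))   # ('Dropout+', 1, -1)
```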
API Signature
```bash
# Mode 1: Transformer Layerwise Recompute (simple)
export ORTMODULE_MEMORY_OPT_LEVEL=1

# Mode 2: User-Selected Subgraph Recompute (advanced)
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Mode 3: Compromised Recompute (aggressive)
export ORTMODULE_MEMORY_OPT_LEVEL=2
```
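The same variables can also be set from Python via `os.environ` instead of `export`. One assumption worth verifying in your setup: they must be set before ORTModule initializes, since that is when it reads its configuration:

```python
import os

# Must be set before ORTModule initializes its graph transformers.
os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "1"  # transformer layerwise recompute

# For user-selected mode instead:
# os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "0"
# os.environ["ORTMODULE_MEMORY_OPT_CONFIG"] = "mem_opt.json"

# Sanity check: only 0, 1, and 2 are documented levels.
assert os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] in {"0", "1", "2"}
```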
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| `ORTMODULE_MEMORY_OPT_LEVEL` | int (env var) | Controls the aggressiveness of memory optimization |
| `ORTMODULE_MEMORY_OPT_CONFIG` | str (env var, file path) | Path to JSON config file for user-selected subgraph recompute |
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | Environment variables | Optimization level and optional config file path |
| Input (optional) | JSON config file | List of subgraph recompute plans |
| Effect | Memory reduction | Reduces GPU peak memory by recomputing activations during backward pass |
| Trade-off | Increased compute | Additional forward computation for recomputed activations |
Code Reference
From docs/Memory_Optimizer.md:
There are two modes to enable the memory optimizations:
- Transformer layerwise recompute, e.g. aggressively recompute all supported nodes within each transformer layer (usually including attention and mlp sublayers), enabled by `export ORTMODULE_MEMORY_OPT_LEVEL=1`. In this mode, `ORTMODULE_MEMORY_OPT_CONFIG` env values passed by users are not respected.
- Manual selected subgraph recompute, enabled by `export ORTMODULE_MEMORY_OPT_LEVEL=0` and `export ORTMODULE_MEMORY_OPT_CONFIG=<config file path>`.
When enabled, the optimizer logs available plans with their memory savings:
```
Memory Optimizer : ON : Memory Optimization Level: [TRANSFORMER_LAYERWISE_RECOMPUTE]
                        Configs                Freq    Max Saving(Bytes)
 - Plan 1 : ON : Reshape+Where+:1:-1           1       134,217,728
 - Plan 2 : ON : BiasSoftmax+:1:-1             1       134,086,656
 - Plan 3 : ON : Cast+:1:-1                    1       67,043,328
 - Plan 4 : ON : BiasGelu+:1:-1                1       20,951,040
```
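Plan lines in this shape are regular enough to post-process, e.g. to rank plans by savings. The `parse_plan_line` helper below is a hypothetical sketch keyed to the example output above; exact log formatting may differ between ORT versions:

```python
import re

# Matches lines like: " - Plan 1 : ON : Reshape+Where+:1:-1   1   134,217,728"
_PLAN_RE = re.compile(
    r"Plan\s+(\d+)\s*:\s*(ON|OFF)\s*:\s*(\S+)\s+(\d+)\s+([\d,]+)"
)

def parse_plan_line(line: str):
    """Return a dict for a memory-optimizer plan log line, or None."""
    m = _PLAN_RE.search(line)
    if not m:
        return None
    num, state, config, freq, saving = m.groups()
    return {
        "plan": int(num),
        "enabled": state == "ON",
        "config": config,
        "freq": int(freq),
        "saving_bytes": int(saving.replace(",", "")),
    }

print(parse_plan_line(" - Plan 4 : ON : BiasGelu+:1:-1   1   20,951,040"))
```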
Usage Example
Simple Mode (Transformer Layerwise Recompute)
```bash
# Enable transformer layerwise recompute
export ORTMODULE_MEMORY_OPT_LEVEL=1
```

```python
from onnxruntime.training.ortmodule import ORTModule

model = build_model()
model = ORTModule(model)

# Training loop -- memory optimization is automatically applied
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
Advanced Mode (User-Selected Subgraph Recompute)
```bash
# Enable user-selected mode
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"
```

Contents of `mem_opt.json`:

```json
[
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]
```
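The config file can also be generated with the standard `json` module instead of hand-editing; a minimal sketch, using a temp-file path for illustration:

```python
import json
import os
import tempfile

# The two plan strings from the example config above.
plans = ["BiasGelu+:1:1", "Dropout+:1:-1"]

path = os.path.join(tempfile.gettempdir(), "mem_opt.json")
with open(path, "w") as f:
    json.dump(plans, f)

# Round-trip check: the file is a plain JSON array of plan strings.
with open(path) as f:
    assert json.load(f) == plans

# Point ORTModule at the generated file.
os.environ["ORTMODULE_MEMORY_OPT_CONFIG"] = path
```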
Discovery Workflow
```bash
# Step 1: Run with level 0 (disabled) to discover available plans
export ORTMODULE_MEMORY_OPT_LEVEL=0
# Run training for a few steps, check logs for available plans

# Step 2: Select plans and create config
echo '["BiasGelu+:1:-1", "BiasSoftmax+:1:-1"]' > mem_opt.json
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Step 3: Re-run training with optimized memory
```
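Step 2 can be scripted once the Step 1 log has been read: keep only the plans whose reported saving clears a cutoff, then write them out. The pairs come from the example log earlier on this page; the 50 MiB threshold is an arbitrary illustration, not a recommendation:

```python
import json

# (config string, max saving in bytes) pairs copied from the example log.
reported = [
    ("Reshape+Where+:1:-1", 134_217_728),
    ("BiasSoftmax+:1:-1", 134_086_656),
    ("Cast+:1:-1", 67_043_328),
    ("BiasGelu+:1:-1", 20_951_040),
]

THRESHOLD = 50 * 1024 * 1024  # keep plans saving >= 50 MiB (illustrative cutoff)
selected = [cfg for cfg, saving in reported if saving >= THRESHOLD]

print(json.dumps(selected))
# ["Reshape+Where+:1:-1", "BiasSoftmax+:1:-1", "Cast+:1:-1"]
```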
Implements
Principle:Microsoft_Onnxruntime_Memory_Optimization
Related Pages
- ORTModule Wrap -- ORTModule must be active for memory optimization to take effect
- ORTModule Training Execution -- The training loop where memory optimization is applied
- FusedAdam FP16Optimizer -- Complementary optimization for parameter updates
- Environment:Microsoft_Onnxruntime_CUDA_GPU_Environment
- Heuristic:Microsoft_Onnxruntime_Memory_Recomputation_Optimization