
Implementation:Microsoft Onnxruntime Memory Opt Env Config

From Leeroopedia


Overview

Configures ONNX Runtime's memory optimization through environment variables that control activation recomputation strategies for reducing GPU memory consumption during ORTModule training.

Metadata

Field                Value
Implementation Name  Memory_Opt_Env_Config
Type                 Pattern Doc
Language             Python (env var configuration)
API                  ORTMODULE_MEMORY_OPT_LEVEL=1|2, ORTMODULE_MEMORY_OPT_CONFIG=path_to_config.json
Domain               Accelerated_Training, PyTorch_Integration
Repository           microsoft/onnxruntime
Source Reference     docs/Memory_Optimizer.md:L33-34 (level), L81-96 (config)
Last Updated         2026-02-10

Description

Memory optimization in ONNX Runtime Training is configured entirely through environment variables. The memory optimizer is implemented as a graph transformer that scans the ONNX execution graph to identify re-computable subgraph candidates and applies recomputation based on the configured optimization level and optional configuration file.

Environment Variables

Variable                     Values             Description
ORTMODULE_MEMORY_OPT_LEVEL   0 (default), 1, 2  Optimization level: 0 = disabled/user-selected, 1 = transformer layerwise recompute, 2 = compromised recompute
ORTMODULE_MEMORY_OPT_CONFIG  File path          Path to a JSON configuration file specifying which subgraphs to recompute (only used with level 0)
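Both variables can also be set from inside a Python launcher, as long as this happens before the model is wrapped; a minimal sketch, assuming the standard `os.environ` mechanism (the optimizer reads these when its graph transformer runs):

```python
import os

# Set before constructing ORTModule: the memory optimizer reads these
# environment variables when the graph transformer runs.
os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "1"  # transformer layerwise recompute
# ORTMODULE_MEMORY_OPT_CONFIG is honored only at level 0, so it is
# deliberately left unset here.
```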

Configuration File Format

The configuration file is a JSON array of strings, each specifying a subgraph recompute plan:

[
    "<cluster_id>:<strategy>:<count>",
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]

Where:

  • cluster_id -- String representative of the re-computable subgraph (e.g., BiasGelu+, BiasSoftmax+)
  • strategy -- 0=none, 1=recompute, 2=compromised recompute
  • count -- Number of occurrences to apply: positive integer for specific count, -1 for all occurrences
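As a sanity check on this format, each plan string splits cleanly on its last two colons. The helper below is hypothetical (not part of ONNX Runtime) and only illustrates parsing the documented `<cluster_id>:<strategy>:<count>` shape:

```python
import json

def parse_plan(plan: str) -> dict:
    # Split '<cluster_id>:<strategy>:<count>' from the right: the cluster_id
    # may contain '+' but no ':' in the documented examples.
    cluster_id, strategy, count = plan.rsplit(":", 2)
    return {"cluster_id": cluster_id, "strategy": int(strategy), "count": int(count)}

config = json.loads('["BiasGelu+:1:1", "Dropout+:1:-1"]')
parsed = [parse_plan(p) for p in config]
```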

API Signature

# Mode 1: Transformer Layerwise Recompute (simple)
export ORTMODULE_MEMORY_OPT_LEVEL=1

# Mode 2: User-Selected Subgraph Recompute (advanced)
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Mode 3: Compromised Recompute (aggressive)
export ORTMODULE_MEMORY_OPT_LEVEL=2

Key Parameters

Parameter                    Type                      Description
ORTMODULE_MEMORY_OPT_LEVEL   int (env var)             Controls the aggressiveness of memory optimization
ORTMODULE_MEMORY_OPT_CONFIG  str (env var, file path)  Path to JSON config file for user-selected subgraph recompute

I/O Contract

Direction         Type                   Description
Input             Environment variables  Optimization level and optional config file path
Input (optional)  JSON config file       List of subgraph recompute plans
Effect            Memory reduction       Reduces GPU peak memory by recomputing activations during the backward pass
Trade-off         Increased compute      Additional forward computation for recomputed activations

Code Reference

From docs/Memory_Optimizer.md:

There are two modes to enable the memory optimizations:
- Transformer layerwise recompute, e.g. aggressively recompute all supported nodes
  within each transformer layer (usually including attention and mlp sublayers),
  enabled by `export ORTMODULE_MEMORY_OPT_LEVEL=1`.
  In this mode, `ORTMODULE_MEMORY_OPT_CONFIG` env values passed by users are not respected.

- Manual selected subgraph recompute, enabled by
  `export ORTMODULE_MEMORY_OPT_LEVEL=0` and
  `export ORTMODULE_MEMORY_OPT_CONFIG=<config file path>`.

When enabled, the optimizer logs available plans with their memory savings:

Memory Optimizer     :  ON   :  Memory Optimization Level: [TRANSFORMER_LAYERWISE_RECOMPUTE]
                                Configs                                  Freq  Max Saving(Bytes)
- Plan 1            :  ON   :  Reshape+Where+:1:-1                      1     134,217,728
- Plan 2            :  ON   :  BiasSoftmax+:1:-1                        1     134,086,656
- Plan 3            :  ON   :  Cast+:1:-1                               1     67,043,328
- Plan 4            :  ON   :  BiasGelu+:1:-1                           1     20,951,040
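The `Configs` column of that log is already in config-file syntax, so plan strings can be lifted directly into a JSON config. A small illustrative parser; the exact log layout is an assumption based on the sample above, not a stable ORT interface:

```python
import re

# Two plan lines copied from the sample log output above.
log = (
    "- Plan 1            :  ON   :  Reshape+Where+:1:-1                      1     134,217,728\n"
    "- Plan 2            :  ON   :  BiasSoftmax+:1:-1                        1     134,086,656\n"
)

# Extract the '<cluster_id>:<strategy>:<count>' column from each plan line.
pattern = re.compile(r"- Plan \d+\s*:\s*ON\s*:\s*(\S+:\d+:-?\d+)")
configs = pattern.findall(log)
# 'configs' can now be serialized with json.dump as a memory-opt config file.
```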

Usage Example

Simple Mode (Transformer Layerwise Recompute)

import os

# Enable transformer layerwise recompute before wrapping the model
os.environ["ORTMODULE_MEMORY_OPT_LEVEL"] = "1"

from onnxruntime.training.ortmodule import ORTModule

model = build_model()
model = ORTModule(model)

# Training loop -- memory optimization is automatically applied
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Advanced Mode (User-Selected Subgraph Recompute)

# Enable user-selected mode
export ORTMODULE_MEMORY_OPT_LEVEL=0
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

Contents of mem_opt.json:

[
    "BiasGelu+:1:1",
    "Dropout+:1:-1"
]

Discovery Workflow

# Step 1: Run with level 0 (disabled) to discover available plans
export ORTMODULE_MEMORY_OPT_LEVEL=0
# Run training for a few steps, check logs for available plans

# Step 2: Select plans and create config
echo '["BiasGelu+:1:-1", "BiasSoftmax+:1:-1"]' > mem_opt.json
export ORTMODULE_MEMORY_OPT_CONFIG="mem_opt.json"

# Step 3: Re-run training with optimized memory

Implements

Principle:Microsoft_Onnxruntime_Memory_Optimization
