# Heuristic: Microsoft ONNX Runtime Graph Optimization Level Selection
| Field | Value |
|---|---|
| Sources | onnxruntime/__init__.py (GraphOptimizationLevel import), ONNX Runtime API documentation |
| Domains | Inference, Graph Optimization, Performance Tuning |
| Last Updated | 2026-02-10 |
## Overview
Select the appropriate graph optimization level to balance between inference performance and debuggability when creating an ONNX Runtime session.
## Description
ONNX Runtime applies a series of graph-level transformations to the ONNX model before execution. These optimizations are organized into progressive levels, each building on the previous. The GraphOptimizationLevel enum controls which tiers of optimization are applied. Choosing the right level depends on whether you are debugging model behavior, validating numerical correctness, or running in production. The levels are cumulative: each higher level includes all optimizations from lower levels plus additional transformations.
The four levels are:
- ORT_DISABLE_ALL -- No graph optimizations are applied. The model executes exactly as exported.
- ORT_ENABLE_BASIC -- Applies semantics-preserving optimizations such as constant folding, redundant node elimination, and dead code removal.
- ORT_ENABLE_EXTENDED -- Adds operator fusion optimizations on top of basic. This includes fusions such as MatMul+Add, GELU fusion, Attention fusion, and other composite operator patterns that reduce kernel launch overhead and improve memory access patterns.
- ORT_ENABLE_ALL -- Includes all extended optimizations plus hardware-specific layout transformations (e.g., NCHWc layout optimizations for CPU). This is the default and recommended level for production.
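The cumulative relationship between the levels can be modeled with plain Python sets. This is illustrative only: the actual optimization passes live inside ONNX Runtime's C++ core, and the pass names below are shorthand for the optimizations listed above, not real internal identifiers.

```python
# Illustrative model of the documented tiers; names are shorthand, not real
# internal ONNX Runtime pass identifiers.
BASIC = {"constant_folding", "redundant_node_elimination", "dead_code_removal"}
EXTENDED = BASIC | {"matmul_add_fusion", "gelu_fusion", "attention_fusion"}
ALL = EXTENDED | {"nchwc_layout"}

# Each higher level is a strict superset of the one below it.
assert BASIC < EXTENDED < ALL
```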
## Usage
Use this heuristic when:
- Setting up an ONNX Runtime inference session and deciding which optimization level to apply.
- Debugging numerical issues -- use ORT_DISABLE_ALL to isolate whether discrepancies originate from graph transformations or from the operators themselves.
- Validating operator-level correctness -- use ORT_ENABLE_BASIC to apply only safe, semantics-preserving transformations.
- Production deployment -- use ORT_ENABLE_ALL (the default) for maximum performance.
## The Insight (Rule of Thumb)
Use ORT_ENABLE_ALL (the default) for production inference to maximize performance. When debugging unexpected model outputs or numerical discrepancies, temporarily switch to ORT_DISABLE_ALL to determine if graph optimizations are the source of the issue. If disabling all optimizations resolves the problem, progressively re-enable levels (BASIC, then EXTENDED, then ALL) to isolate which optimization tier introduces the discrepancy.
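The progressive re-enable procedure can be sketched as a simple linear scan over the levels. Here `outputs_match` is a hypothetical callback (not part of any API) that you would implement by running inference at the given level and comparing against a trusted baseline, e.g. with numpy.allclose:

```python
# Ordered from least to most aggressive, mirroring the cumulative tiers.
LEVELS = ["ORT_DISABLE_ALL", "ORT_ENABLE_BASIC", "ORT_ENABLE_EXTENDED", "ORT_ENABLE_ALL"]

def first_bad_level(outputs_match):
    """Return the first level whose outputs diverge from the baseline, or None."""
    for level in LEVELS:
        if not outputs_match(level):
            return level
    return None

# Simulated run in which the EXTENDED fusions introduce the discrepancy:
simulated = lambda level: level in ("ORT_DISABLE_ALL", "ORT_ENABLE_BASIC")
print(first_bad_level(simulated))  # -> ORT_ENABLE_EXTENDED
```

Because the levels are cumulative, the first level that diverges pinpoints the optimization tier responsible.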
Configuration:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()

# Production (default):
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Debugging:
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession("model.onnx", sess_options)
```
What each level adds:
| Level | Optimizations Included |
|---|---|
| ORT_DISABLE_ALL | None; model runs as-is |
| ORT_ENABLE_BASIC | Constant folding, redundant node elimination, dead code removal |
| ORT_ENABLE_EXTENDED | Basic + operator fusions (MatMul+Add, GELU, Attention, etc.) |
| ORT_ENABLE_ALL | Extended + layout optimizations (NCHWc for CPU) |
## Reasoning
Graph-level optimizations can provide substantial inference speedups by reducing redundant computation, fusing operators to minimize kernel launch overhead, and optimizing memory layouts for the target hardware. However, these transformations modify the computational graph and can, in rare cases, introduce numerical differences. The progressive level design allows users to apply optimizations incrementally: basic optimizations are safe and semantics-preserving, extended optimizations involve operator fusion patterns that are validated but may alter floating-point accumulation order, and the full level adds hardware-specific layout changes. For debugging, disabling all optimizations provides a clean baseline that executes the original ONNX graph as authored. The optimized ONNX model can be saved to disk via sess_options.optimized_model_filepath for inspection, which is useful when diagnosing which fusions were applied.
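The inspection tip above can be sketched as a configuration fragment. This assumes onnxruntime is installed and a model.onnx exists on disk; the output filename is illustrative:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Restrict to basic optimizations so the dump reflects only that tier.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
# When set, ONNX Runtime writes the post-optimization graph to this path
# during session creation, so it can be inspected (e.g. with Netron).
sess_options.optimized_model_filepath = "model_optimized.onnx"

# Creating the session triggers optimization and saves the optimized graph.
session = ort.InferenceSession("model.onnx", sess_options)
```

Repeating this at each level and diffing the saved graphs shows exactly which fusions each tier applied.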