
Heuristic:Microsoft Onnxruntime Graph Optimization Level Selection

From Leeroopedia



Field         Value
Sources       onnxruntime/__init__.py (GraphOptimizationLevel import), ONNX Runtime API documentation
Domains       Inference, Graph Optimization, Performance Tuning
Last Updated  2026-02-10

Overview

Select the appropriate graph optimization level to balance inference performance against debuggability when creating an ONNX Runtime session.

Description

ONNX Runtime applies a series of graph-level transformations to the ONNX model before execution. These optimizations are organized into progressive levels, each building on the previous. The GraphOptimizationLevel enum controls which tiers of optimization are applied. Choosing the right level depends on whether you are debugging model behavior, validating numerical correctness, or running in production. The levels are cumulative: each higher level includes all optimizations from lower levels plus additional transformations.

The four levels are:

  • ORT_DISABLE_ALL -- No graph optimizations are applied. The model executes exactly as exported.
  • ORT_ENABLE_BASIC -- Applies semantics-preserving optimizations such as constant folding, redundant node elimination, and dead code removal.
  • ORT_ENABLE_EXTENDED -- Adds operator fusion optimizations on top of basic. This includes fusions such as MatMul+Add, GELU fusion, Attention fusion, and other composite operator patterns that reduce kernel launch overhead and improve memory access patterns.
  • ORT_ENABLE_ALL -- Includes all extended optimizations plus hardware-specific layout transformations (e.g., NCHWc layout optimizations for CPU). This is the default and recommended level for production.

Usage

Use this heuristic when:

  • Setting up an ONNX Runtime inference session and deciding which optimization level to apply.
  • Debugging numerical issues -- use ORT_DISABLE_ALL to isolate whether discrepancies originate from graph transformations or from the operators themselves.
  • Validating operator-level correctness -- use ORT_ENABLE_BASIC to apply only safe, semantics-preserving transformations.
  • Production deployment -- use ORT_ENABLE_ALL (the default) for maximum performance.

The Insight (Rule of Thumb)

Use ORT_ENABLE_ALL (the default) for production inference to maximize performance. When debugging unexpected model outputs or numerical discrepancies, temporarily switch to ORT_DISABLE_ALL to determine if graph optimizations are the source of the issue. If disabling all optimizations resolves the problem, progressively re-enable levels (BASIC, then EXTENDED, then ALL) to isolate which optimization tier introduces the discrepancy.
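The progressive re-enabling step above can be sketched as a small helper. Note that first_divergent_level and the tolerance values are illustrative, not part of the ONNX Runtime API; the synthetic arrays at the bottom stand in for outputs collected from real sessions, one per level, built as in the configuration snippet below.

```python
import numpy as np

# Ordered from no optimization to full optimization (the tiers are cumulative).
LEVELS = [
    "ORT_DISABLE_ALL",
    "ORT_ENABLE_BASIC",
    "ORT_ENABLE_EXTENDED",
    "ORT_ENABLE_ALL",
]

def first_divergent_level(outputs_by_level, rtol=1e-4, atol=1e-5):
    """Return the first optimization level whose output diverges from the
    ORT_DISABLE_ALL baseline, or None if every level agrees.

    outputs_by_level maps each level name to one output array produced by
    running the same input through a session built at that level, e.g.:

        opts = ort.SessionOptions()
        opts.graph_optimization_level = getattr(ort.GraphOptimizationLevel, name)
        sess = ort.InferenceSession("model.onnx", sess_options=opts)
        outputs_by_level[name] = sess.run(None, feed)[0]
    """
    baseline = outputs_by_level[LEVELS[0]]
    for name in LEVELS[1:]:
        if not np.allclose(baseline, outputs_by_level[name], rtol=rtol, atol=atol):
            return name
    return None

# Synthetic demonstration: a fusion introduced at the EXTENDED tier
# perturbs the output, so the helper reports that tier.
base = np.ones((2, 2), dtype=np.float32)
outputs = {
    "ORT_DISABLE_ALL": base,
    "ORT_ENABLE_BASIC": base.copy(),
    "ORT_ENABLE_EXTENDED": base + 0.01,
    "ORT_ENABLE_ALL": base + 0.01,
}
print(first_divergent_level(outputs))  # → ORT_ENABLE_EXTENDED
```

Because the tiers are cumulative, the first divergent level localizes the discrepancy to the optimizations that tier adds (here, operator fusions), which narrows the search considerably.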

Configuration:

import onnxruntime as ort

sess_options = ort.SessionOptions()

# Production (the default) -- maximum performance:
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Debugging -- uncomment to run the graph exactly as exported:
# sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession("model.onnx", sess_options=sess_options)

What each level adds:

Level                Optimizations included
ORT_DISABLE_ALL      None; the model runs as exported
ORT_ENABLE_BASIC     Constant folding, redundant node elimination, dead code removal
ORT_ENABLE_EXTENDED  Basic + operator fusions (MatMul+Add, GELU, Attention, etc.)
ORT_ENABLE_ALL       Extended + layout optimizations (e.g., NCHWc for CPU)

Reasoning

Graph-level optimizations can provide substantial inference speedups by reducing redundant computation, fusing operators to minimize kernel launch overhead, and optimizing memory layouts for the target hardware. However, these transformations modify the computational graph and can, in rare cases, introduce numerical differences. The progressive level design allows users to apply optimizations incrementally: basic optimizations are safe and semantics-preserving, extended optimizations involve operator fusion patterns that are validated but may alter floating-point accumulation order, and the full level adds hardware-specific layout changes. For debugging, disabling all optimizations provides a clean baseline that executes the original ONNX graph as authored. The optimized ONNX model can be saved to disk via sess_options.optimized_model_filepath for inspection, which is useful when diagnosing which fusions were applied.
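The inspection step mentioned above can be sketched as follows. This is a minimal configuration fragment assuming a model.onnx file is available; the output path is illustrative.

```python
import onnxruntime as ort

# Serialize the graph as optimized at the chosen level so the applied
# fusions can be inspected (e.g., in a model viewer such as Netron).
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
sess_options.optimized_model_filepath = "model_extended.onnx"

# Creating the session triggers optimization and writes the file.
session = ort.InferenceSession("model.onnx", sess_options=sess_options)
```

Comparing the files written at different levels shows exactly which fused or eliminated nodes each tier introduces.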
