Principle: Microsoft ONNX Runtime Session Options Configuration
Metadata
| Field | Value |
|---|---|
| Principle Name | Session_Options_Configuration |
| Repository | Microsoft_Onnxruntime |
| Source Repository | https://github.com/microsoft/onnxruntime |
| Domain | ML_Inference, Model_Optimization |
| Last Updated | 2026-02-10 |
| Workflow | Python_Inference_Pipeline |
| Pair | 1 of 6 |
Overview
Configuration mechanism for tuning ONNX Runtime inference session behavior, including graph optimization levels and profiling.
Description
SessionOptions allows configuring session behavior before creating an InferenceSession. It controls graph optimization (basic, extended, full), profiling, thread count, memory patterns, and execution mode. The configuration must be set before the session is constructed, as options are consumed at session creation time and cannot be changed afterward.
The primary API entry point is onnxruntime.SessionOptions(), imported at onnxruntime/__init__.py:L51. Key configurable parameters include:
- enable_profiling: bool -- Enables or disables performance profiling for the inference session.
- graph_optimization_level: GraphOptimizationLevel -- Controls which optimization passes run on the model graph. Valid values are:
  - ORT_DISABLE_ALL -- No graph optimizations are applied.
  - ORT_ENABLE_BASIC -- Basic optimizations such as constant folding and redundant node elimination.
  - ORT_ENABLE_EXTENDED -- Extended optimizations including complex node fusions.
  - ORT_ENABLE_ALL -- All available optimizations are applied.
Additional configurable options include:
- Thread count -- Controls the number of intra-op and inter-op threads used during execution.
- Memory patterns -- Enables memory pattern optimization to reuse memory allocations across operations.
- Execution mode -- Selects between sequential and parallel execution of operators within the graph.
Theoretical Basis
Session configuration follows the builder pattern -- options are set before session creation, and once the session is instantiated, the configuration is immutable. This design ensures thread safety and allows the runtime to make global optimization decisions based on the complete configuration.
Graph optimization levels control which optimization passes run on the model graph. Each level is cumulative:
- Basic optimizations are safe, low-cost transformations like constant folding and dead code elimination.
- Extended optimizations include operator fusion patterns that may restructure the graph more aggressively.
- All enables every available optimization, including hardware-specific transformations that may depend on the selected execution providers.
The optimization pipeline runs once at session creation time, producing an optimized graph that is then used for all subsequent inference calls. This amortizes the cost of optimization over the lifetime of the session.
Usage
SessionOptions is the first object created in a typical ONNX Runtime inference pipeline. It is instantiated, configured with desired parameters, and then passed to the InferenceSession constructor:
import onnxruntime as rt
options = rt.SessionOptions()
options.enable_profiling = True
options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL
# "model.onnx" is a placeholder path; the options are consumed here at
# session creation time and cannot be changed afterward.
session = rt.InferenceSession("model.onnx", sess_options=options)