Principle: Microsoft ONNX Runtime Session Options Configuration
Metadata
| Field | Value |
|---|---|
| Principle Name | Session_Options_Configuration |
| Repository | Microsoft_Onnxruntime |
| Source Repository | https://github.com/microsoft/onnxruntime |
| Domain | ML_Inference, Model_Optimization |
| Last Updated | 2026-02-10 |
| Workflow | Python_Inference_Pipeline |
| Pair | 1 of 6 |
Overview
Configuration mechanism for tuning ONNX Runtime inference session behavior, including graph optimization levels and profiling.
Description
SessionOptions allows configuring session behavior before creating an InferenceSession. It controls graph optimization (basic, extended, full), profiling, thread count, memory patterns, and execution mode. The configuration must be set before the session is constructed, as options are consumed at session creation time and cannot be changed afterward.
The primary API entry point is onnxruntime.SessionOptions(), imported at onnxruntime/__init__.py:L51. Key configurable parameters include:
- enable_profiling: bool -- Enables or disables performance profiling for the inference session.
- graph_optimization_level: GraphOptimizationLevel -- Controls which optimization passes run on the model graph. Valid values are:
  - ORT_DISABLE_ALL -- No graph optimizations are applied.
  - ORT_ENABLE_BASIC -- Basic optimizations such as constant folding and redundant node elimination.
  - ORT_ENABLE_EXTENDED -- Extended optimizations including complex node fusions.
  - ORT_ENABLE_ALL -- All available optimizations are applied.
Additional configurable options include:
- Thread count -- Controls the number of intra-op and inter-op threads used during execution.
- Memory patterns -- Enables memory pattern optimization to reuse memory allocations across operations.
- Execution mode -- Selects between sequential and parallel execution of operators within the graph.
Theoretical Basis
Session configuration follows the builder pattern -- options are set before session creation, and once the session is instantiated, the configuration is immutable. This design ensures thread safety and allows the runtime to make global optimization decisions based on the complete configuration.
Graph optimization levels control which optimization passes run on the model graph. Each level is cumulative:
- Basic optimizations are safe, low-cost transformations like constant folding and dead code elimination.
- Extended optimizations include operator fusion patterns that may restructure the graph more aggressively.
- All enables every available optimization, including hardware-specific transformations that may depend on the selected execution providers.
The optimization pipeline runs once at session creation time, producing an optimized graph that is then used for all subsequent inference calls. This amortizes the cost of optimization over the lifetime of the session.
Usage
SessionOptions is the first object created in a typical ONNX Runtime inference pipeline. It is instantiated, configured with desired parameters, and then passed to the InferenceSession constructor:
import onnxruntime as rt
options = rt.SessionOptions()
options.enable_profiling = True
options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL
# "model.onnx" is a placeholder path; the options are consumed here at
# session creation time and cannot be changed afterward.
session = rt.InferenceSession("model.onnx", sess_options=options)