

Environment: Microsoft Semantic Kernel ONNX CUDA Environment

From Leeroopedia
Domains Infrastructure, GPU_Acceleration, Local_AI
Last Updated 2026-02-11 20:00 GMT

Overview

GPU-accelerated environment with NVIDIA CUDA 12.0+, ONNX Runtime, and DirectML support for running local AI models through the Semantic Kernel ONNX connector.

Description

This environment provides GPU acceleration for running local ONNX models via Semantic Kernel. It supports three execution providers: CPU (default), CUDA (NVIDIA GPUs), and DirectML (Windows GPU abstraction). The ONNX connector enables chat completion and embedding generation without cloud API calls, using models like Microsoft Phi-3 Mini. Each execution provider requires different NuGet packages and build configurations.
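A minimal sketch of wiring the connector into a kernel, assuming a Phi-3 Mini ONNX model has already been downloaded locally. The model id and path below are placeholders; the `AddOnnxRuntimeGenAIChatCompletion` extension comes from the Microsoft.SemanticKernel.Connectors.Onnx package, and exact signatures may vary between connector versions:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Register the ONNX GenAI chat completion service with a locally
// downloaded model (path and model id are placeholders).
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(
    modelId: "phi-3",
    modelPath: @"./Phi-3-mini-4k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32");
var kernel = builder.Build();

// Ask the local model a question — no cloud API call is involved.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What is ONNX Runtime?");
var reply = await chat.GetChatMessageContentAsync(history);
Console.WriteLine(reply.Content);
```

Which execution provider the model runs on is decided by the GenAI NuGet package referenced at build time (see Dependencies below), not by this code.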

The CUDA execution provider requires the NVIDIA CUDA Toolkit v12.0 and cuDNN v9.11 or higher installed on the system, with their binary directories added to the system PATH. A known version compatibility issue exists between ONNX Runtime v1.23.2 and the GenAI CUDA package, requiring a version override to v1.22.0.

Usage

Use this environment when running local AI models through Semantic Kernel without cloud API dependencies. This is required for the ONNX Simple Chat with CUDA demo and any custom deployment using ONNX models with GPU acceleration.

System Requirements

Category     | Requirement                   | Notes
OS           | Windows 10/11 or Linux        | CUDA supported on both; DirectML is Windows-only
Hardware     | NVIDIA GPU with CUDA support  | Required for the CUDA execution provider; any GPU for DirectML
CUDA Toolkit | v12.0 or higher               | Must be on the system PATH
cuDNN        | v9.11 or higher               | Must be on the system PATH
.NET SDK     | 10.0.100+ or 8.0+             | Same base requirement as DotNet_SDK_Environment
Disk         | 5 GB+                         | Model downloads (e.g., Phi-3 Mini is ~2.7 GB)
Git LFS      | Required                      | For downloading large model files from HuggingFace
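A quick sanity check that the required command-line tools are reachable on the PATH (shown for a Linux shell; on Windows, `where nvcc` serves the same purpose):

```shell
# Report which of the required tools are visible on the PATH.
for tool in nvcc git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool (add its bin directory to PATH)"
  fi
done
```

If `nvcc` is reported missing despite an installed toolkit, the CUDA bin directory listed under Quick Install has not been added to the PATH.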

Dependencies

System Packages

  • NVIDIA CUDA Toolkit 12.0+
  • NVIDIA cuDNN 9.11+
  • Git LFS (for model downloads)

NuGet Packages (CPU — Default)

  • Microsoft.ML.OnnxRuntimeGenAI >= 0.11.4

NuGet Packages (CUDA)

  • Microsoft.ML.OnnxRuntimeGenAI.Cuda >= 0.11.4
  • Microsoft.ML.OnnxRuntime = 1.22.0 (version override required — see Compatibility Notes)

NuGet Packages (DirectML — Windows only)

  • Microsoft.ML.OnnxRuntimeGenAI.DirectML >= 0.8.1

Shared NuGet Packages

  • Microsoft.SemanticKernel.Connectors.Onnx
  • Microsoft.ML.OnnxRuntime >= 1.23.2 (CPU/DirectML) or 1.22.0 (CUDA override)
  • Microsoft.ML.Tokenizers >= 2.0.0

Credentials

No cloud credentials required. Models are downloaded and run locally.

  • HuggingFace access: Some gated models may require a HuggingFace token for download. Set it as the HF_TOKEN environment variable.
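For gated models, the token can be exported and passed to git over HTTPS basic auth. The token value below is a placeholder; create your own at huggingface.co/settings/tokens:

```shell
# Placeholder token — substitute your own personal access token.
export HF_TOKEN="hf_your_token_here"

# Git can present the token via HTTPS basic auth for gated repos
# (the username portion can be any non-empty string when a token is used):
# git clone "https://user:${HF_TOKEN}@huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx"
```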

Quick Install

# For CPU execution (default)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI

# For CUDA execution (requires NVIDIA GPU + CUDA 12.0+)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# Download a model (example: Phi-3 Mini ONNX)
git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx

# Windows PATH setup for CUDA
# Add to system PATH:
#   C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
#   C:\Program Files\NVIDIA\CUDNN\v9.11\bin\12.9

Code Evidence

ONNX Runtime version override for CUDA compatibility from dotnet/samples/Demos/OnnxSimpleChatWithCuda/OnnxSimpleChatWithCuda.csproj:

<PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605"/>

Conditional package inclusion based on build configuration:

<!-- CPU (Release config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />

<!-- CUDA (Release_Cuda config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />

<!-- DirectML (Release_DirectML config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
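Putting the pieces together, a sketch of how a project might gate these references on the build configuration. The MSBuild `Condition` syntax is standard; the configuration names follow the sample project, but the exact grouping here is an assumption, not a copy of the sample's csproj:

```xml
<ItemGroup Condition="'$(Configuration)' == 'Release'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />
</ItemGroup>

<ItemGroup Condition="'$(Configuration)' == 'Release_Cuda'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
  <!-- Downgrade to avoid the GenAI CUDA incompatibility (see Compatibility Notes) -->
  <PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605" />
</ItemGroup>

<ItemGroup Condition="'$(Configuration)' == 'Release_DirectML'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
</ItemGroup>
```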

ONNX connector target frameworks from dotnet/src/Connectors/Connectors.Onnx/Connectors.Onnx.csproj:

<TargetFrameworks>net10.0;net8.0;netstandard2.0</TargetFrameworks>

Common Errors

Error Message                       | Cause                                                     | Solution
DllNotFoundException: onnxruntime   | ONNX Runtime native library not found                     | Install the correct NuGet package for your execution provider
CUDA driver version is insufficient | CUDA toolkit version mismatch                             | Install CUDA Toolkit v12.0 or higher
NU1605: Detected package downgrade  | ONNX Runtime version conflict with the CUDA GenAI package | Add VersionOverride="1.22.0" NoWarn="NU1605" to the Microsoft.ML.OnnxRuntime reference
Model file not found                | ONNX model not downloaded                                 | Download the model files with git lfs
cuDNN not found                     | cuDNN not on the system PATH                              | Add the cuDNN bin directory to the system PATH

Compatibility Notes

  • CUDA version conflict: ONNX Runtime v1.23.2 has compatibility issues with Microsoft.ML.OnnxRuntimeGenAI.Cuda v0.11.4. Use v1.22.0 as a version override with NoWarn="NU1605".
  • DirectML: Windows-only GPU abstraction. Supports AMD, Intel, and NVIDIA GPUs without vendor-specific drivers.
  • CPU fallback: The default execution provider. No GPU or special drivers required.
  • Model formats: Only ONNX-format models are supported. HuggingFace models must be converted to ONNX format first.
  • Build configurations: Use Release for CPU, Release_Cuda for CUDA, and Release_DirectML for DirectML execution providers.
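In practice, the execution provider is selected at build time by passing the matching configuration name to the build (configuration names from the sample project; adjust for your own solution):

```shell
dotnet build -c Release             # CPU (default)
dotnet build -c Release_Cuda        # CUDA (requires NVIDIA GPU + CUDA Toolkit)
dotnet build -c Release_DirectML    # DirectML (Windows only)
```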
