Environment: Microsoft Semantic Kernel ONNX CUDA Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Acceleration, Local_AI |
| Last Updated | 2026-02-11 20:00 GMT |
Overview
GPU-accelerated environment with NVIDIA CUDA 12.0+, ONNX Runtime, and DirectML support for running local AI models through the Semantic Kernel ONNX connector.
Description
This environment provides GPU acceleration for running local ONNX models via Semantic Kernel. It supports three execution providers: CPU (default), CUDA (NVIDIA GPUs), and DirectML (Windows GPU abstraction). The ONNX connector enables chat completion and embedding generation without cloud API calls, using models like Microsoft Phi-3 Mini. Each execution provider requires different NuGet packages and build configurations.
The CUDA execution provider requires the NVIDIA CUDA Toolkit v12.0 and cuDNN v9.11 or higher installed on the system, with their binary directories added to the system PATH. A known version compatibility issue exists between ONNX Runtime v1.23.2 and the GenAI CUDA package, requiring a version override to v1.22.0.
Usage
Use this environment when running local AI models through Semantic Kernel without cloud API dependencies. This is required for the ONNX Simple Chat with CUDA demo and any custom deployment using ONNX models with GPU acceleration.
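As a sketch of the local-model workflow described above, the snippet below wires the ONNX connector into a kernel and runs a single chat turn. The model path and model ID are placeholders for a locally downloaded Phi-3 Mini ONNX folder, and the `AddOnnxRuntimeGenAIChatCompletion` builder extension comes from `Microsoft.SemanticKernel.Connectors.Onnx`; verify the exact signature against your installed package version.

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Placeholder path to a locally downloaded ONNX model folder; adjust to your layout.
var modelPath = @"C:\models\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32";

var builder = Kernel.CreateBuilder();
// Registers the ONNX Runtime GenAI chat completion service; no cloud credentials needed.
builder.AddOnnxRuntimeGenAIChatCompletion(modelId: "phi-3", modelPath: modelPath);
var kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Summarize what an execution provider is in one sentence.");

// Inference runs entirely on the local machine; whether it uses CPU, CUDA, or
// DirectML depends on which GenAI package the project was built with.
var reply = await chat.GetChatMessageContentAsync(history, kernel: kernel);
Console.WriteLine(reply.Content);
```

The same code runs unchanged across all three execution providers; only the referenced GenAI package differs.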
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Windows 10/11 or Linux | CUDA support on both; DirectML Windows only |
| Hardware | NVIDIA GPU with CUDA support | Required for CUDA execution provider; any GPU for DirectML |
| CUDA Toolkit | v12.0 or higher | Must be in system PATH |
| cuDNN | v9.11 or higher | Must be in system PATH |
| .NET SDK | 10.0.100+ or 8.0+ | Same base requirement as DotNet_SDK_Environment |
| Disk | 5GB+ | Model downloads (e.g., Phi-3 Mini is ~2.7GB) |
| Git LFS | Required | For downloading large model files from HuggingFace |
Dependencies
System Packages
- NVIDIA CUDA Toolkit 12.0+
- NVIDIA cuDNN 9.11+
- Git LFS (for model downloads)
NuGet Packages (CPU — Default)
- `Microsoft.ML.OnnxRuntimeGenAI` >= 0.11.4
NuGet Packages (CUDA)
- `Microsoft.ML.OnnxRuntimeGenAI.Cuda` >= 0.11.4
- `Microsoft.ML.OnnxRuntime` = 1.22.0 (version override required; see Compatibility Notes)
NuGet Packages (DirectML — Windows only)
- `Microsoft.ML.OnnxRuntimeGenAI.DirectML` >= 0.8.1
NuGet Packages (All Execution Providers)
- `Microsoft.SemanticKernel.Connectors.Onnx`
- `Microsoft.ML.OnnxRuntime` >= 1.23.2 (CPU/DirectML) or 1.22.0 (CUDA override)
- `Microsoft.ML.Tokenizers` >= 2.0.0
Credentials
No cloud credentials required. Models are downloaded and run locally.
- HuggingFace access: Some gated models may require a HuggingFace token for download. Set it as the `HF_TOKEN` environment variable.
Quick Install
```shell
# For CPU execution (default)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI

# For CUDA execution (requires NVIDIA GPU + CUDA 12.0+)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# Download a model (example: Phi-3 Mini ONNX)
git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx

# Windows PATH setup for CUDA — add to system PATH:
#   C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
#   C:\Program Files\NVIDIA\CUDNN\v9.11\bin\12.9
```
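Besides chat completion, the Description mentions local embedding generation. A minimal sketch, assuming a BERT-style ONNX embedding model and its vocabulary file have been downloaded locally (both paths below are placeholders), using the connector's `AddBertOnnxTextEmbeddingGeneration` extension; depending on your Semantic Kernel version, the embedding abstractions may be experimental and require suppressing `SKEXP` diagnostics:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

// Placeholder paths to a locally downloaded BERT-style ONNX embedding model.
var modelPath = @"C:\models\bge-micro-v2\onnx\model.onnx";
var vocabPath = @"C:\models\bge-micro-v2\vocab.txt";

var builder = Kernel.CreateBuilder();
// Registers a local BERT ONNX embedding service; no cloud API calls are made.
builder.AddBertOnnxTextEmbeddingGeneration(modelPath, vocabPath);
var kernel = builder.Build();

var embedder = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var vectors = await embedder.GenerateEmbeddingsAsync(["local inference", "no cloud calls"]);
Console.WriteLine($"Generated {vectors.Count} embeddings of dimension {vectors[0].Length}.");
```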
Code Evidence
ONNX Runtime version override for CUDA compatibility from dotnet/samples/Demos/OnnxSimpleChatWithCuda/OnnxSimpleChatWithCuda.csproj:
```xml
<PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605"/>
```
Conditional package inclusion based on build configuration:
```xml
<!-- CPU (Release config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />
<!-- CUDA (Release_Cuda config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
<!-- DirectML (Release_DirectML config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
```
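The comments above name the build configurations but the snippet omits the gating itself. A plausible MSBuild sketch of how that gating might look in a csproj (the configuration names follow the sample; the `Condition` expressions here are illustrative, not copied from the sample project):

```xml
<ItemGroup Condition="'$(Configuration)' == 'Release'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />
</ItemGroup>
<ItemGroup Condition="'$(Configuration)' == 'Release_Cuda'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
</ItemGroup>
<ItemGroup Condition="'$(Configuration)' == 'Release_DirectML'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
</ItemGroup>
```

The provider is then selected at build time, e.g. `dotnet build -c Release_Cuda`.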
ONNX connector target frameworks from dotnet/src/Connectors/Connectors.Onnx/Connectors.Onnx.csproj:
```xml
<TargetFrameworks>net10.0;net8.0;netstandard2.0</TargetFrameworks>
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `DllNotFoundException: onnxruntime` | ONNX Runtime native library not found | Install the correct NuGet package for your execution provider |
| `CUDA driver version is insufficient` | CUDA Toolkit version mismatch | Install CUDA Toolkit v12.0 or higher |
| `NU1605: Detected package downgrade` | ONNX Runtime version conflict with the CUDA GenAI package | Add `VersionOverride="1.22.0" NoWarn="NU1605"` to the `Microsoft.ML.OnnxRuntime` reference |
| `Model file not found` | ONNX model not downloaded | Use `git lfs` to download the model files |
| `cuDNN not found` | cuDNN not in system PATH | Add the cuDNN bin directory to the system PATH |
Compatibility Notes
- CUDA version conflict: ONNX Runtime v1.23.2 has compatibility issues with `Microsoft.ML.OnnxRuntimeGenAI.Cuda` v0.11.4. Use v1.22.0 as a version override with `NoWarn="NU1605"`.
- DirectML: Windows-only GPU abstraction. Supports AMD, Intel, and NVIDIA GPUs without vendor-specific drivers.
- CPU fallback: The default execution provider. No GPU or special drivers required.
- Model formats: Only ONNX-format models are supported. HuggingFace models must be converted to ONNX format first.
- Build configurations: Use `Release` for CPU, `Release_Cuda` for CUDA, and `Release_DirectML` for DirectML execution providers.