

Environment: Microsoft Semantic Kernel ONNX CUDA Environment

From Leeroopedia
Domains Infrastructure, GPU_Acceleration, Local_AI
Last Updated 2026-02-11 20:00 GMT

Overview

GPU-accelerated environment with NVIDIA CUDA 12.0+, ONNX Runtime, and DirectML support for running local AI models through the Semantic Kernel ONNX connector.

Description

This environment provides GPU acceleration for running local ONNX models via Semantic Kernel. It supports three execution providers: CPU (default), CUDA (NVIDIA GPUs), and DirectML (Windows GPU abstraction). The ONNX connector enables chat completion and embedding generation without cloud API calls, using models like Microsoft Phi-3 Mini. Each execution provider requires different NuGet packages and build configurations.
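A minimal sketch of wiring the connector into a kernel, assuming a Phi-3 Mini ONNX model has already been downloaded locally. The model id and path below are placeholders; the `AddOnnxRuntimeGenAIChatCompletion` extension comes from the Microsoft.SemanticKernel.Connectors.Onnx package, and exact signatures may vary between connector versions:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Register the ONNX GenAI chat completion service with a locally
// downloaded model (path and model id are placeholders).
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(
    modelId: "phi-3",
    modelPath: @"./Phi-3-mini-4k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32");
var kernel = builder.Build();

// Ask the local model a question — no cloud API call is involved.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What is ONNX Runtime?");
var reply = await chat.GetChatMessageContentAsync(history);
Console.WriteLine(reply.Content);
```

Which execution provider the model runs on is decided by the GenAI NuGet package referenced at build time (see Dependencies below), not by this code.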

The CUDA execution provider requires the NVIDIA CUDA Toolkit v12.0 and cuDNN v9.11 or higher installed on the system, with their binary directories added to the system PATH. A known version compatibility issue exists between ONNX Runtime v1.23.2 and the GenAI CUDA package, requiring a version override to v1.22.0.

Usage

Use this environment when running local AI models through Semantic Kernel without cloud API dependencies. This is required for the ONNX Simple Chat with CUDA demo and any custom deployment using ONNX models with GPU acceleration.

System Requirements

Category     | Requirement                   | Notes
OS           | Windows 10/11 or Linux        | CUDA supported on both; DirectML is Windows-only
Hardware     | NVIDIA GPU with CUDA support  | Required for the CUDA execution provider; any GPU for DirectML
CUDA Toolkit | v12.0 or higher               | Must be on the system PATH
cuDNN        | v9.11 or higher               | Must be on the system PATH
.NET SDK     | 10.0.100+ or 8.0+             | Same base requirement as DotNet_SDK_Environment
Disk         | 5 GB+                         | Model downloads (e.g., Phi-3 Mini is ~2.7 GB)
Git LFS      | Required                      | For downloading large model files from HuggingFace
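A quick sanity check that the required command-line tools are reachable on the PATH (shown for a Linux shell; on Windows, `where nvcc` serves the same purpose):

```shell
# Report which of the required tools are visible on the PATH.
for tool in nvcc git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool (add its bin directory to PATH)"
  fi
done
```

If `nvcc` is reported missing despite an installed toolkit, the CUDA bin directory listed under Quick Install has not been added to the PATH.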

Dependencies

System Packages

  • NVIDIA CUDA Toolkit 12.0+
  • NVIDIA cuDNN 9.11+
  • Git LFS (for model downloads)

NuGet Packages (CPU — Default)

  • Microsoft.ML.OnnxRuntimeGenAI >= 0.11.4

NuGet Packages (CUDA)

  • Microsoft.ML.OnnxRuntimeGenAI.Cuda >= 0.11.4
  • Microsoft.ML.OnnxRuntime = 1.22.0 (version override required — see Compatibility Notes)

NuGet Packages (DirectML — Windows only)

  • Microsoft.ML.OnnxRuntimeGenAI.DirectML >= 0.8.1

Shared NuGet Packages

  • Microsoft.SemanticKernel.Connectors.Onnx
  • Microsoft.ML.OnnxRuntime >= 1.23.2 (CPU/DirectML) or 1.22.0 (CUDA override)
  • Microsoft.ML.Tokenizers >= 2.0.0

Credentials

No cloud credentials required. Models are downloaded and run locally.

  • HuggingFace access: Some gated models may require a HuggingFace token for download. Set it as the HF_TOKEN environment variable.
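For gated models, the token can be exported and passed to git over HTTPS basic auth. The token value below is a placeholder; create your own at huggingface.co/settings/tokens:

```shell
# Placeholder token — substitute your own personal access token.
export HF_TOKEN="hf_your_token_here"

# Git can present the token via HTTPS basic auth for gated repos
# (the username portion can be any non-empty string when a token is used):
# git clone "https://user:${HF_TOKEN}@huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx"
```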

Quick Install

# For CPU execution (default)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI

# For CUDA execution (requires NVIDIA GPU + CUDA 12.0+)
dotnet add package Microsoft.SemanticKernel.Connectors.Onnx
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# Download a model (example: Phi-3 Mini ONNX)
git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx

# Windows PATH setup for CUDA
# Add to system PATH:
#   C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
#   C:\Program Files\NVIDIA\CUDNN\v9.11\bin\12.9

Code Evidence

ONNX Runtime version override for CUDA compatibility from dotnet/samples/Demos/OnnxSimpleChatWithCuda/OnnxSimpleChatWithCuda.csproj:

<PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605"/>

Conditional package inclusion based on build configuration:

<!-- CPU (Release config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />

<!-- CUDA (Release_Cuda config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />

<!-- DirectML (Release_DirectML config) -->
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
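Putting the pieces together, a sketch of how a project might gate these references on the build configuration. The MSBuild `Condition` syntax is standard; the configuration names follow the sample project, but the exact grouping here is an assumption, not a copy of the sample's csproj:

```xml
<ItemGroup Condition="'$(Configuration)' == 'Release'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" />
</ItemGroup>

<ItemGroup Condition="'$(Configuration)' == 'Release_Cuda'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
  <!-- Downgrade to avoid the GenAI CUDA incompatibility (see Compatibility Notes) -->
  <PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605" />
</ItemGroup>

<ItemGroup Condition="'$(Configuration)' == 'Release_DirectML'">
  <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" />
</ItemGroup>
```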

ONNX connector target frameworks from dotnet/src/Connectors/Connectors.Onnx/Connectors.Onnx.csproj:

<TargetFrameworks>net10.0;net8.0;netstandard2.0</TargetFrameworks>

Common Errors

Error Message                       | Cause                                                     | Solution
DllNotFoundException: onnxruntime   | ONNX Runtime native library not found                     | Install the correct NuGet package for your execution provider
CUDA driver version is insufficient | CUDA toolkit version mismatch                             | Install CUDA Toolkit v12.0 or higher
NU1605: Detected package downgrade  | ONNX Runtime version conflict with the CUDA GenAI package | Add VersionOverride="1.22.0" NoWarn="NU1605" to the Microsoft.ML.OnnxRuntime reference
Model file not found                | ONNX model not downloaded                                 | Download the model files with git lfs
cuDNN not found                     | cuDNN not on the system PATH                              | Add the cuDNN bin directory to the system PATH

Compatibility Notes

  • CUDA version conflict: ONNX Runtime v1.23.2 has compatibility issues with Microsoft.ML.OnnxRuntimeGenAI.Cuda v0.11.4. Use v1.22.0 as a version override with NoWarn="NU1605".
  • DirectML: Windows-only GPU abstraction. Supports AMD, Intel, and NVIDIA GPUs without vendor-specific drivers.
  • CPU fallback: The default execution provider. No GPU or special drivers required.
  • Model formats: Only ONNX-format models are supported. HuggingFace models must be converted to ONNX format first.
  • Build configurations: Use Release for CPU, Release_Cuda for CUDA, and Release_DirectML for DirectML execution providers.
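In practice, the execution provider is selected at build time by passing the matching configuration name to the build (configuration names from the sample project; adjust for your own solution):

```shell
dotnet build -c Release             # CPU (default)
dotnet build -c Release_Cuda        # CUDA (requires NVIDIA GPU + CUDA Toolkit)
dotnet build -c Release_DirectML    # DirectML (Windows only)
```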
