Environment:Ollama Ollama GPU Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU, Deep_Learning |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
Multi-backend GPU acceleration environment supporting NVIDIA CUDA (11.8/12.8/13.0), AMD ROCm (6.3.3+), Apple Metal, and Vulkan (1.4+) for model inference offloading.
Description
This environment enables GPU-accelerated model inference in Ollama. The system uses a runner-based GPU discovery mechanism that spawns lightweight processes to enumerate and validate GPU devices across multiple backend libraries (CUDA, ROCm, Metal, Vulkan). Each backend is loaded dynamically at runtime via shared libraries, allowing the same Ollama binary to support different GPU vendors.
The discovery system runs in two phases: a bootstrap phase (serial enumeration, with a 30-second timeout that is raised to 90 seconds on Windows) and a validation phase (parallel device verification). A device is considered usable only if it meets the minimum memory requirement: 457 MiB for CUDA, ROCm, and Vulkan, or 512 MiB for Metal.
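The validation phase can be pictured with a minimal sketch, not Ollama's actual implementation. The `device` struct, `validateAll`, and the `probe` callback are hypothetical names introduced for illustration. The source does not say whether the real check compares total or free memory, so the sketch uses a single reported figure.

```go
package main

import (
	"fmt"
	"sync"
)

const mebiByte = 1024 * 1024

// device is a hypothetical, trimmed-down stand-in for a discovered GPU.
type device struct {
	ID      string
	Library string // "CUDA", "ROCm", "Vulkan", or "Metal"
	Mem     uint64 // reported memory in bytes (assumption: one figure)
}

// minimumMemory mirrors the thresholds stated in the description.
func minimumMemory(d device) uint64 {
	if d.Library == "Metal" {
		return 512 * mebiByte
	}
	return 457 * mebiByte
}

// validateAll checks every device concurrently and returns, in input
// order, the devices that meet the memory floor and pass probe.
func validateAll(devs []device, probe func(device) bool) []device {
	ok := make([]bool, len(devs))
	var wg sync.WaitGroup
	for i, d := range devs {
		wg.Add(1)
		go func(i int, d device) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			ok[i] = d.Mem >= minimumMemory(d) && probe(d)
		}(i, d)
	}
	wg.Wait()

	var usable []device
	for i, d := range devs {
		if ok[i] {
			usable = append(usable, d)
		}
	}
	return usable
}

func main() {
	devs := []device{
		{ID: "0", Library: "CUDA", Mem: 8192 * mebiByte},
		{ID: "1", Library: "CUDA", Mem: 256 * mebiByte},  // below the 457 MiB floor
		{ID: "2", Library: "Metal", Mem: 500 * mebiByte}, // below the 512 MiB floor
	}
	alwaysOK := func(device) bool { return true }
	for _, d := range validateAll(devs, alwaysOK) {
		fmt.Println("usable:", d.ID, d.Library)
	}
}
```

Writing results into a pre-sized slice keeps the output order deterministic even though the checks finish in arbitrary order.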
Usage
Use this environment when running model inference with GPU acceleration. It is the mandatory prerequisite for the Scheduler_GetRunner implementation, which handles model loading and GPU memory allocation. Without this environment, models run on CPU only.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (x86_64, arm64), macOS 14.0+ (arm64), Windows (x86_64) | Each OS has different GPU backend availability |
| NVIDIA GPU | CUDA Compute Capability >= 5.0 | Minimum 457 MiB VRAM; Flash Attention requires CC >= 7.0 (excluding 7.2) |
| AMD GPU | ROCm 6.3.3+ | Requires rocblas; rocblas crashes on unsupported devices, so ROCm devices go through init validation |
| Apple GPU | Metal-capable Apple Silicon | macOS 14.0+; minimum 512 MiB VRAM |
| Vulkan GPU | Vulkan 1.4+ SDK | Experimental; must be enabled via OLLAMA_VULKAN=1 |
| NVIDIA Jetson | JetPack 5 or 6 | Detected via JETSON_JETPACK env var or /etc/nv_tegra_release |
Dependencies
System Packages (NVIDIA CUDA)
- `cuda-toolkit` = 11.8, 12.8, or 13.0
- `cudnn` >= 8.6 (for CUDA 11.8/12.8) or `libcudnn9` (for CUDA 13.0)
- NVIDIA driver compatible with installed CUDA version
System Packages (AMD ROCm)
- `rocm` >= 6.3.3
- `rocm-libs` (includes rocblas)
- `hip-runtime-amd`
System Packages (Vulkan)
- `vulkan-sdk` >= 1.4.321.1
- `mesa-vulkan-drivers`
- `libvulkan1`, `libvulkan-dev`
System Packages (Apple Metal)
- macOS 14.0+ with Xcode Command Line Tools
- Metal framework (included in macOS)
Credentials
The following environment variables control GPU behavior:
- `CUDA_VISIBLE_DEVICES`: Set which NVIDIA devices are visible
- `HIP_VISIBLE_DEVICES`: Set which AMD devices are visible by numeric ID
- `ROCR_VISIBLE_DEVICES`: Set which AMD devices are visible by UUID or numeric ID
- `GGML_VK_VISIBLE_DEVICES`: Set which Vulkan devices are visible by numeric ID
- `GPU_DEVICE_ORDINAL`: Set which AMD devices are visible by numeric ID
- `HSA_OVERRIDE_GFX_VERSION`: Override the gfx version for all detected AMD GPUs
- `OLLAMA_LLM_LIBRARY`: Override GPU library auto-detection
- `OLLAMA_VULKAN`: Enable experimental Vulkan backend (set to `1`)
- `OLLAMA_GPU_OVERHEAD`: Reserve additional VRAM per GPU (bytes)
- `OLLAMA_FLASH_ATTENTION`: Enable flash attention (requires compatible hardware)
- `OLLAMA_KV_CACHE_TYPE`: KV cache quantization type (default: `f16`)
- `OLLAMA_SCHED_SPREAD`: Spread model across all GPUs
- `JETSON_JETPACK`: JetPack version for NVIDIA Jetson devices
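As an illustration of how a few of these variables might be consumed, the following sketch reads them through an injectable lookup function. The `gpuSettings` type and `loadGPUSettings` helper are hypothetical and not part of Ollama. Treating only the literal value `1` as enabling Vulkan follows this page; Ollama's own parser may accept other values.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// gpuSettings is a hypothetical container for a few of the variables above.
type gpuSettings struct {
	VulkanEnabled bool
	GPUOverhead   uint64 // bytes reserved per GPU
	KVCacheType   string
}

// loadGPUSettings reads the environment through getenv so it can be tested
// with a fake lookup. Unset or unparsable values fall back to defaults.
func loadGPUSettings(getenv func(string) string) gpuSettings {
	s := gpuSettings{KVCacheType: "f16"} // documented default
	s.VulkanEnabled = getenv("OLLAMA_VULKAN") == "1"
	if v := getenv("OLLAMA_GPU_OVERHEAD"); v != "" {
		if n, err := strconv.ParseUint(v, 10, 64); err == nil {
			s.GPUOverhead = n
		}
	}
	if v := getenv("OLLAMA_KV_CACHE_TYPE"); v != "" {
		s.KVCacheType = v
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", loadGPUSettings(os.Getenv))
}
```

Passing `os.Getenv` as a parameter rather than calling it directly keeps the helper free of global state.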
Quick Install
```shell
# NVIDIA CUDA (Ubuntu/Debian)
sudo apt install nvidia-cuda-toolkit

# AMD ROCm (Ubuntu 22.04)
# See https://rocm.docs.amd.com/ for full instructions
sudo apt install rocm-libs

# Vulkan (Ubuntu/Debian)
sudo apt install mesa-vulkan-drivers vulkan-tools libvulkan1 libvulkan-dev

# Enable Vulkan in Ollama (experimental)
export OLLAMA_VULKAN=1
```
Code Evidence
GPU minimum memory requirements from `ml/device.go:345-353`:
```go
func (d DeviceInfo) MinimumMemory() uint64 {
	if d.Library == "Metal" {
		return 512 * format.MebiByte
	}
	return 457 * format.MebiByte
}
```
Flash Attention hardware validation from `ml/device.go:479-493`:
```go
func FlashAttentionSupported(l []DeviceInfo) bool {
	for _, gpu := range l {
		supportsFA := gpu.Library == "cpu" ||
			gpu.Name == "Metal" || gpu.Library == "Metal" ||
			(gpu.Library == "CUDA" && gpu.DriverMajor >= 7 &&
				!(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)) ||
			gpu.Library == "ROCm" ||
			gpu.Library == "Vulkan"
		if !supportsFA {
			return false
		}
	}
	return true
}
```
Jetson/Tegra detection from `discover/gpu.go:16-18`:
```go
var CudaTegra string = os.Getenv("JETSON_JETPACK")
```
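The Jetson detection path described on this page (the `JETSON_JETPACK` variable or `/etc/nv_tegra_release`, with L4T 35 mapping to JetPack 5 and L4T 36 to JetPack 6) could be sketched roughly as follows. Several details are assumptions made for illustration: the `jetpackVariant` helper, its return labels, the precedence of the environment variable over the file, and the shape of the release file (`# R35 (release), ...`).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// l4tRelease extracts the L4T major version from text such as
// "# R35 (release), REVISION: 4.1". The file format is an assumption.
var l4tRelease = regexp.MustCompile(`R(\d+) \(release\)`)

// jetpackVariant returns a hypothetical label such as "jetpack5", or ""
// when nothing could be determined. envValue is the JETSON_JETPACK setting
// and releaseFile is the content of /etc/nv_tegra_release (may be empty).
func jetpackVariant(envValue, releaseFile string) string {
	// Assumption: an explicit setting wins, and only its major version is used.
	if envValue != "" {
		major, _, _ := strings.Cut(envValue, ".")
		return "jetpack" + major
	}
	m := l4tRelease.FindStringSubmatch(releaseFile)
	if m == nil {
		return ""
	}
	l4t, err := strconv.Atoi(m[1])
	if err != nil {
		return ""
	}
	// The L4T-to-JetPack mapping is a lookup table, not arithmetic.
	switch l4t {
	case 35:
		return "jetpack5"
	case 36:
		return "jetpack6"
	}
	return ""
}

func main() {
	fmt.Println(jetpackVariant("", "# R36 (release), REVISION: 3.0"))
}
```

Taking the file content as a string argument, rather than reading the file inside the helper, keeps the mapping logic testable on machines that are not Jetson devices.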
Bootstrap timeout with Windows AV consideration from `discover/runner.go:86-94`:
```go
bootstrapTimeout := 30 * time.Second
if runtime.GOOS == "windows" {
	// On Windows with Defender enabled, AV scanning of the DLLs
	// takes place sequentially and this can significantly increase
	// the time it takes to do the initial discovery pass.
	bootstrapTimeout = 90 * time.Second
}
```
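One way such a timeout could bound a discovery call is with `context.WithTimeout`. This is a sketch under stated assumptions, not the code in `discover/runner.go`: `bootstrapTimeoutFor`, `runWithTimeout`, and `sleepDiscover` are hypothetical names, and the discovery work is simulated with a sleep.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// bootstrapTimeoutFor returns the discovery budget for an OS name
// (as reported by runtime.GOOS), using the values quoted above.
func bootstrapTimeoutFor(goos string) time.Duration {
	if goos == "windows" {
		return 90 * time.Second
	}
	return 30 * time.Second
}

// runWithTimeout runs discover and gives up once timeout elapses.
// discover is a hypothetical stand-in for the bootstrap enumeration.
func runWithTimeout(timeout time.Duration, discover func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- discover(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// sleepDiscover returns a simulated discovery function that finishes
// after d unless the context is cancelled first.
func sleepDiscover(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	timeout := bootstrapTimeoutFor(runtime.GOOS)
	err := runWithTimeout(timeout, sleepDiscover(10*time.Millisecond))
	fmt.Println("budget:", timeout, "error:", err)
}
```

Passing the OS name as an argument instead of reading `runtime.GOOS` inside the helper makes both branches easy to exercise on any platform.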
ROCm/CUDA init validation from `ml/device.go:535-547`:
```go
func (d DeviceInfo) NeedsInitValidation() bool {
	// ROCm: rocblas will crash on unsupported devices.
	// CUDA: verify CC is supported by the version of the library
	return d.Library == "ROCm" || d.Library == "CUDA"
}
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `experimental Vulkan support disabled` | Vulkan not explicitly enabled | Set `OLLAMA_VULKAN=1` environment variable |
| `jetpack not detected` | NVIDIA Jetson not recognized | Set `JETSON_JETPACK` or `OLLAMA_LLM_LIBRARY` environment variable |
| GPU device crashes during init | ROCm device not supported by rocblas | Check `HSA_OVERRIDE_GFX_VERSION` or use a supported AMD GPU |
| `if GPUs are not correctly discovered, unset and try again` | User override of visible device env vars | Unset `CUDA_VISIBLE_DEVICES` / `HIP_VISIBLE_DEVICES` and retry |
| No GPU detected, running on CPU | No compatible GPU libraries found | Install appropriate GPU drivers and toolkit |
Compatibility Notes
- NVIDIA CUDA: Supports CUDA 11.8, 12.8, and 13.0. Flash Attention requires Compute Capability >= 7.0 but excludes CC 7.2 (Jetson Xavier-class integrated GPUs). Library preference order: CUDA first, then ROCm, then Vulkan.
- AMD ROCm: Requires deep initialization validation; rocblas will crash on unsupported devices. Use `HSA_OVERRIDE_GFX_VERSION` to force gfx version on edge-case hardware.
- Apple Metal: Only available on macOS arm64 (Apple Silicon). Free VRAM readings are not refreshed after initial discovery; the cached values are used. Uses the Accelerate framework for BLAS.
- Vulkan: Experimental support. Must be explicitly enabled. Memory-mapped model loading (mmap) is automatically disabled when using Vulkan.
- Windows: Bootstrap GPU discovery timeout is 90 seconds (vs 30 seconds on Linux/macOS) due to Windows Defender AV scanning delays. mmap is disabled for CUDA on Windows.
- NVIDIA Jetson: Auto-detected via `JETSON_JETPACK` env var or `/etc/nv_tegra_release`. Maps L4T version 35 to JetPack 5, version 36 to JetPack 6.
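The library preference order from the CUDA note (CUDA, then ROCm, then Vulkan) can be expressed as a simple ranking. The sketch below is illustrative only: `libraryRank` and `sortByPreference` are hypothetical names, and placing unlisted libraries last is a choice made for this example, not documented Ollama behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// libraryRank encodes the preference order from the notes; lower is better.
// Libraries not listed (for example "Metal" or "cpu") sort last here,
// which is a choice made for this sketch only.
var libraryRank = map[string]int{"CUDA": 0, "ROCm": 1, "Vulkan": 2}

// rank returns the preference rank of a library name.
func rank(lib string) int {
	if r, ok := libraryRank[lib]; ok {
		return r
	}
	return len(libraryRank)
}

// sortByPreference returns a copy of libs ordered by preference,
// keeping the original relative order of equally ranked entries.
func sortByPreference(libs []string) []string {
	out := append([]string(nil), libs...)
	sort.SliceStable(out, func(i, j int) bool { return rank(out[i]) < rank(out[j]) })
	return out
}

func main() {
	fmt.Println(sortByPreference([]string{"Vulkan", "ROCm", "CUDA"}))
}
```

A stable sort is used so that devices reported by the same library keep their discovery order.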