Environment:Ollama Ollama GPU Runtime

From Leeroopedia
Knowledge Sources
Domains Infrastructure, GPU, Deep_Learning
Last Updated 2026-02-14 22:00 GMT

Overview

Multi-backend GPU acceleration environment supporting NVIDIA CUDA (11.8/12.8/13.0), AMD ROCm (6.3.3+), Apple Metal, and Vulkan (1.4+) for model inference offloading.

Description

This environment enables GPU-accelerated model inference in Ollama. The system uses a runner-based GPU discovery mechanism that spawns lightweight processes to enumerate and validate GPU devices across multiple backend libraries (CUDA, ROCm, Metal, Vulkan). Each backend is loaded dynamically at runtime via shared libraries, allowing the same Ollama binary to support different GPU vendors.

The discovery system runs in two phases: a bootstrap phase (serial enumeration, with a 30-second timeout that rises to 90 seconds on Windows) and a validation phase (parallel device verification). A device must report at least the minimum memory (457 MiB for CUDA, ROCm, and Vulkan; 512 MiB for Metal) before it is considered usable.

Usage

Use this environment when running model inference with GPU acceleration. It is the mandatory prerequisite for the Scheduler_GetRunner implementation, which handles model loading and GPU memory allocation. Without this environment, models run on CPU only.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (x86_64, arm64), macOS 14.0+ (arm64), Windows (x86_64) | Each OS has different GPU backend availability |
| NVIDIA GPU | CUDA Compute Capability >= 5.0 | Minimum 457 MiB VRAM; Flash Attention requires CC >= 7.0 (excluding 7.2) |
| AMD GPU | ROCm 6.3.3+ | Requires rocblas; unsupported devices will crash during init validation |
| Apple GPU | Metal-capable Apple Silicon | macOS 14.0+; minimum 512 MiB VRAM |
| Vulkan GPU | Vulkan 1.4+ SDK | Experimental; must be enabled via OLLAMA_VULKAN=1 |
| NVIDIA Jetson | JetPack 5 or 6 | Detected via JETSON_JETPACK env var or /etc/nv_tegra_release |

Dependencies

System Packages (NVIDIA CUDA)

  • `cuda-toolkit` = 11.8, 12.8, or 13.0
  • `cudnn` >= 8.6 (for CUDA 11.8/12.8) or `libcudnn9` (for CUDA 13.0)
  • NVIDIA driver compatible with installed CUDA version

System Packages (AMD ROCm)

  • `rocm` >= 6.3.3
  • `rocm-libs` (includes rocblas)
  • `hip-runtime-amd`

System Packages (Vulkan)

  • `vulkan-sdk` >= 1.4.321.1
  • `mesa-vulkan-drivers`
  • `libvulkan1`, `libvulkan-dev`

System Packages (Apple Metal)

  • macOS 14.0+ with Xcode Command Line Tools
  • Metal framework (included in macOS)

Credentials

The following environment variables control GPU behavior:

  • `CUDA_VISIBLE_DEVICES`: Set which NVIDIA devices are visible
  • `HIP_VISIBLE_DEVICES`: Set which AMD devices are visible by numeric ID
  • `ROCR_VISIBLE_DEVICES`: Set which AMD devices are visible by UUID or numeric ID
  • `GGML_VK_VISIBLE_DEVICES`: Set which Vulkan devices are visible by numeric ID
  • `GPU_DEVICE_ORDINAL`: Set which AMD devices are visible by numeric ID
  • `HSA_OVERRIDE_GFX_VERSION`: Override the gfx version for all detected AMD GPUs
  • `OLLAMA_LLM_LIBRARY`: Override GPU library auto-detection
  • `OLLAMA_VULKAN`: Enable experimental Vulkan backend (set to `1`)
  • `OLLAMA_GPU_OVERHEAD`: Reserve additional VRAM per GPU (bytes)
  • `OLLAMA_FLASH_ATTENTION`: Enable flash attention (requires compatible hardware)
  • `OLLAMA_KV_CACHE_TYPE`: KV cache quantization type (default: `f16`)
  • `OLLAMA_SCHED_SPREAD`: Spread model across all GPUs
  • `JETSON_JETPACK`: JetPack version for NVIDIA Jetson devices
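As an illustration of how two of these variables might be consumed, here is a minimal sketch. The helper names `visibleDevices`, `gpuOverhead`, and `budget` are hypothetical, and the way the overhead is subtracted from free VRAM is an assumption, not Ollama's scheduling logic. The sketch relies only on the comma-separated list format of `CUDA_VISIBLE_DEVICES` and on `OLLAMA_GPU_OVERHEAD` being a byte count.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// visibleDevices splits a comma-separated device list, such as the value
// of CUDA_VISIBLE_DEVICES, into trimmed, non-empty entries.
func visibleDevices(value string) []string {
	var ids []string
	for _, part := range strings.Split(value, ",") {
		if id := strings.TrimSpace(part); id != "" {
			ids = append(ids, id)
		}
	}
	return ids
}

// gpuOverhead parses a byte count; an empty or invalid value yields 0.
func gpuOverhead(value string) uint64 {
	n, err := strconv.ParseUint(strings.TrimSpace(value), 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// budget subtracts the reserved overhead from free VRAM without underflow.
func budget(free, overhead uint64) uint64 {
	if overhead >= free {
		return 0
	}
	return free - overhead
}

func main() {
	ids := visibleDevices(os.Getenv("CUDA_VISIBLE_DEVICES"))
	overhead := gpuOverhead(os.Getenv("OLLAMA_GPU_OVERHEAD"))
	fmt.Println("visible devices:", ids)
	fmt.Println("budget for a device with 8 GiB free:", budget(8<<30, overhead))
}
```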

Quick Install

# NVIDIA CUDA (Ubuntu/Debian)
sudo apt install nvidia-cuda-toolkit

# AMD ROCm (Ubuntu 22.04)
# See https://rocm.docs.amd.com/ for full instructions
sudo apt install rocm-libs

# Vulkan (Ubuntu/Debian)
sudo apt install mesa-vulkan-drivers vulkan-tools libvulkan1 libvulkan-dev

# Enable Vulkan in Ollama (experimental)
export OLLAMA_VULKAN=1

Code Evidence

GPU minimum memory requirements from `ml/device.go:345-353`:

func (d DeviceInfo) MinimumMemory() uint64 {
    if d.Library == "Metal" {
        return 512 * format.MebiByte
    }
    return 457 * format.MebiByte
}

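Flash Attention hardware validation from `ml/device.go:479-493`. Every device in the list must qualify, or Flash Attention is reported as unsupported for the whole set. Note that the CUDA condition as quoted compares `DriverMajor`, not `ComputeMajor`, against 7: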

func FlashAttentionSupported(l []DeviceInfo) bool {
    for _, gpu := range l {
        supportsFA := gpu.Library == "cpu" ||
            gpu.Name == "Metal" || gpu.Library == "Metal" ||
            (gpu.Library == "CUDA" && gpu.DriverMajor >= 7 &&
                !(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)) ||
            gpu.Library == "ROCm" ||
            gpu.Library == "Vulkan"
        if !supportsFA {
            return false
        }
    }
    return true
}
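A short usage sketch of the all-or-nothing behaviour follows. The `deviceInfo` struct is a local stand-in containing only the fields the excerpt reads, the predicate body is copied from the excerpt, and the device values (names, driver major 12) are hypothetical.

```go
package main

import "fmt"

// deviceInfo is a local stand-in, not the real ml.DeviceInfo type.
type deviceInfo struct {
	Name, Library                           string
	DriverMajor, ComputeMajor, ComputeMinor int
}

// flashAttentionSupported reproduces the quoted predicate on the stand-in type.
func flashAttentionSupported(l []deviceInfo) bool {
	for _, gpu := range l {
		supportsFA := gpu.Library == "cpu" ||
			gpu.Name == "Metal" || gpu.Library == "Metal" ||
			(gpu.Library == "CUDA" && gpu.DriverMajor >= 7 &&
				!(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)) ||
			gpu.Library == "ROCm" ||
			gpu.Library == "Vulkan"
		if !supportsFA {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical devices: a CC 8.6 GPU and a CC 7.2 GPU.
	cc86 := deviceInfo{Name: "gpu0", Library: "CUDA", DriverMajor: 12, ComputeMajor: 8, ComputeMinor: 6}
	cc72 := deviceInfo{Name: "gpu1", Library: "CUDA", DriverMajor: 12, ComputeMajor: 7, ComputeMinor: 2}

	fmt.Println(flashAttentionSupported([]deviceInfo{cc86}))       // true
	fmt.Println(flashAttentionSupported([]deviceInfo{cc86, cc72})) // false
}
```

A single CC 7.2 device in a mixed set is enough to turn the result to false, because the loop returns on the first device that fails the check.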

Jetson/Tegra detection from `discover/gpu.go:16-18`:

var CudaTegra string = os.Getenv("JETSON_JETPACK")

Bootstrap timeout with Windows AV consideration from `discover/runner.go:86-94`:

bootstrapTimeout := 30 * time.Second
if runtime.GOOS == "windows" {
    // On Windows with Defender enabled, AV scanning of the DLLs
    // takes place sequentially and this can significantly increase
    // the time it takes to do the initial discovery pass.
    bootstrapTimeout = 90 * time.Second
}

ROCm/CUDA init validation from `ml/device.go:535-547`:

func (d DeviceInfo) NeedsInitValidation() bool {
    // ROCm: rocblas will crash on unsupported devices.
    // CUDA: verify CC is supported by the version of the library
    return d.Library == "ROCm" || d.Library == "CUDA"
}

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `experimental Vulkan support disabled` | Vulkan not explicitly enabled | Set `OLLAMA_VULKAN=1` environment variable |
| `jetpack not detected` | NVIDIA Jetson not recognized | Set `JETSON_JETPACK` or `OLLAMA_LLM_LIBRARY` environment variable |
| GPU device crashes during init | ROCm device not supported by rocblas | Check `HSA_OVERRIDE_GFX_VERSION` or use a supported AMD GPU |
| `if GPUs are not correctly discovered, unset and try again` | User override of visible device env vars | Unset `CUDA_VISIBLE_DEVICES` / `HIP_VISIBLE_DEVICES` and retry |
| No GPU detected, running on CPU | No compatible GPU libraries found | Install appropriate GPU drivers and toolkit |

Compatibility Notes

  • NVIDIA CUDA: Supports CUDA 11.8, 12.8, and 13.0. Flash Attention requires Compute Capability >= 7.0 but excludes CC 7.2 (Jetson Xavier-class devices). When several libraries are available, CUDA is preferred over ROCm, and ROCm over Vulkan.
  • AMD ROCm: Requires deep initialization validation; rocblas will crash on unsupported devices. Use `HSA_OVERRIDE_GFX_VERSION` to force gfx version on edge-case hardware.
  • Apple Metal: Only available on macOS arm64 (Apple Silicon). Metal never updates free VRAM readings after initial discovery (cached values used). Uses Accelerate framework for BLAS.
  • Vulkan: Experimental support. Must be explicitly enabled. Memory-mapped model loading (mmap) is automatically disabled when using Vulkan.
  • Windows: Bootstrap GPU discovery timeout is 90 seconds (vs 30 seconds on Linux/macOS) due to Windows Defender AV scanning delays. mmap is disabled for CUDA on Windows.
  • NVIDIA Jetson: Auto-detected via `JETSON_JETPACK` env var or `/etc/nv_tegra_release`. Maps L4T version 35 to JetPack 5, version 36 to JetPack 6.
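The Jetson mapping in the last note can be sketched as follows. This is an illustration under assumptions: `jetpackMajor` is a hypothetical helper, the `R<number>` pattern and the sample release string are assumptions about the format of `/etc/nv_tegra_release`, and the rule that the environment override takes precedence over the file is inferred from the order in which the two sources are listed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// l4tRelease matches the L4T major version, assuming it appears as "R35".
var l4tRelease = regexp.MustCompile(`R(\d+)`)

// jetpackMajor returns the JetPack major version. A non-empty envValue
// (for example "6.0") wins; otherwise the release file contents are parsed
// and L4T 35 maps to JetPack 5, L4T 36 to JetPack 6.
func jetpackMajor(envValue, releaseFile string) (int, bool) {
	if v := strings.TrimSpace(envValue); v != "" {
		major, err := strconv.Atoi(strings.SplitN(v, ".", 2)[0])
		return major, err == nil
	}
	m := l4tRelease.FindStringSubmatch(releaseFile)
	if m == nil {
		return 0, false
	}
	l4t, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	switch l4t {
	case 35:
		return 5, true
	case 36:
		return 6, true
	}
	return 0, false // unknown L4T version
}

func main() {
	// Sample release string; the exact format is an assumption.
	sample := "# R36 (release), REVISION: 3.0"
	if major, ok := jetpackMajor("", sample); ok {
		fmt.Println("JetPack", major)
	}
}
```

The mapping is written as an explicit switch rather than arithmetic on the L4T number, so an unrecognized release falls through to "not detected" instead of producing a guessed version.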
