# Environment: ggml-org/ggml CUDA GPU Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-10 07:40 GMT |
## Overview
NVIDIA CUDA GPU environment supporting compute capabilities 50 through 121, requiring CUDA Toolkit 10.0+ and CMake 3.18+ for GPU-accelerated tensor operations.
## Description
This environment provides GPU acceleration for GGML tensor operations on NVIDIA GPUs via the CUDA backend. The same backend code also supports AMD GPUs (via HIP/ROCm) and Moore Threads GPUs (via MUSA) through vendor-specific compatibility headers. The backend supports a wide range of GPU architectures from Maxwell (CC 50) to Rubin (CC 121), with newer architectures requiring newer CUDA Toolkit versions. Features like FP16 tensor cores, INT8 tensor cores, and FP4 tensor cores are architecture-dependent.
## Usage
Use this environment for GPU-accelerated inference and training workloads. It is required when running models that need more compute throughput than the CPU backend can provide, or when using GPU-specific features like tensor cores for quantized matrix multiplication.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or Windows | Self-hosted CI uses Ubuntu 22.04 |
| GPU | NVIDIA GPU (CC >= 50) | Maxwell or newer |
| VRAM | Depends on model size | Larger models require more VRAM |
| Build System | CMake 3.18+ | Required for CMAKE_CUDA_ARCHITECTURES |
## Dependencies
### System Packages
- NVIDIA CUDA Toolkit >= 10.0
  - CC 86 (RTX 3000): requires CUDA >= 11.1
  - CC 89 (RTX 4000): requires CUDA >= 11.8
  - CC 120 (Blackwell): requires CUDA >= 12.8
  - CC 121 (Rubin): requires CUDA >= 12.9
- NVIDIA GPU driver compatible with the toolkit version
### Optional
- CCCL 3.2 (NVIDIA C++ Core Libraries) for latest features
- CUDA 12.8+ for compression mode support
## Credentials
No credentials are required.
## Quick Install

```shell
# Install CUDA Toolkit (Ubuntu)
sudo apt-get install nvidia-cuda-toolkit

# Build GGML with the CUDA backend
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
## Code Evidence
Architecture selection from `src/ggml-cuda/CMakeLists.txt:8-55`:
```cmake
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    # native == GPUs available at build time
    # 50  == Maxwell, lowest CUDA 12 standard
    # 60  == P100, FP16 CUDA intrinsics
    # 61  == Pascal, __dp4a instruction (per-byte integer dot product)
    # 70  == V100, FP16 tensor cores
    # 75  == Turing, int8 tensor cores
    # 80  == Ampere, asynchronous data loading, faster tensor core instructions
    # 86  == RTX 3000, needs CUDA v11.1
    # 89  == RTX 4000, needs CUDA v11.8
    # 120 == Blackwell, needs CUDA v12.8, FP4 tensor cores
```
CMake minimum version for CUDA from `src/ggml-cuda/CMakeLists.txt:1`:

```cmake
cmake_minimum_required(VERSION 3.18) # for CMAKE_CUDA_ARCHITECTURES
```
Blackwell architecture handling from `src/ggml-cuda/CMakeLists.txt:39-51`:

```cmake
if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8")
    # The CUDA architecture 120f-virtual would in principle work for Blackwell
    # but the newly added "f" suffix conflicted with a preexisting regex
    # for validating CUDA architectures in CMake.
    list(APPEND CMAKE_CUDA_ARCHITECTURES 120a-real)
endif()
```
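As a hedged sketch of the same idea from the user's side: rather than relying on the default list, the architecture can be pinned explicitly in the CMake cache (the exact value shown is an assumption for a Blackwell-only build):

```cmake
# Sketch: override the architecture list before CUDA is enabled, e.g. via
# -DCMAKE_CUDA_ARCHITECTURES="120a-real" on the cmake command line.
# 120a-real avoids the "f" suffix, so it works with any CMake >= 3.18.
set(CMAKE_CUDA_ARCHITECTURES "120a-real")
```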
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDAToolkit not found` | CUDA Toolkit not installed | Install CUDA Toolkit from NVIDIA developer site |
| `Unsupported gpu architecture 'compute_XX'` | CUDA version too old for target GPU | Upgrade CUDA Toolkit to version supporting your GPU architecture |
| `CMake 3.18 or higher is required` | CMake too old for CUDA support | Upgrade CMake to 3.18+ |
| `CUDA out of memory` | Model too large for available VRAM | Use a smaller model or quantized model format |
## Compatibility Notes
- AMD GPUs (ROCm/HIP): The same backend code compiles for AMD GPUs via the HIP compatibility layer (`src/ggml-cuda/vendors/hip.h`). Build with `-DGGML_HIP=ON`.
- Moore Threads (MUSA): Supported via MUSA compatibility header (`src/ggml-cuda/vendors/musa.h`). Build with `-DGGML_MUSA=ON`.
- Blackwell (CC 120): Requires CMake >= 3.31.8 or >= 4.0.2 for the `120f-virtual` architecture suffix; alternatively use `120a-real` which works with any CMake version.
- Native builds: Use `GGML_NATIVE=ON` with CUDA >= 11.6 and CMake >= 3.24 to auto-detect installed GPU architecture.