# Environment: ggml-org/ggml CUDA GPU Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-10 07:40 GMT |
## Overview
NVIDIA CUDA GPU environment supporting compute capabilities 50 through 121, requiring CUDA Toolkit 10.0+ and CMake 3.18+ for GPU-accelerated tensor operations.
## Description
This environment provides GPU acceleration for GGML tensor operations on NVIDIA GPUs via the CUDA backend. The same backend code also supports AMD GPUs (via HIP/ROCm) and Moore Threads GPUs (via MUSA) through vendor-specific compatibility headers. The backend supports a wide range of GPU architectures from Maxwell (CC 50) to Rubin (CC 121), with newer architectures requiring newer CUDA Toolkit versions. Features like FP16 tensor cores, INT8 tensor cores, and FP4 tensor cores are architecture-dependent.
## Usage
Use this environment for GPU-accelerated inference and training workloads. It is required when running models that need more compute throughput than the CPU backend can provide, or when using GPU-specific features like tensor cores for quantized matrix multiplication.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or Windows | Self-hosted CI uses Ubuntu 22.04 |
| GPU | NVIDIA GPU (CC >= 50) | Maxwell or newer |
| VRAM | Depends on model size | Larger models require more VRAM |
| Build System | CMake 3.18+ | Required for CMAKE_CUDA_ARCHITECTURES |
## Dependencies
### System Packages
- NVIDIA CUDA Toolkit >= 10.0
  - CC 86 (RTX 3000): requires CUDA >= 11.1
  - CC 89 (RTX 4000): requires CUDA >= 11.8
  - CC 120 (Blackwell): requires CUDA >= 12.8
  - CC 121 (Rubin): requires CUDA >= 12.9
- NVIDIA GPU driver compatible with the toolkit version
### Optional
- CCCL 3.2 (NVIDIA C++ Core Libraries) for latest features
- CUDA 12.8+ for compression mode support
## Credentials
No credentials are required.
## Quick Install

```shell
# Install CUDA Toolkit (Ubuntu)
sudo apt-get install nvidia-cuda-toolkit

# Build GGML with the CUDA backend
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
## Code Evidence
Architecture selection from `src/ggml-cuda/CMakeLists.txt:8-55`:
```cmake
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    # native == GPUs available at build time
    # 50  == Maxwell, lowest CUDA 12 standard
    # 60  == P100, FP16 CUDA intrinsics
    # 61  == Pascal, __dp4a instruction (per-byte integer dot product)
    # 70  == V100, FP16 tensor cores
    # 75  == Turing, int8 tensor cores
    # 80  == Ampere, asynchronous data loading, faster tensor core instructions
    # 86  == RTX 3000, needs CUDA v11.1
    # 89  == RTX 4000, needs CUDA v11.8
    # 120 == Blackwell, needs CUDA v12.8, FP4 tensor cores
```
CMake minimum version for CUDA from `src/ggml-cuda/CMakeLists.txt:1`:

```cmake
cmake_minimum_required(VERSION 3.18) # for CMAKE_CUDA_ARCHITECTURES
```
Blackwell architecture handling from `src/ggml-cuda/CMakeLists.txt:39-51`:

```cmake
if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8")
    # The CUDA architecture 120f-virtual would in principle work for Blackwell
    # but the newly added "f" suffix conflicted with a preexisting regex
    # for validating CUDA architectures in CMake.
    list(APPEND CMAKE_CUDA_ARCHITECTURES 120a-real)
endif()
```
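As a hedged sketch of the same idea from the user's side: rather than relying on the default list, the architecture can be pinned explicitly in the CMake cache (the exact value shown is an assumption for a Blackwell-only build):

```cmake
# Sketch: override the architecture list before CUDA is enabled, e.g. via
# -DCMAKE_CUDA_ARCHITECTURES="120a-real" on the cmake command line.
# 120a-real avoids the "f" suffix, so it works with any CMake >= 3.18.
set(CMAKE_CUDA_ARCHITECTURES "120a-real")
```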
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDAToolkit not found` | CUDA Toolkit not installed | Install CUDA Toolkit from NVIDIA developer site |
| `Unsupported gpu architecture 'compute_XX'` | CUDA version too old for target GPU | Upgrade CUDA Toolkit to version supporting your GPU architecture |
| `CMake 3.18 or higher is required` | CMake too old for CUDA support | Upgrade CMake to 3.18+ |
| `CUDA out of memory` | Model too large for available VRAM | Use a smaller model or quantized model format |
## Compatibility Notes
- AMD GPUs (ROCm/HIP): The same backend code compiles for AMD GPUs via the HIP compatibility layer (`src/ggml-cuda/vendors/hip.h`). Build with `-DGGML_HIP=ON`.
- Moore Threads (MUSA): Supported via MUSA compatibility header (`src/ggml-cuda/vendors/musa.h`). Build with `-DGGML_MUSA=ON`.
- Blackwell (CC 120): Requires CMake >= 3.31.8 or >= 4.0.2 for the `120f-virtual` architecture suffix; alternatively use `120a-real` which works with any CMake version.
- Native builds: Use `GGML_NATIVE=ON` with CUDA >= 11.6 and CMake >= 3.24 to auto-detect installed GPU architecture.