
Environment: ggml-org GGML CUDA GPU Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, GPU_Computing
Last Updated: 2026-02-10 07:40 GMT

Overview

An NVIDIA CUDA GPU environment supporting compute capabilities 50 through 121. It requires CUDA Toolkit 10.0+ and CMake 3.18+ for GPU-accelerated tensor operations.

Description

This environment provides GPU acceleration for GGML tensor operations on NVIDIA GPUs via the CUDA backend. The same backend code also supports AMD GPUs (via HIP/ROCm) and Moore Threads GPUs (via MUSA) through vendor-specific compatibility headers. The backend supports a wide range of GPU architectures from Maxwell (CC 50) to Rubin (CC 121), with newer architectures requiring newer CUDA Toolkit versions. Features like FP16 tensor cores, INT8 tensor cores, and FP4 tensor cores are architecture-dependent.

Usage

Use this environment for GPU-accelerated inference and training workloads. It is required when running models that need more compute throughput than the CPU backend can provide, or when using GPU-specific features like tensor cores for quantized matrix multiplication.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or Windows | Self-hosted CI uses Ubuntu 22.04 |
| GPU | NVIDIA GPU (CC >= 50) | Maxwell or newer |
| VRAM | Depends on model size | Larger models require more VRAM |
| Build System | CMake 3.18+ | Required for `CMAKE_CUDA_ARCHITECTURES` |

Dependencies

System Packages

  • NVIDIA CUDA Toolkit >= 10.0
    • CC 86 (RTX 3000): requires CUDA >= 11.1
    • CC 89 (RTX 4000): requires CUDA >= 11.8
    • CC 120 (Blackwell): requires CUDA >= 12.8
    • CC 121 (Rubin): requires CUDA >= 12.9
  • NVIDIA GPU driver compatible with toolkit version
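The minimum-toolkit requirements above can be read as a small lookup table. The sketch below is illustrative only; the dictionary, the function name, and the 10.0 fallback for unlisted architectures are paraphrased from the dependency list, not part of the build system:

```python
# Minimum CUDA Toolkit (major, minor) per compute capability,
# taken from the dependency list above. Architectures not listed
# fall back to the CUDA 10.0 baseline.
MIN_CUDA = {
    86: (11, 1),   # RTX 3000
    89: (11, 8),   # RTX 4000
    120: (12, 8),  # Blackwell
    121: (12, 9),  # Rubin
}

def min_cuda_for(cc: int) -> tuple[int, int]:
    """Return the minimum (major, minor) CUDA Toolkit version for a compute capability."""
    return MIN_CUDA.get(cc, (10, 0))

print(min_cuda_for(89))   # minimum toolkit for RTX 4000
print(min_cuda_for(75))   # Turing falls back to the 10.0 baseline
```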

Optional

  • CCCL 3.2 (NVIDIA CUDA Core Compute Libraries) for latest features
  • CUDA 12.8+ for compression mode support

Credentials

No credentials are required.

Quick Install

# Install CUDA Toolkit (Ubuntu; the distro package may be older than your
# GPU requires -- check the minimum versions in the dependency list above)
sudo apt-get install nvidia-cuda-toolkit

# Build GGML with the CUDA backend
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
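If the default architecture list does not match your target GPUs, `CMAKE_CUDA_ARCHITECTURES` can be pinned explicitly at configure time; the architecture list below is just an example, not a recommended default:

```shell
# Build only for Pascal (61), Volta (70), Turing (75), and Ampere (80)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;70;75;80"
cmake --build build --config Release
```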

Code Evidence

Architecture selection from `src/ggml-cuda/CMakeLists.txt:8-55`:

if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    # native == GPUs available at build time
    # 50     == Maxwell, lowest CUDA 12 standard
    # 60     == P100, FP16 CUDA intrinsics
    # 61     == Pascal, __dp4a instruction (per-byte integer dot product)
    # 70     == V100, FP16 tensor cores
    # 75     == Turing, int8 tensor cores
    # 80     == Ampere, asynchronous data loading, faster tensor core instructions
    # 86     == RTX 3000, needs CUDA v11.1
    # 89     == RTX 4000, needs CUDA v11.8
    # 120    == Blackwell, needs CUDA v12.8, FP4 tensor cores
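The per-architecture comments above can be summarized as a cumulative feature lookup. This mapping is a paraphrase of the CMake comments; the dictionary and helper name are assumptions for illustration:

```python
# Feature notes per compute capability, paraphrased from the
# comments in src/ggml-cuda/CMakeLists.txt. Each entry describes
# what that architecture adds over the previous one.
ARCH_FEATURES = {
    50: "Maxwell, lowest CUDA 12 standard",
    60: "P100, FP16 CUDA intrinsics",
    61: "Pascal, __dp4a instruction (per-byte integer dot product)",
    70: "V100, FP16 tensor cores",
    75: "Turing, int8 tensor cores",
    80: "Ampere, asynchronous data loading, faster tensor core instructions",
    86: "RTX 3000, needs CUDA v11.1",
    89: "RTX 4000, needs CUDA v11.8",
    120: "Blackwell, needs CUDA v12.8, FP4 tensor cores",
}

def features_up_to(cc: int) -> list[str]:
    """Cumulative feature notes available on a GPU of the given compute capability."""
    return [desc for arch, desc in sorted(ARCH_FEATURES.items()) if arch <= cc]

print(features_up_to(75)[-1])  # newest feature note for a Turing GPU
```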

CMake minimum version for CUDA from `src/ggml-cuda/CMakeLists.txt:1`:

cmake_minimum_required(VERSION 3.18)  # for CMAKE_CUDA_ARCHITECTURES

Blackwell architecture handling from `src/ggml-cuda/CMakeLists.txt:39-51`:

if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8")
    # The CUDA architecture 120f-virtual would in principle work for Blackwell
    # but the newly added "f" suffix conflicted with a preexisting regex
    # for validating CUDA architectures in CMake.
    list(APPEND CMAKE_CUDA_ARCHITECTURES 120a-real)
endif()
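Per the comment above (and the Compatibility Notes below), the `120f-virtual` form only parses on newer CMake; a hedged sketch of how a build script could opt into it, assuming CMake's version check is the right gate:

```cmake
# With CMake >= 3.31.8 (or >= 4.0.2) the "f" architecture suffix is
# accepted, so 120f-virtual can be used in place of 120a-real.
if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8"
    AND CMAKE_VERSION VERSION_GREATER_EQUAL "3.31.8")
    list(APPEND CMAKE_CUDA_ARCHITECTURES 120f-virtual)
endif()
```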

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `CUDAToolkit not found` | CUDA Toolkit not installed | Install the CUDA Toolkit from the NVIDIA developer site |
| `Unsupported gpu architecture 'compute_XX'` | CUDA version too old for target GPU | Upgrade the CUDA Toolkit to a version that supports your GPU architecture |
| `CMake 3.18 or higher is required` | CMake too old for CUDA support | Upgrade CMake to 3.18+ |
| `CUDA out of memory` | Model too large for available VRAM | Use a smaller model or a quantized model format |

Compatibility Notes

  • AMD GPUs (ROCm/HIP): The same backend code compiles for AMD GPUs via the HIP compatibility layer (`src/ggml-cuda/vendors/hip.h`). Build with `-DGGML_HIP=ON`.
  • Moore Threads (MUSA): Supported via MUSA compatibility header (`src/ggml-cuda/vendors/musa.h`). Build with `-DGGML_MUSA=ON`.
  • Blackwell (CC 120): Requires CMake >= 3.31.8 or >= 4.0.2 for the `120f-virtual` architecture suffix; alternatively use `120a-real` which works with any CMake version.
  • Native builds: Use `GGML_NATIVE=ON` with CUDA >= 11.6 and CMake >= 3.24 to auto-detect installed GPU architecture.
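The vendor-specific build flags above can be collected into a quick configure-time reference; these are sketches of the documented flags, not tested invocations for any particular machine:

```shell
# NVIDIA (CUDA), auto-detecting the installed GPU
# (requires CUDA >= 11.6 and CMake >= 3.24)
cmake -B build -DGGML_CUDA=ON -DGGML_NATIVE=ON

# AMD (HIP/ROCm)
cmake -B build -DGGML_HIP=ON

# Moore Threads (MUSA)
cmake -B build -DGGML_MUSA=ON
```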
