Principle:Ggml org Ggml CPU Feature Detection

From Leeroopedia


Knowledge Sources
Domains CPU, Runtime_Detection
Last Updated 2026-02-10

Overview

CPU Feature Detection is the runtime identification of the instruction set extensions a processor supports, so that GGML can select the most capable CPU backend variant for the current hardware.

Description

Modern x86 processors support a wide range of instruction set extensions: SSE4.2, AVX, AVX2, FMA, F16C, AVX-512 (with sub-features such as VNNI, BF16, VBMI, and FP16), and AMX (INT8, FP16, BF16). ARM processors similarly offer NEON, SVE, and i8mm. GGML compiles multiple variants of the CPU backend, each targeting a different feature level, and uses CPU Feature Detection at runtime to select the most capable variant that the current processor can actually run.

On x86, the detection is implemented through the CPUID instruction. The cpuid_x86 struct in src/ggml-cpu/arch/x86/cpu-feats.cpp queries CPUID leaves 0, 1, 7 (with sub-leaves), and extended leaves 0x80000001 through 0x80000004 to populate bitsets representing every relevant feature flag. Each flag is exposed as a named boolean method (e.g., AVX2(), AVX512F(), AMX_INT8()).
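As a rough illustration of the pattern described above, the following sketch stores raw CPUID register values in bitsets and exposes a few flags as named boolean methods. It is a simplified stand-in, not the code in cpu-feats.cpp: the names cpuid_regs, x86_features, and read_cpuid are hypothetical, only leaves 1 and 7 (sub-leaf 0) are covered, and the live query assumes a GCC- or Clang-style <cpuid.h>. The bit positions are the architecturally documented ones.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helpers; MSVC would use <intrin.h> instead
#endif

// Raw register values for the leaves this sketch looks at (hypothetical type).
struct cpuid_regs {
    uint32_t leaf1_ecx = 0;
    uint32_t leaf7_ebx = 0;
    uint32_t leaf7_ecx = 0;
    uint32_t leaf7_edx = 0;
};

// Simplified feature view: one bitset per register, one named method per flag.
struct x86_features {
    std::bitset<32> f_1_ecx, f_7_ebx, f_7_ecx, f_7_edx;

    explicit x86_features(const cpuid_regs & r)
        : f_1_ecx(r.leaf1_ecx), f_7_ebx(r.leaf7_ebx),
          f_7_ecx(r.leaf7_ecx), f_7_edx(r.leaf7_edx) {}

    bool FMA()         const { return f_1_ecx[12]; }
    bool SSE42()       const { return f_1_ecx[20]; }
    bool AVX()         const { return f_1_ecx[28]; }
    bool F16C()        const { return f_1_ecx[29]; }
    bool AVX2()        const { return f_7_ebx[5];  }
    bool AVX512F()     const { return f_7_ebx[16]; }
    bool AVX512_VNNI() const { return f_7_ecx[11]; }
    bool AMX_INT8()    const { return f_7_edx[25]; }
};

// Reads the registers from the running CPU; returns all zeros on non-x86 builds.
inline cpuid_regs read_cpuid() {
    cpuid_regs r;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        r.leaf1_ecx = ecx;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        r.leaf7_ebx = ebx;
        r.leaf7_ecx = ecx;
        r.leaf7_edx = edx;
    }
#endif
    return r;
}
```

On an x86 build, x86_features(read_cpuid()).AVX2() reports whether the running CPU advertises AVX2. Separating the decoding from the query keeps the flag logic testable with hand-written register values. Like any CPUID-only check, this sketch does not verify that the operating system has enabled the corresponding register state.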

The scoring function ggml_backend_cpu_x86_score translates feature detection into backend selection. It checks each feature that the current build variant was compiled with (gated by compile-time macros like GGML_AVX2, GGML_AVX512, GGML_AMX_INT8) against the runtime feature set. If any required feature is absent, the function returns 0 (incompatible). Otherwise, it accumulates a score where higher-capability features contribute exponentially larger values (using bit shifts). The dynamic loader then selects the variant with the highest score, ensuring the most optimized code path is used.
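The scoring rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the body of ggml_backend_cpu_x86_score: the real function fixes the required features at compile time through macros, whereas this version takes them as a runtime list so the behavior is easy to exercise. The feature enum, its rank order, the base score of 1, and the names cpu_caps and score_variant are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical feature ranks, lowest capability first (order is an assumption).
enum feature : int {
    FEAT_FMA = 0, FEAT_F16C, FEAT_SSE42, FEAT_AVX,
    FEAT_AVX2, FEAT_AVX512, FEAT_AMX_INT8, FEAT_COUNT
};

// Hypothetical runtime view of the CPU: one flag per feature.
struct cpu_caps {
    bool has[FEAT_COUNT] = {};
};

// Scores one build variant against the running CPU.
// `required` lists the features the variant was compiled with.
inline int score_variant(const cpu_caps & cpu, const std::vector<feature> & required) {
    int score = 1;  // a variant with no extra requirements is still usable
    for (feature f : required) {
        assert(f >= 0 && f < FEAT_COUNT);
        if (!cpu.has[f]) {
            return 0;  // compatibility guard: this variant cannot run here
        }
        score += 1 << f;  // higher-ranked features weigh exponentially more
    }
    return score;
}
```

With a CPU that reports everything up to AVX2, a variant requiring FMA, F16C, SSE4.2, AVX, and AVX2 scores 1 + 1 + 2 + 4 + 8 + 16 = 32 in this sketch, while a variant that additionally requires AVX-512 scores 0 and is excluded.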

Equivalent detection logic exists for other architectures: ARM uses processor feature registers or OS-provided capability queries, RISC-V checks extension support, PowerPC queries hardware capabilities, and s390 uses facility-list inspection.

Usage

Apply this principle when deploying GGML on diverse hardware where the exact CPU model is not known at compile time. The detection is automatic: when GGML loads CPU backend variants as dynamic libraries, each variant's score function runs, and the loader picks the highest-scoring compatible variant. Developers adding support for new instruction set extensions should add the corresponding CPUID checks to the architecture-specific cpu-feats.cpp file and update the scoring function to include the new feature in the score calculation.
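The selection step described above amounts to taking the maximum over the candidates' scores while skipping any that score 0. The sketch below models each variant as a name plus a callable, standing in for a dynamically loaded library and its exported score function. The variant struct and pick_best are hypothetical names for illustration; the actual loader's types and symbol lookup are not shown here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one loadable CPU backend variant.
struct variant {
    std::string          name;   // for example, a library file name
    std::function<int()> score;  // stands in for the variant's exported score function
};

// Returns the highest-scoring compatible variant, or nullptr when every
// candidate scores 0 (none can run on this CPU). Ties keep the first seen.
inline const variant * pick_best(const std::vector<variant> & candidates) {
    const variant * best = nullptr;
    int best_score = 0;
    for (const variant & v : candidates) {
        const int s = v.score();
        assert(s >= 0);  // scores are non-negative by convention in this sketch
        if (s > best_score) {
            best_score = s;
            best       = &v;
        }
    }
    return best;
}
```

Starting best_score at 0 and requiring a strictly greater score means an incompatible variant can never be chosen, even when it is the only candidate.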

Theoretical Basis

CPU Feature Detection in GGML follows a compile-time specialization with runtime selection strategy:

  1. CPUID-Based Hardware Interrogation -- On x86, the CPUID instruction is the canonical mechanism for querying processor capabilities. It takes a leaf number (function ID) in the EAX register, plus a sub-leaf index in ECX for leaves such as 7, and returns its results in EAX, EBX, ECX, and EDX. GGML queries leaf 0 for vendor identification (Intel vs. AMD, which affects the interpretation of some flags), leaf 1 for baseline features (SSE, AVX), leaf 7 for extended features (AVX2, AVX-512, AMX), and extended leaves for vendor-specific features. The results are stored in bitsets for efficient flag testing.
  2. Multi-Variant Compilation -- The GGML build system compiles the CPU backend multiple times with different -march / -m compiler flags, producing separate shared libraries for each feature level (e.g., one with AVX2, one with AVX-512, one with AMX). Each variant contains the same algorithmic logic but uses different SIMD intrinsics in performance-critical kernels such as quantized matrix multiplication.
  3. Score-Based Selection -- Rather than a simple "supports / does not support" check, the scoring system assigns progressively higher weights to more advanced features using bit-shifted increments. This creates an ordering over backend variants: a variant compiled with AVX-512 + VNNI will always score higher than one with only AVX2, which in turn scores higher than baseline AVX. Because each weight is a larger power of two than the one before it, a single high-end feature outweighs every combination of lower-ranked features, so its presence always dominates the score, even if a lower-end variant has more minor features enabled.
  4. Compatibility Guard -- If the runtime CPU lacks any feature that a variant was compiled to require, the score function returns 0, completely excluding that variant from selection. This prevents illegal-instruction crashes that would occur if, for example, an AVX-512 binary ran on a CPU without AVX-512 support.
  5. Architecture Abstraction -- The same pattern (detect capabilities, compute score, register via GGML_BACKEND_DL_SCORE_IMPL) is replicated across x86, ARM, RISC-V, PowerPC, and s390 architectures, each with architecture-appropriate detection mechanisms. This makes the CPU feature detection system a cross-platform principle rather than an x86-specific implementation detail.
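The dominance claim in step 3 can be checked mechanically. The sketch below assumes that the feature of rank k contributes a weight of 1 << k and that every compatible variant starts from a base score of 1. Both are illustrative assumptions consistent with the description above, and the function names are hypothetical. Under those assumptions, the best possible score without feature k is exactly 2^k, while the worst possible score with it is 2^k + 1.

```cpp
#include <cassert>

// Weight contributed by the feature of a given rank (assumed: one bit per rank).
constexpr int weight(int rank) { return 1 << rank; }

// Highest score reachable using only features ranked below `rank`:
// the base score of 1 plus every lower weight.
constexpr int max_score_below(int rank) {
    return rank == 0 ? 1 : max_score_below(rank - 1) + weight(rank - 1);
}

// Lowest score of any variant that includes the feature at `rank`:
// the base score of 1 plus that single weight.
constexpr int min_score_with(int rank) { return 1 + weight(rank); }

// Compile-time spot check for one rank.
static_assert(max_score_below(7) < min_score_with(7),
              "a rank-7 feature must outweigh all lower-ranked features combined");

// Runtime check over every rank in [0, ranks).
inline bool dominance_holds(int ranks) {
    for (int k = 0; k < ranks; ++k) {
        if (max_score_below(k) >= min_score_with(k)) {
            return false;
        }
    }
    return true;
}
```

A linear weighting (for example, adding 1 per feature) would not have this property: a variant with many minor features could then tie or outrank one with a single major feature.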

