Implementation:Ggml org Ggml Cpu x86 cpu feats

From Leeroopedia


Metadata

Page Type: Implementation (Architecture-Specific SIMD)
Knowledge Sources: GGML
Domains: ML_Infrastructure, CPU_Feature_Detection, SIMD_Optimization
Last Updated: 2025-05-15 12:00 GMT

Overview

Runtime x86-64 CPU feature detection via CPUID and compatibility scoring for dynamic backend selection on Intel and AMD processors.

Description

arch/x86/cpu-feats.cpp provides detailed runtime CPU feature detection for x86 in the GGML codebase. It enables fine-grained runtime selection among x86 backend variants, ranging from basic SSE to AVX-512 and AMX acceleration.

The file defines a cpuid_x86 struct that executes CPUID instructions to populate bitset fields for a comprehensive set of x86 ISA extensions:

SSE family: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2

AVX family: AVX, AVX2, FMA, F16C, AVX-VNNI

AVX-512 family: AVX512F, AVX512CD, AVX512BW, AVX512VL, AVX512DQ, AVX512PF, AVX512ER, AVX512_VBMI, AVX512_VNNI, AVX512_FP16, AVX512_BF16

AMX family: AMX_TILE, AMX_INT8, AMX_FP16, AMX_BF16

Other: PCLMULQDQ, POPCNT, AES, BMI1, BMI2, LZCNT, RDRAND, RDSEED, SHA, and AMD-specific extensions (SSE4a, XOP, TBM, ABM, 3DNow!)
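As a rough illustration of how feature bits like those listed above can be read out of CPUID results, here is a minimal sketch. The struct name feature_bits_sketch and its constructor are hypothetical and are not taken from cpu-feats.cpp; the bit positions follow the published CPUID layout for leaf 1 (ECX) and leaf 7, subleaf 0 (EBX). The sketch takes register values as plain integers so it can be exercised without executing CPUID, and it does not check whether the operating system has enabled the corresponding register state.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for cpuid_x86. It decodes a few
// feature bits from register values that CPUID has already returned.
struct feature_bits_sketch {
    std::bitset<32> leaf1_ecx;  // CPUID leaf 1, register ECX
    std::bitset<32> leaf7_ebx;  // CPUID leaf 7, subleaf 0, register EBX

    feature_bits_sketch(std::uint32_t l1_ecx, std::uint32_t l7_ebx)
        : leaf1_ecx(l1_ecx), leaf7_ebx(l7_ebx) {}

    // Leaf 1, ECX
    bool SSE3()    const { return leaf1_ecx[0];  }
    bool FMA()     const { return leaf1_ecx[12]; }
    bool SSE42()   const { return leaf1_ecx[20]; }
    bool AVX()     const { return leaf1_ecx[28]; }
    bool F16C()    const { return leaf1_ecx[29]; }

    // Leaf 7 (subleaf 0), EBX
    bool AVX2()    const { return leaf7_ebx[5];  }
    bool BMI2()    const { return leaf7_ebx[8];  }
    bool AVX512F() const { return leaf7_ebx[16]; }
};
```

Keeping one bitset per register makes each accessor a single indexed lookup, which is one plausible reason to store the raw CPUID words rather than a separate bool per feature.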

The CPUID instruction is executed through the __cpuid/__cpuidex intrinsics on MSVC, or through inline assembly (the cpuid instruction) on GCC/Clang. The struct also reads the vendor string so that Intel-specific and AMD-specific features can be told apart.
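The compiler split described above can be sketched as follows. The helper names cpuid_sketch and vendor_from_regs are hypothetical and do not appear in the GGML source. The zero-filled fallback for non-x86 targets is an assumption added only so the sketch compiles everywhere.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <string>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#endif

// Hypothetical helper: run CPUID for (leaf, subleaf) and return
// {EAX, EBX, ECX, EDX}. Returns zeros on non-x86 targets (assumption).
inline std::array<std::uint32_t, 4> cpuid_sketch(std::uint32_t leaf,
                                                 std::uint32_t subleaf = 0) {
    std::array<std::uint32_t, 4> r = {0, 0, 0, 0};
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i) {
        r[i] = static_cast<std::uint32_t>(regs[i]);
    }
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("cpuid"
                         : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
                         : "a"(leaf), "c"(subleaf));
#else
    (void) leaf;
    (void) subleaf;
#endif
    return r;
}

// CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX,
// in that order. Assumes a little-endian host, as on x86.
inline std::string vendor_from_regs(std::uint32_t ebx, std::uint32_t edx,
                                    std::uint32_t ecx) {
    char buf[12];
    std::memcpy(buf + 0, &ebx, 4);
    std::memcpy(buf + 4, &edx, 4);
    std::memcpy(buf + 8, &ecx, 4);
    return std::string(buf, sizeof(buf));
}
```

With these helpers, calling cpuid_sketch(0) and passing elements 1, 3, and 2 of the result to vendor_from_regs would yield a string such as "GenuineIntel" or "AuthenticAMD" on the corresponding hardware.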

The ggml_backend_cpu_x86_score function implements a compatibility scoring system. For each feature flag enabled at compile time (e.g., GGML_AVX2, GGML_AVX512), it checks that runtime detection reports the feature as present. If any required feature is missing, it returns 0 (incompatible). Otherwise, it returns the sum of the per-feature weights, each a distinct power of two (1 << n), so higher-tier features contribute more and the dynamic loader can select the most capable compatible backend variant.

The score is exported via the GGML_BACKEND_DL_SCORE_IMPL macro for use by the backend dynamic loading system.

Usage

This file is compiled into each x86 CPU backend variant (one per SIMD tier). At runtime, the dynamic backend loader calls the score function for each available variant and selects the one with the highest score that is compatible with the host CPU.
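The selection step described above can be sketched as a simple maximum search. This is an illustration only: variant_sketch and pick_best_variant are hypothetical names, and the real loader obtains each score function from a shared library rather than from an in-memory list.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one loadable backend variant: a name and
// a pointer to its exported score function.
struct variant_sketch {
    std::string name;
    int (*score)();
};

// Return the variant with the highest non-zero score, or nullptr if
// every variant reports 0 (incompatible with the host CPU).
inline const variant_sketch * pick_best_variant(
        const std::vector<variant_sketch> & variants) {
    const variant_sketch * best = nullptr;
    int best_score = 0;
    for (const variant_sketch & v : variants) {
        const int s = v.score();
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

Because a score of 0 never beats the initial best_score, incompatible variants are skipped without a separate check.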

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/x86/cpu-feats.cpp (327 lines).

Key Signatures

struct cpuid_x86 {
    bool SSE3(void);
    bool AVX(void);
    bool AVX2(void);
    bool AVX512F(void);
    bool AVX512BW(void);
    bool AVX512VL(void);
    bool AVX512_VNNI(void);
    bool AMX_INT8(void);
    // ... and many more
};

static int ggml_backend_cpu_x86_score();

GGML_BACKEND_DL_SCORE_IMPL(ggml_backend_cpu_x86_score)

Import

#include "ggml-backend-impl.h"
#include <cstring>
#include <vector>
#include <bitset>
#include <array>
#include <string>

I/O Contract

Inputs

Parameter: (none)
Type: --
Description: The score function takes no parameters. It reads CPU feature bits directly via the CPUID instruction and checks them against compile-time flags.

Outputs

Output: Score
Type: int
Description: Returns 0 if the compiled backend variant requires a feature that the host CPU lacks. Otherwise returns a positive integer, the sum of the power-of-two weights listed under Score Weights. Higher scores indicate more advanced SIMD support.

Score Weights

Feature Flag        Score Contribution
GGML_FMA            +1
GGML_F16C           +2
GGML_SSE42          +4
GGML_BMI2           +8
GGML_AVX            +16
GGML_AVX2           +32
GGML_AVX_VNNI       +64
GGML_AVX512         +128
GGML_AVX512_VBMI    +256
GGML_AVX512_BF16    +512
GGML_AVX512_VNNI    +1024
GGML_AMX_INT8       +2048
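The weights above are distinct powers of two, so the scoring rule can be modeled with bitmasks. The sketch below is a runtime model under stated assumptions, not the actual function: the source describes compile-time flags, whereas here the required features are passed as a parameter so that several "variants" can be compared in one program. The names feature_sketch and score_sketch are hypothetical.

```cpp
#include <cstdint>

// Hypothetical feature mask. Each enumerator is one bit, and its value
// equals the score contribution listed in the Score Weights table.
enum feature_sketch : std::uint32_t {
    FMA         = 1u << 0,   // +1
    F16C        = 1u << 1,   // +2
    SSE42       = 1u << 2,   // +4
    BMI2        = 1u << 3,   // +8
    AVX         = 1u << 4,   // +16
    AVX2        = 1u << 5,   // +32
    AVX_VNNI    = 1u << 6,   // +64
    AVX512      = 1u << 7,   // +128
    AVX512_VBMI = 1u << 8,   // +256
    AVX512_BF16 = 1u << 9,   // +512
    AVX512_VNNI = 1u << 10,  // +1024
    AMX_INT8    = 1u << 11   // +2048
};

// required: features the variant was built with (stands in for the
//           compile-time flags)
// host:     features detected on the running CPU
inline int score_sketch(std::uint32_t required, std::uint32_t host) {
    // Any required feature the host lacks makes the variant incompatible.
    if ((required & ~host) != 0) {
        return 0;
    }
    // The weights are the bit values, so their sum is the mask itself.
    return static_cast<int>(required);
}
```

For example, a variant built with FMA, F16C, AVX, and AVX2 scores 1 + 2 + 16 + 32 = 51 on a host that supports all four, while a variant that also requires AVX512 scores 0 on a host without it.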

Usage Examples

// The score function is called internally by the GGML backend loader.
// It is not typically called directly by user code.
//
// The macro GGML_BACKEND_DL_SCORE_IMPL exports the function as:
//   extern "C" int ggml_backend_score(void);
//
// The backend loader enumerates all .so/.dll variants and calls each
// one's score function to select the best match for the host CPU.
