Implementation:Ggml org Ggml Cpu kleidiai backend


Metadata

Field | Value
Page Type | Implementation (KleidiAI Backend Integration)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication
Last Updated | 2025-05-15 12:00 GMT

Overview

Implements the GGML backend integration for Arm's KleidiAI library, providing optimized quantized matrix multiplication with automatic CPU feature detection and kernel selection.

Description

kleidiai/kleidiai.cpp is the main integration point that exposes Arm KleidiAI's highly optimized micro-kernels through the GGML CPU backend. Key components include:

  1. Context singleton: ggml_kleidiai_context holds detected CPU features and selected q4/q8 kernel sets.
  2. CPU feature detection: init_kleidiai_context() runs once under a critical section, detecting:
    • DOTPROD (dot product instructions)
    • I8MM (int8 matrix multiply)
    • SVE (Scalable Vector Extension; counted as available only when the reported SVE vector length matches QK8_0 = 32)
    • SME (Scalable Matrix Extension; opt-in via the GGML_KLEIDIAI_SME environment variable)
  3. Kernel selection: Calls ggml_kleidiai_select_kernels_q4_0 and ggml_kleidiai_select_kernels_q8_0 to find the best kernel for the detected features.
  4. Tensor traits: Implements ggml::cpu::kleidiai::tensor_traits for:
    • Work size: Calculates LHS packing buffer size per operation.
    • Compute: Performs tiled GEMM/GEMV with LHS quantization packing, RHS access from pre-packed buffers, and multi-threaded dispatch.
  5. Extra buffer type: Provides ggml_backend_cpu_kleidiai_buffer_type() that intercepts GGML_OP_MUL_MAT on appropriately typed tensors, packs RHS weights into KleidiAI format, and dispatches to optimized kernels.
  6. Data layout helpers: transpose_f32kxn_f16nxk converts an f16 matrix stored as N×K into an f32 matrix stored as K×N so that it can be packed.
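
The detection and one-time initialization steps listed above can be sketched as follows. This is a minimal illustration and not the code in kleidiai.cpp: the names cpu_feature_sketch, hw_caps, detect_features, kleidiai_context_sketch, and get_context are hypothetical, the hardware queries are replaced by a plain struct so the logic can run anywhere, and a std::mutex stands in for the critical section.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical feature flags; the real enum in GGML may differ.
enum cpu_feature_sketch : uint32_t {
    FEAT_NONE    = 0,
    FEAT_DOTPROD = 1u << 0,
    FEAT_I8MM    = 1u << 1,
    FEAT_SVE     = 1u << 2,
    FEAT_SME     = 1u << 3,
};

// Hypothetical stand-in for the values the hardware queries would return.
struct hw_caps {
    bool dotprod;
    bool i8mm;
    bool sve;
    int  sve_bytes;  // reported SVE vector length
    bool sme;
};

constexpr int QK8_0_SKETCH = 32;  // block size named in the text

// Fold the hardware capabilities into a single bitmask.
inline uint32_t detect_features(const hw_caps & hw, bool sme_opt_in) {
    uint32_t f = FEAT_NONE;
    if (hw.dotprod) f |= FEAT_DOTPROD;
    if (hw.i8mm)    f |= FEAT_I8MM;
    // SVE only counts when the vector length matches the block size.
    if (hw.sve && hw.sve_bytes == QK8_0_SKETCH) f |= FEAT_SVE;
    // SME is used only when the user has opted in.
    if (hw.sme && sme_opt_in) f |= FEAT_SME;
    return f;
}

struct kleidiai_context_sketch {
    uint32_t features = FEAT_NONE;
};

// Initialize the shared context on the first call only; later calls
// return the same object regardless of their arguments.
inline const kleidiai_context_sketch & get_context(const hw_caps & hw, bool sme_opt_in) {
    static std::mutex lock;
    static bool initialized = false;
    static kleidiai_context_sketch ctx;
    std::lock_guard<std::mutex> guard(lock);
    if (!initialized) {
        ctx.features = detect_features(hw, sme_opt_in);
        initialized = true;
    }
    return ctx;
}
```

Keeping the detection logic in a pure function that takes the capabilities as input makes it easy to exercise each combination without the corresponding hardware.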

Usage

KleidiAI acceleration is enabled automatically on Arm CPUs that support dotprod, I8MM, or SVE when the build defines GGML_USE_CPU_KLEIDIAI. SME kernels are used only when the CPU supports SME and GGML_KLEIDIAI_SME is set. The backend registers itself as an extra buffer type.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/kleidiai/kleidiai.cpp (798 lines).

Signature

// Backend buffer type registration
ggml_backend_buffer_type_t ggml_backend_cpu_kleidiai_buffer_type(void);

// Kernel selection (used internally)
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(
    cpu_feature features, const struct ggml_tensor * op);
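
One common way to implement this kind of selection is a priority-ordered table that is scanned for the first entry whose required features are all present. The sketch below assumes that approach; the table contents, the weight_type_sketch enum, and the names kernel_entry and select_kernel are hypothetical and do not reflect the actual kernel list in GGML.

```cpp
#include <cstdint>

// Hypothetical feature bits and weight types for illustration.
constexpr uint32_t F_DOTPROD = 1u << 0;
constexpr uint32_t F_I8MM    = 1u << 1;
constexpr uint32_t F_SME     = 1u << 3;

enum class weight_type_sketch { Q4_0, Q8_0 };

struct kernel_entry {
    const char *       name;      // illustrative label only
    weight_type_sketch type;      // weight format the kernel handles
    uint32_t           required;  // feature bits that must all be set
};

// Ordered from most to least preferred.
static const kernel_entry k_table[] = {
    {"q4_0_sme",     weight_type_sketch::Q4_0, F_SME},
    {"q4_0_i8mm",    weight_type_sketch::Q4_0, F_DOTPROD | F_I8MM},
    {"q4_0_dotprod", weight_type_sketch::Q4_0, F_DOTPROD},
    {"q8_0_i8mm",    weight_type_sketch::Q8_0, F_DOTPROD | F_I8MM},
    {"q8_0_dotprod", weight_type_sketch::Q8_0, F_DOTPROD},
};

// Return the first entry that matches the weight type and whose required
// features are a subset of the detected ones, or nullptr if none match.
inline const kernel_entry * select_kernel(uint32_t features, weight_type_sketch type) {
    for (const kernel_entry & k : k_table) {
        if (k.type == type && (features & k.required) == k.required) {
            return &k;
        }
    }
    return nullptr;
}
```

Returning nullptr when nothing matches lets the caller fall back to the generic CPU path instead of failing.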

Import

#include "kleidiai/kleidiai.h"

I/O Contract

Inputs

Parameter | Type | Required | Description
CPU features | Hardware detection | Automatic | Detected at init via ggml_cpu_has_dotprod(), ggml_cpu_has_matmul_int8(), ggml_cpu_has_sve(), ggml_cpu_has_sme().
GGML_KLEIDIAI_SME | Environment variable | No | Set to non-zero to enable SME kernels (opt-in due to potential stability considerations).
op | const struct ggml_tensor * | Yes (select) | The mul_mat operation tensor for kernel selection based on weight type.

Outputs

Output | Type | Description
Buffer type | ggml_backend_buffer_type_t | KleidiAI buffer type for the CPU backend, or NULL if no suitable kernels are available.
Matrix result | float * | Output of the optimized quantized matrix multiplication.
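
The matrix result is produced by several threads, each writing its own slice of the output. One way to split the work, sketched here under assumptions, is to give every thread a contiguous range of output columns whose size is a multiple of the kernel's column step. The names span_n, round_up, and thread_columns are hypothetical, and the exact partitioning used in kleidiai.cpp may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Half-open range of output columns: [start, start + count).
struct span_n {
    int64_t start;
    int64_t count;
};

// Round v up to the next multiple of m (m > 0).
inline int64_t round_up(int64_t v, int64_t m) {
    return (v + m - 1) / m * m;
}

// Columns assigned to thread `ith` out of `nth` threads for an output with
// `n` columns. Each chunk starts on a multiple of `n_step`; trailing
// threads may receive fewer columns or none.
inline span_n thread_columns(int64_t n, int64_t n_step, int ith, int nth) {
    const int64_t per_thread = round_up((n + nth - 1) / nth, n_step);
    const int64_t start = std::min<int64_t>(n, per_thread * ith);
    const int64_t count = std::min<int64_t>(per_thread, n - start);
    return {start, count};
}
```

Aligning chunk boundaries to the kernel step means that only the last non-empty chunk can end on a partial tile.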

Usage Examples

Automatic KleidiAI Activation

#include "ggml-cpu.h"
#include "ggml-backend.h"

int main(void) {
    // KleidiAI support is compiled in when building with GGML_USE_CPU_KLEIDIAI
    // and is used at run time only on a supported Arm processor.

    // Create the CPU backend (the KleidiAI buffer type is registered automatically)
    ggml_backend_t cpu = ggml_backend_cpu_init();
    if (cpu == NULL) {
        return 1;
    }

    // Weight tensors using q4_0 or q8_0 quantization automatically
    // use KleidiAI-optimized matrix multiplication when:
    // 1. The CPU supports dotprod, I8MM, SVE, or SME (SME only if GGML_KLEIDIAI_SME is set)
    // 2. The weight tensor is allocated via the KleidiAI buffer type

    ggml_backend_free(cpu);
    return 0;
}

Enabling SME Kernels

// To enable SME (Scalable Matrix Extension) kernels:
// Set environment variable before running:
// export GGML_KLEIDIAI_SME=1
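
A minimal sketch of how such an opt-in variable could be read, assuming the value is parsed as an integer and any non-zero result enables SME. The helper names env_flag_enabled and sme_opt_in are hypothetical, and the parsing in kleidiai.cpp may differ.

```cpp
#include <cstdlib>

// Interpret an environment value as an integer opt-in flag.
// Unset, empty, non-numeric, or "0" all mean "off".
inline bool env_flag_enabled(const char * value) {
    return value != nullptr && std::atoi(value) != 0;
}

// Read the variable named in this section from the process environment.
inline bool sme_opt_in() {
    return env_flag_enabled(std::getenv("GGML_KLEIDIAI_SME"));
}
```

With integer parsing, values such as "true" or "on" do not enable the flag, so a numeric value like 1 is the safest choice.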
