Implementation:Ggml org Ggml Cpu kleidiai kernels

Metadata

Field	Value
Page Type	Implementation (KleidiAI Kernel Wrappers)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication
Last Updated	2025-05-15 12:00 GMT

Overview

Wraps Arm KleidiAI micro-kernel functions into a unified interface for kernel selection and dispatch in GGML's quantized matrix multiplication.

Description

kleidiai/kernels.cpp serves as the adapter layer between Arm's KleidiAI optimized micro-kernels and GGML's kernel selection system. It provides:

Template function wrappers: Adapt KleidiAI micro-kernel functions with varying parameter counts into uniform function pointer signatures:
- kernel_run_fn11 / kernel_run_fn10 / kernel_run_float_fn10 -- Wraps matrix multiplication kernels.
- kernel_offs_fn3 / kernel_offs_fn2 -- Wraps offset calculation functions.
- lhs_ps_fn6 / lhs_ps_fn5 -- Wraps LHS packed-size calculation.
- lhs_pack_float_fn10 / lhs_pack_void_fn10 -- Wraps LHS quantization and packing.
- rhs_ps_fn5 / rhs_ps_fn2 / rhs_pack_fn12 -- Wraps RHS packed-size calculation and RHS repacking.

KleidiAI kernel variants: The file includes headers for micro-kernels targeting specific instruction sets:
- NEON dotprod: 1x4 and 4x4 tile sizes.
- NEON I8MM: 4x8 tile sizes.
- SVE dotprod/I8MM: Scalable vector length, fixed by the hardware implementation rather than at compile time.
- SME2 MOPA/SDOT: Kernels for Arm's Scalable Matrix Extension 2, executed in streaming mode.
- BF16 SME2: bfloat16 (brain floating point) matrix operations.

LHS/RHS packing: Wraps KleidiAI's LHS quantization packing (kai_lhs_quant_pack_qsi8d32p_f32) and RHS repacking (kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0) into uniform interfaces.
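In KleidiAI's naming scheme, the qsi8d32 in kai_lhs_quant_pack_qsi8d32p_f32 denotes symmetric signed 8-bit values with one scale per block of 32 elements. The sketch below illustrates that per-block symmetric quantization on a single unpacked row. It is not KleidiAI's routine and rests on assumptions: the scale is absmax / 127 with round-to-nearest and is kept as a plain float, and the real packer additionally interleaves rows into the layout the micro-kernels read. QBlock, quantize_row_qsi8d32 and dequant are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block type: 32 signed 8-bit values sharing one scale.
constexpr std::size_t kBlockLen = 32;

struct QBlock {
    float scale;                           // assumption: stored as float (the real layout may differ)
    std::array<std::int8_t, kBlockLen> q;  // quantized values in [-127, 127]
};

// Sketch (hypothetical): symmetric per-block quantization of one row.
// The row length must be a multiple of 32.
std::vector<QBlock> quantize_row_qsi8d32(const std::vector<float>& row) {
    assert(row.size() % kBlockLen == 0);
    std::vector<QBlock> out(row.size() / kBlockLen);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* x = row.data() + b * kBlockLen;
        // Largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockLen; ++i) amax = std::fmax(amax, std::fabs(x[i]));
        const float scale = amax / 127.0f;
        const float inv = scale != 0.0f ? 1.0f / scale : 0.0f;  // all-zero block -> all-zero q
        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockLen; ++i) {
            out[b].q[i] = static_cast<std::int8_t>(std::lround(x[i] * inv));
        }
    }
    return out;
}

// Hypothetical helper: reconstruct element i of a block (useful for round-trip checks).
float dequant(const QBlock& blk, std::size_t i) { return blk.scale * blk.q[i]; }
```

Keeping one scale per 32-element block bounds the quantization error by each block's own magnitude rather than the whole row's, which is why block-quantized formats such as Q4_0 and Q8_0 use the same block length.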

Usage

This file is used internally by kleidiai.cpp to populate ggml_kleidiai_kernels structures. It is not called directly by user code.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/kleidiai/kernels.cpp (938 lines).

Signature

// Template wrapper examples (used to create uniform function pointers):
template<void(*Fn)(size_t,size_t,size_t,size_t,
    const void*,const void*,float*,size_t,size_t,float,float)>
static inline void kernel_run_fn11(
    size_t m, size_t n, size_t k, size_t bl,
    const void* lhs, const void* rhs, void* dst,
    size_t dst_stride_row, size_t dst_stride_col,
    float clamp_min, float clamp_max);

// Kernel selection functions (declared in kernels.h; defined in kernels.cpp together with the kernel tables they search):
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q4_0(cpu_feature features);
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q8_0(cpu_feature features);
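The template wrappers in the signature let kernels whose parameter lists differ sit behind one function pointer type: the wrapped kernel is a non-type template parameter, so each instantiation is an ordinary function with the uniform signature. The following is a minimal, self-contained sketch of that pattern under assumptions. uniform_run_fn, run_fn11, run_fn10, kRunTable and the two fake kernels are hypothetical stand-ins, not GGML or KleidiAI symbols, and the 10-parameter adapter here simply ignores bl.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical uniform signature: one pointer type for every matmul kernel.
using uniform_run_fn = void (*)(std::size_t m, std::size_t n, std::size_t k, std::size_t bl,
                                const void* lhs, const void* rhs, void* dst,
                                std::size_t dst_stride_row, std::size_t dst_stride_col,
                                float clamp_min, float clamp_max);

// Hypothetical adapter for kernels taking a block length and a typed float* destination.
template <void (*Fn)(std::size_t, std::size_t, std::size_t, std::size_t,
                     const void*, const void*, float*, std::size_t, std::size_t, float, float)>
static void run_fn11(std::size_t m, std::size_t n, std::size_t k, std::size_t bl,
                     const void* lhs, const void* rhs, void* dst,
                     std::size_t dst_stride_row, std::size_t dst_stride_col,
                     float clamp_min, float clamp_max) {
    // Forward everything, restoring the float* type the kernel expects.
    Fn(m, n, k, bl, lhs, rhs, static_cast<float*>(dst), dst_stride_row, dst_stride_col,
       clamp_min, clamp_max);
}

// Hypothetical adapter for kernels without a block-length parameter: bl is dropped.
template <void (*Fn)(std::size_t, std::size_t, std::size_t,
                     const void*, const void*, void*, std::size_t, std::size_t, float, float)>
static void run_fn10(std::size_t m, std::size_t n, std::size_t k, std::size_t /*bl*/,
                     const void* lhs, const void* rhs, void* dst,
                     std::size_t dst_stride_row, std::size_t dst_stride_col,
                     float clamp_min, float clamp_max) {
    Fn(m, n, k, lhs, rhs, dst, dst_stride_row, dst_stride_col, clamp_min, clamp_max);
}

// Two fake "vendor" kernels with mismatched signatures; they only record the call.
static int g_last_kernel = 0;
static std::size_t g_last_bl = 0;

static void fake_kernel_with_bl(std::size_t, std::size_t, std::size_t, std::size_t bl,
                                const void*, const void*, float* dst,
                                std::size_t, std::size_t, float, float) {
    g_last_kernel = 11;
    g_last_bl = bl;
    dst[0] = 1.0f;
}

static void fake_kernel_no_bl(std::size_t, std::size_t, std::size_t,
                              const void*, const void*, void* dst,
                              std::size_t, std::size_t, float, float) {
    g_last_kernel = 10;
    static_cast<float*>(dst)[0] = 2.0f;
}

// Both instantiations have the same type, so they can share one table.
static const uniform_run_fn kRunTable[] = {
    run_fn11<fake_kernel_with_bl>,
    run_fn10<fake_kernel_no_bl>,
};
```

Because Fn is a compile-time argument, each wrapper makes a direct call that the compiler can inline, and no per-kernel state or type-erased callable object is needed.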

Import

#include "kleidiai/kernels.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`features`	`cpu_feature`	Yes	Bitmask of detected CPU features (DOTPROD, I8MM, SVE, SME).
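`m, n, k`	`size_t`	Yes (kernel run functions)	Matrix dimensions for the GEMM/GEMV operation: the LHS is m x k, the RHS is n x k, and the output is m x n.
`bl`	`size_t`	Yes (`kernel_run_fn11` only)	Quantization block length of the RHS weights (32 for Q4_0).
`lhs, rhs`	`const void *`	Yes (kernel run functions)	Packed LHS (activations) and RHS (weights) data.
`clamp_min, clamp_max`	`float`	Yes (kernel run functions)	Lower and upper bounds applied to every output element.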
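Selecting a kernel set from the `features` bitmask amounts to a subset test: a candidate is usable when every feature bit it requires is also set in the detected mask. The sketch below assumes candidates are ordered from most to least preferred and that the first match wins. cpu_feature_mask, kernel_set, kCandidates and select_kernels are hypothetical names, and the candidate list is illustrative rather than GGML's actual table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical feature bitmask (flag names follow the features listed above).
enum cpu_feature_mask : unsigned {
    FEAT_NONE    = 0,
    FEAT_DOTPROD = 1u << 0,
    FEAT_I8MM    = 1u << 1,
    FEAT_SVE     = 1u << 2,
    FEAT_SME     = 1u << 3,
};

// Hypothetical kernel-set record: a label plus the features it needs.
struct kernel_set {
    const char* name;
    unsigned required;  // every bit here must be present on the CPU
};

// Illustrative candidates, most preferred first (assumption: first match wins).
static kernel_set kCandidates[] = {
    {"sme2", FEAT_SME},
    {"neon_i8mm", FEAT_DOTPROD | FEAT_I8MM},
    {"neon_dotprod", FEAT_DOTPROD},
};

// Return the first candidate whose required features are a subset of `available`,
// or nullptr when no candidate fits.
kernel_set* select_kernels(unsigned available) {
    for (auto& ks : kCandidates) {
        if ((available & ks.required) == ks.required) {
            return &ks;
        }
    }
    return nullptr;
}
```

Ordering the table by preference keeps the selection loop trivial: adding a faster variant only requires inserting it ahead of the ones it should supersede.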

Outputs

Output	Type	Description
`ggml_kleidiai_kernels *`	Pointer	Selected kernel set matching the CPU feature flags, or `NULL` if no suitable kernel exists.
`dst`	`void *` (holds `float`)	Matrix multiplication result for the kernel run functions; the uniform signature takes `void *`, and the kernels write `float` values.
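Ignoring quantization and packing, a kernel run writes an m x n output in which element (i, j) is the dot product of LHS row i and RHS row j (the nxk RHS layout named by kai_rhs_pack_nxk_...), clamped to [clamp_min, clamp_max]. The scalar reference below spells that out on unpacked float data. It assumes dst_stride_row is measured in bytes and that output columns are contiguous floats (so dst_stride_col is omitted); reference_matmul_clamp is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference on unpacked float data (no quantization, no packing):
//   dst[i][j] = clamp(sum_p lhs[i][p] * rhs[j][p], clamp_min, clamp_max)
// lhs: m x k row-major; rhs: n x k row-major ("nxk");
// dst rows are dst_stride_row BYTES apart (assumption); columns are contiguous floats.
void reference_matmul_clamp(std::size_t m, std::size_t n, std::size_t k,
                            const float* lhs, const float* rhs, void* dst,
                            std::size_t dst_stride_row, float clamp_min, float clamp_max) {
    auto* base = static_cast<std::uint8_t*>(dst);
    for (std::size_t i = 0; i < m; ++i) {
        float* out_row = reinterpret_cast<float*>(base + i * dst_stride_row);
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += lhs[i * k + p] * rhs[j * k + p];
            }
            out_row[j] = std::min(std::max(acc, clamp_min), clamp_max);
        }
    }
}
```

A reference like this is mainly useful for checking a packed kernel's output against a tolerance that accounts for quantization error.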

Usage Examples

Kernel Selection (Internal)

#include "kleidiai/kernels.h"

// During initialization, select optimal kernels:
cpu_feature features = CPU_FEATURE_DOTPROD | CPU_FEATURE_I8MM;
ggml_kleidiai_kernels * kernels = ggml_kleidiai_select_kernels_q4_0(features);

if (kernels) {
    // Use kernels->gemm for matrix multiply
    // Use kernels->gemv for vector-matrix multiply
}
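The selected set carries separate GEMM and GEMV entries. One common way to choose between them, assumed here, is by the number of LHS rows: a single row (m == 1, for example one token during decoding) goes to the GEMV kernel, and anything larger goes to GEMM. kernel_entry, kernel_pair and pick_kernel are hypothetical stand-ins for the real structures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two entries of a selected kernel set.
struct kernel_entry {
    const char* tag;
};

struct kernel_pair {
    kernel_entry gemm;
    kernel_entry gemv;
};

// Hypothetical dispatch: single-row LHS -> GEMV, otherwise GEMM.
const kernel_entry& pick_kernel(const kernel_pair& ks, std::size_t m) {
    return m == 1 ? ks.gemv : ks.gemm;
}
```

Routing m == 1 separately lets a GEMV kernel avoid the multi-row tiling a GEMM kernel is built around, which would otherwise be wasted on a single activation row.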

Related Pages

Ggml_org_Ggml_Cpu_kleidiai_backend -- The backend integration that uses these kernel wrappers.
Ggml_org_Ggml_Cpu_backend_interface -- Registers KleidiAI as an extra buffer type.
Ggml_org_Ggml_Cpu_amx_mmq -- Intel AMX: analogous accelerated matmul for x86.
