Implementation:Ggml org Ggml Cpu kleidiai kernels
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (KleidiAI Kernel Wrappers) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Wraps Arm KleidiAI micro-kernel functions into a unified interface for kernel selection and dispatch in GGML's quantized matrix multiplication.
Description
kleidiai/kernels.cpp serves as the adapter layer between Arm's KleidiAI optimized micro-kernels and GGML's kernel selection system. It provides:
- Template function wrappers: Adapts KleidiAI micro-kernel functions with varying parameter counts into uniform function pointer signatures:
- kernel_run_fn11 / kernel_run_fn10 / kernel_run_float_fn10 -- Wrap matrix multiplication kernels.
- kernel_offs_fn3 / kernel_offs_fn2 -- Wrap offset calculation functions.
- lhs_ps_fn6 / lhs_ps_fn5 -- Wrap LHS packed size calculation.
- lhs_pack_float_fn10 / lhs_pack_void_fn10 -- Wrap LHS quantization and packing.
- rhs_ps_fn5 / rhs_ps_fn2 / rhs_pack_fn12 -- Wrap RHS repacking.
- KleidiAI kernel variants: Includes headers for specific instruction set targets:
- NEON dotprod: 1x4, 4x4 tile sizes.
- NEON I8MM: 4x8 tile sizes.
- SVE dotprod/I8MM: Variable-length vector processing.
- SME2 mopa/sdot: Arm Scalable Matrix Extension 2 kernels.
- BF16 SME2: bfloat16 (brain floating point) matrix operations.
- LHS/RHS packing: Wraps KleidiAI's LHS quantization packing (kai_lhs_quant_pack_qsi8d32p_f32) and RHS repacking (kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0) into uniform interfaces.
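The wrapper idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from kernels.cpp: the toy kernels, the `run_fn` alias, and the `wrap_with_bl` / `wrap_no_bl` names are hypothetical stand-ins, and the real wrappers carry additional parameters such as destination strides. The sketch assumes the numeric suffix in names like kernel_run_fn11 and kernel_run_fn10 reflects the parameter count of the wrapped function.

```cpp
#include <algorithm>
#include <cstddef>

// Uniform signature that every wrapped kernel is exposed through (hypothetical).
using run_fn = void (*)(std::size_t m, std::size_t n, std::size_t k, std::size_t bl,
                        const void* lhs, const void* rhs, void* dst,
                        float clamp_min, float clamp_max);

// Stand-in kernel that takes a block length (NOT a KleidiAI function).
// It writes the clamped number of blocks (k / bl) to dst[0].
static void toy_kernel_with_bl(std::size_t, std::size_t, std::size_t k, std::size_t bl,
                               const void*, const void*, float* dst, float lo, float hi) {
    dst[0] = std::min(std::max(static_cast<float>(k / bl), lo), hi);
}

// Stand-in kernel without a block-length parameter (NOT a KleidiAI function).
// It writes the clamped value of k to dst[0].
static void toy_kernel_no_bl(std::size_t, std::size_t, std::size_t k,
                             const void*, const void*, float* dst, float lo, float hi) {
    dst[0] = std::min(std::max(static_cast<float>(k), lo), hi);
}

// The kernel is a non-type template parameter, so each instantiation is a plain
// function whose address fits the uniform run_fn pointer type.
template <void (*Fn)(std::size_t, std::size_t, std::size_t, std::size_t,
                     const void*, const void*, float*, float, float)>
static void wrap_with_bl(std::size_t m, std::size_t n, std::size_t k, std::size_t bl,
                         const void* lhs, const void* rhs, void* dst,
                         float clamp_min, float clamp_max) {
    Fn(m, n, k, bl, lhs, rhs, static_cast<float*>(dst), clamp_min, clamp_max);
}

// Same uniform signature, but the unused block length is dropped before the call.
template <void (*Fn)(std::size_t, std::size_t, std::size_t,
                     const void*, const void*, float*, float, float)>
static void wrap_no_bl(std::size_t m, std::size_t n, std::size_t k, std::size_t /*bl*/,
                       const void* lhs, const void* rhs, void* dst,
                       float clamp_min, float clamp_max) {
    Fn(m, n, k, lhs, rhs, static_cast<float*>(dst), clamp_min, clamp_max);
}

// Both kernels are now reachable through one pointer type.
static const run_fn with_bl = &wrap_with_bl<toy_kernel_with_bl>;
static const run_fn no_bl   = &wrap_no_bl<toy_kernel_no_bl>;
```

Because the wrapped function is fixed at compile time, the compiler can inline the forwarding call, so the only run-time cost is the indirect call through the uniform pointer.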
Usage
This file is used internally by kleidiai.cpp to populate ggml_kleidiai_kernels structures. It is not called directly by user code.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/kleidiai/kernels.cpp (938 lines).
Signature
// Template wrapper examples (used to create uniform function pointers):
template<void(*Fn)(size_t,size_t,size_t,size_t,
const void*,const void*,float*,size_t,size_t,float,float)>
static inline void kernel_run_fn11(
size_t m, size_t n, size_t k, size_t bl,
const void* lhs, const void* rhs, void* dst,
size_t dst_stride_row, size_t dst_stride_col,
float clamp_min, float clamp_max);
// Kernel selection functions (declared in kernels.h, defined here):
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q4_0(cpu_feature features);
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q8_0(cpu_feature features);
Import
#include "kleidiai/kernels.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| features | cpu_feature | Yes | Bitmask of detected CPU features (DOTPROD, I8MM, SVE, SME). |
| m, n, k | size_t | Yes (kernel) | Matrix dimensions for the GEMM/GEMV operation. |
| lhs, rhs | const void * | Yes (kernel) | Packed LHS (activations) and RHS (weights) data. |
Outputs
| Output | Type | Description |
|---|---|---|
| ggml_kleidiai_kernels * | Pointer | Selected kernel set matching the CPU feature flags, or NULL if no suitable kernel exists. |
| dst | float * | Matrix multiplication result (for kernel run functions). |
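The contract above (a feature bitmask in, a kernel set or NULL out) can be sketched as a table scan. This is an assumption about how such a selector might work, not the actual implementation: the `toy_` enum, struct, table contents, and ordering are all hypothetical.

```cpp
// Hypothetical feature bits; the real cpu_feature enum lives in kernels.h.
enum toy_cpu_feature : unsigned {
    TOY_NONE    = 0,
    TOY_DOTPROD = 1,
    TOY_I8MM    = 2,
    TOY_SVE     = 4,
    TOY_SME     = 8
};

// Hypothetical kernel-set record: the feature bits it requires plus a label.
struct toy_kernels {
    unsigned    required;
    const char* name;
};

// Candidates ordered from most to least specialised (ordering is an assumption).
static toy_kernels toy_table[] = {
    { TOY_SME,                "sme"     },
    { TOY_DOTPROD | TOY_I8MM, "i8mm"    },
    { TOY_DOTPROD,            "dotprod" },
};

// Returns the first entry whose required bits are all present in `features`,
// or nullptr when no entry qualifies.
static toy_kernels* toy_select(unsigned features) {
    for (auto& entry : toy_table) {
        if ((features & entry.required) == entry.required) {
            return &entry;
        }
    }
    return nullptr;
}
```

With this layout, a CPU reporting only DOTPROD falls through to the last entry, while a CPU reporting none of the required bits gets a null result that the caller must check.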
Usage Examples
Kernel Selection (Internal)
#include "kleidiai/kernels.h"
// During initialization, select optimal kernels:
cpu_feature features = CPU_FEATURE_DOTPROD | CPU_FEATURE_I8MM;
ggml_kleidiai_kernels * kernels = ggml_kleidiai_select_kernels_q4_0(features);
if (kernels) {
// Use kernels->gemm for matrix multiply
// Use kernels->gemv for vector-matrix multiply
}
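Once a kernel set is selected, the caller still has to choose between its gemm and gemv entries. A minimal sketch of that choice follows, assuming the vector path applies when the LHS has a single row (m == 1). The `toy_` types are hypothetical and do not mirror the real ggml_kleidiai_kernels layout.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the per-operation kernel descriptors.
struct toy_kernel_info {
    const char* label;
};

struct toy_kernel_set {
    toy_kernel_info gemm;  // matrix-matrix path
    toy_kernel_info gemv;  // vector-matrix path
};

// Picks the vector-matrix kernel for a single-row LHS, otherwise the
// matrix-matrix kernel. The m == 1 rule is an assumption of this sketch.
static const toy_kernel_info& toy_pick(const toy_kernel_set& set, std::size_t m) {
    return (m == 1) ? set.gemv : set.gemm;
}
```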
Related Pages
- Ggml_org_Ggml_Cpu_kleidiai_backend -- The backend integration that uses these kernel wrappers.
- Ggml_org_Ggml_Cpu_backend_interface -- Registers KleidiAI as an extra buffer type.
- Ggml_org_Ggml_Cpu_amx_mmq -- Intel AMX: analogous accelerated matmul for x86.