Implementation:Ggml org Ggml Cpu kleidiai kernels

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (KleidiAI Kernel Wrappers)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication
Last Updated | 2025-05-15 12:00 GMT

Overview

Wraps Arm KleidiAI micro-kernel functions into a unified interface for kernel selection and dispatch in GGML's quantized matrix multiplication.

Description

kleidiai/kernels.cpp serves as the adapter layer between Arm's KleidiAI optimized micro-kernels and GGML's kernel selection system. It provides:

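  1. Template function wrappers: adapt KleidiAI micro-kernel functions with varying parameter counts to uniform function pointer signatures. The numeric suffix in each wrapper name matches the parameter count of the wrapped function (kernel_run_fn11, for example, wraps an 11-parameter kernel):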
    • kernel_run_fn11 / kernel_run_fn10 / kernel_run_float_fn10 -- Wraps matrix multiplication kernels.
    • kernel_offs_fn3 / kernel_offs_fn2 -- Wraps offset calculation functions.
    • lhs_ps_fn6 / lhs_ps_fn5 -- Wraps LHS packed size calculation.
    • lhs_pack_float_fn10 / lhs_pack_void_fn10 -- Wraps LHS quantization and packing.
    • rhs_ps_fn5 / rhs_ps_fn2 / rhs_pack_fn12 -- Wraps RHS repacking.
  2. KleidiAI kernel variants: Includes headers for specific instruction set targets:
    • NEON dotprod: 1x4, 4x4 tile sizes.
    • NEON I8MM: 4x8 tile sizes.
    • SVE dotprod/I8MM: Variable-length vector processing.
    • SME2 mopa/sdot: Arm Scalable Matrix Extension 2 (SME2) kernels.
    • BF16 SME2: bfloat16 (brain floating point) matrix operations.
  3. LHS/RHS packing: Wraps KleidiAI's LHS quantization packing (kai_lhs_quant_pack_qsi8d32p_f32) and RHS repacking (kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0) into uniform interfaces.
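The wrapper pattern from item 1 can be sketched as follows. This is a minimal illustration, not the code in kernels.cpp: packed_size_5, packed_size_6, ps_adapter5, ps_adapter6, lhs_ps_t, and kPackedSizeFns are hypothetical names, and the size formulas are invented for the demo. The sketch shows only how a function passed as a non-type template parameter can be re-exposed under one shared pointer type, with unused arguments dropped.

```cpp
#include <cassert>
#include <cstddef>

using std::size_t;

// Hypothetical stand-ins for two "packed size" functions whose parameter
// lists differ. They are NOT real KleidiAI functions.
static size_t packed_size_5(size_t m, size_t k, size_t mr, size_t kr, size_t sr) {
    (void)kr; (void)sr;
    assert(mr != 0);
    size_t rows = (m + mr - 1) / mr * mr;  // round m up to a multiple of mr
    return rows * k;
}

static size_t packed_size_6(size_t m, size_t k, size_t bl,
                            size_t mr, size_t kr, size_t sr) {
    (void)kr; (void)sr;
    assert(mr != 0 && bl != 0);
    size_t rows   = (m + mr - 1) / mr * mr;
    size_t blocks = (k + bl - 1) / bl;            // blocks along k
    return rows * (k + blocks * sizeof(float));   // payload plus one scale per block
}

// The single signature that calling code stores and invokes.
using lhs_ps_t = size_t (*)(size_t m, size_t k, size_t bl,
                            size_t mr, size_t kr, size_t sr);

// Adapter for 5-parameter functions: bl is accepted but ignored.
template <size_t (*Fn)(size_t, size_t, size_t, size_t, size_t)>
static size_t ps_adapter5(size_t m, size_t k, size_t /*bl*/,
                          size_t mr, size_t kr, size_t sr) {
    return Fn(m, k, mr, kr, sr);
}

// Adapter for 6-parameter functions: every argument is forwarded.
template <size_t (*Fn)(size_t, size_t, size_t, size_t, size_t, size_t)>
static size_t ps_adapter6(size_t m, size_t k, size_t bl,
                          size_t mr, size_t kr, size_t sr) {
    return Fn(m, k, bl, mr, kr, sr);
}

// Both adapters now share one pointer type and can sit in the same table.
static const lhs_ps_t kPackedSizeFns[] = {
    &ps_adapter5<packed_size_5>,
    &ps_adapter6<packed_size_6>,
};
```

Because the wrapped function is a template argument rather than a runtime value, each instantiation is a distinct ordinary function, so the compiler can inline the forwarded call and the table holds plain function pointers with no captured state.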

Usage

This file is used internally by kleidiai.cpp to populate ggml_kleidiai_kernels structures. It is not called directly by user code.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/kleidiai/kernels.cpp (938 lines).

Signature

// Template wrapper examples (used to create uniform function pointers):
template<void(*Fn)(size_t,size_t,size_t,size_t,
    const void*,const void*,float*,size_t,size_t,float,float)>
static inline void kernel_run_fn11(
    size_t m, size_t n, size_t k, size_t bl,
    const void* lhs, const void* rhs, void* dst,
    size_t dst_stride_row, size_t dst_stride_col,
    float clamp_min, float clamp_max);

// Kernel selection functions (declared in kernels.h, defined here):
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q4_0(cpu_feature features);
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q8_0(cpu_feature features);

Import

#include "kleidiai/kernels.h"

I/O Contract

Inputs

Parameter | Type | Required | Description
features | cpu_feature | Yes | Bitmask of detected CPU features (DOTPROD, I8MM, SVE, SME).
m, n, k | size_t | Yes (kernel) | Matrix dimensions for the GEMM/GEMV operation.
lhs, rhs | const void * | Yes (kernel) | Packed LHS (activations) and RHS (weights) data.

Outputs

Output | Type | Description
(return value) | ggml_kleidiai_kernels * | Selected kernel set matching the CPU feature flags, or NULL if no suitable kernel exists.
dst | float * | Matrix multiplication result (written by the kernel run functions).
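The selection contract above (a feature bitmask in, a kernel set or NULL out) could be implemented along these lines. This is a sketch under stated assumptions: feature_bits, kernel_set, kKernelTable, and select_kernels are hypothetical names, and the rule "first table entry whose required features are all present" is an assumed matching policy, not a confirmed description of kernels.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical feature bits; the real cpu_feature type is declared in kernels.h.
enum feature_bits : uint32_t {
    FEAT_NONE    = 0,
    FEAT_DOTPROD = 1u << 0,
    FEAT_I8MM    = 1u << 1,
    FEAT_SVE     = 1u << 2,
    FEAT_SME     = 1u << 3,
};

// Hypothetical kernel set: only the fields needed to show selection.
struct kernel_set {
    uint32_t     required;  // every bit here must be present on the CPU
    const char * name;      // label for the demo
};

// Ordered from most to least specialized, so the first match wins.
static kernel_set kKernelTable[] = {
    { FEAT_SME,                 "sme"     },
    { FEAT_DOTPROD | FEAT_I8MM, "i8mm"    },
    { FEAT_DOTPROD,             "dotprod" },
};

// Returns the first entry whose required bits are a subset of `detected`,
// or nullptr when no entry is usable on this CPU.
static kernel_set * select_kernels(uint32_t detected) {
    for (kernel_set & ks : kKernelTable) {
        assert(ks.required != FEAT_NONE);  // an empty mask would match every CPU
        if ((detected & ks.required) == ks.required) {
            return &ks;
        }
    }
    return nullptr;
}
```

With this ordering, a CPU reporting DOTPROD and I8MM gets the "i8mm" entry, a CPU with only DOTPROD falls through to "dotprod", and a CPU with none of the listed features gets nullptr, which callers must check before use.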

Usage Examples

Kernel Selection (Internal)

#include "kleidiai/kernels.h"

// During initialization, select optimal kernels:
cpu_feature features = CPU_FEATURE_DOTPROD | CPU_FEATURE_I8MM;
ggml_kleidiai_kernels * kernels = ggml_kleidiai_select_kernels_q4_0(features);

if (kernels) {
    // Use kernels->gemm for matrix multiply
    // Use kernels->gemv for vector-matrix multiply
}
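The gemm/gemv split mentioned in the comments above can be illustrated with a small dispatch sketch. Everything here is an assumption made for illustration: kernel_info, kernel_pair, run_fn_t, pick_kernel, and multiply are hypothetical names, plain float arrays stand in for the packed quantized buffers that the real kernels consume, and the rule "use gemv when m is 1" is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>

using std::size_t;

// Uniform run signature for this sketch. rhs is stored as n rows of k values.
using run_fn_t = void (*)(size_t m, size_t n, size_t k,
                          const float * lhs, const float * rhs, float * dst);

struct kernel_info { run_fn_t run; };
struct kernel_pair { kernel_info gemm; kernel_info gemv; };

// Reference matrix-matrix product: dst[m x n] = lhs[m x k] * rhs[n x k]^T.
static void ref_gemm(size_t m, size_t n, size_t k,
                     const float * lhs, const float * rhs, float * dst) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += lhs[i * k + p] * rhs[j * k + p];
            }
            dst[i * n + j] = acc;
        }
    }
}

// Reference vector-matrix product: the m == 1 special case.
static void ref_gemv(size_t m, size_t n, size_t k,
                     const float * lhs, const float * rhs, float * dst) {
    assert(m == 1);
    ref_gemm(1, n, k, lhs, rhs, dst);
}

// Assumed policy: a single LHS row goes to the gemv kernel, anything else to gemm.
static const kernel_info & pick_kernel(const kernel_pair & kp, size_t m) {
    return m == 1 ? kp.gemv : kp.gemm;
}

static void multiply(const kernel_pair & kp, size_t m, size_t n, size_t k,
                     const float * lhs, const float * rhs, float * dst) {
    const kernel_info & ki = pick_kernel(kp, m);
    assert(ki.run != nullptr);
    ki.run(m, n, k, lhs, rhs, dst);
}
```

Keeping both entries in one structure lets the caller choose per operation without repeating CPU feature detection, which only has to happen once when the kernel set is selected.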
