Implementation:Ggml org Ggml Cpu spacemit ime

Metadata

Field	Value
Page Type	Implementation (SpacemiT IME Backend)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication
Last Updated	2025-05-15 12:00 GMT

Overview

Implements the GGML backend integration for SpacemiT IME (Inference Matrix Engine) on RISC-V processors, providing hardware-accelerated quantized matrix multiplication.

Description

spacemit/ime.cpp enables hardware-accelerated quantized inference on SpacemiT RISC-V processors (e.g., SpacemiT K1/X60) that include a dedicated Inference Matrix Engine. Key components include:

Build requirements: Requires the RISC-V V extension (__riscv_v), the Zfh extension (__riscv_zfh), and the RISCV64_SPACEMIT_IME1 build flag. Compilation fails with a descriptive error if any requirement is unmet.
GEMM arguments: qnbitgemm_spacemit_ime_args carries matrix pointers (a_ptr, packed_quant_b_data), strides (lda, ldc), quantization scales and zero points, bias, and output buffer.
Int8-by-Int4 GEMM: sqnbitgemm_spacemit_ime_i8i4 multiplies int8-quantized activations by int4-quantized weights with block-wise scaling. It dispatches to sqnbitgemm_spacemit_ime::ime1::gemm_kernel_i8i4 for the hardware-accelerated inner kernel.
Weight packing: Weights are repacked into an interleaved block layout for efficient IME access. Each block<K, N> template struct stores N quantization blocks, holding their deltas (scales) as one group alongside the packed quants.
Tensor traits: Implements custom tensor_traits_base and tensor_traits_common for work size calculation and compute dispatch.
Extra buffer type: Provides ggml_backend_cpu_riscv64_spacemit_buffer_type() for registering SpacemiT IME as an accelerated backend.
AI core detection: Estimates the number of available AI cores as std::thread::hardware_concurrency() / 2. This is a heuristic derived from the logical CPU count rather than a direct hardware query.
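
The weight-packing idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: block_sketch, pack_blocks, get_quant, and kBlockLen are hypothetical names, the deltas are held as raw 16-bit patterns instead of a real half-precision type, and the byte-wise round-robin interleave is only one possible pattern, not necessarily the one ime.cpp uses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed quantization block length (32 values, as in q4_0).
constexpr std::size_t kBlockLen = 32;

// Hypothetical stand-in for the block<K, N> idea: N quantization blocks of
// K-bit values, with the N deltas grouped together next to the packed quants.
template <std::size_t K, std::size_t N>
struct block_sketch {
    std::array<std::uint16_t, N> d;                      // fp16 bit patterns of the N deltas
    std::array<std::uint8_t, kBlockLen * N * K / 8> qs;  // packed K-bit quants of all N blocks
};

// Pack N blocks of unsigned 4-bit values (0..15) into one interleaved block.
// Two values share a byte (low nibble first); bytes from the N source blocks
// are stored round-robin. This interleave is an illustrative choice only.
template <std::size_t N>
block_sketch<4, N> pack_blocks(
        const std::array<std::array<std::uint8_t, kBlockLen>, N> & src,
        const std::array<std::uint16_t, N> & deltas) {
    block_sketch<4, N> out{};
    out.d = deltas;
    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t i = 0; i < kBlockLen / 2; ++i) {
            const unsigned lo = src[j][2 * i] & 0x0Fu;
            const unsigned hi = src[j][2 * i + 1] & 0x0Fu;
            out.qs[i * N + j] = static_cast<std::uint8_t>(lo | (hi << 4));
        }
    }
    return out;
}

// Read back value i of source block j from the interleaved layout.
template <std::size_t N>
std::uint8_t get_quant(const block_sketch<4, N> & b, std::size_t j, std::size_t i) {
    const std::uint8_t byte = b.qs[(i / 2) * N + j];
    return (i % 2 == 0) ? static_cast<std::uint8_t>(byte & 0x0Fu)
                        : static_cast<std::uint8_t>(byte >> 4);
}
```

Grouping the deltas and interleaving the quants means a kernel that processes N output columns at once can load their scales and their quants with contiguous reads, which is the usual motivation for this kind of repacking.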

Usage

SpacemiT IME acceleration is activated automatically on supported RISC-V hardware when the build includes GGML_USE_CPU_RISCV64_SPACEMIT. The backend registers itself as an extra buffer type.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/spacemit/ime.cpp (1025 lines).

Signature

// Backend buffer type registration
ggml_backend_buffer_type_t ggml_backend_cpu_riscv64_spacemit_buffer_type(void);

// Internal GEMM function
static void sqnbitgemm_spacemit_ime_i8i4(
    const size_t blk_len,
    const size_t gemm_k,
    const qnbitgemm_spacemit_ime_args * gemm_args,
    void * const per_gemm_ws,
    const size_t m_start, const size_t m_count,
    const size_t n_start, const size_t n_count);

Import

#include "spacemit/ime.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`blk_len`	`size_t`	Yes	Quantization block length (e.g., 32 for q4_0).
`gemm_k`	`size_t`	Yes	Inner dimension of the matrix multiplication.
`gemm_args`	`const qnbitgemm_spacemit_ime_args *`	Yes	Matrix pointers, strides, quantization parameters.
`per_gemm_ws`	`void *`	Yes	Per-GEMM workspace buffer for quantized activations.
`m_start, m_count, n_start, n_count`	`size_t`	Yes	Start offset and extent of the output tile along M (rows) and N (columns) assigned to the current thread.
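
The start/count pairs in the table can be produced by any scheme that splits a dimension into contiguous ranges. The sketch below shows one common scheme; it is an assumption, since this page does not describe how ime.cpp actually partitions work, and tile_range and partition_range are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper type: a contiguous range along one GEMM dimension.
struct tile_range {
    std::size_t start;
    std::size_t count;
};

// Split `total` items into `n_parts` contiguous ranges and return range `idx`.
// The remainder is spread over the first ranges so sizes differ by at most one.
// Assumes n_parts > 0 and idx < n_parts.
inline tile_range partition_range(std::size_t total, std::size_t n_parts, std::size_t idx) {
    const std::size_t base  = total / n_parts;
    const std::size_t rem   = total % n_parts;
    const std::size_t start = idx * base + std::min(idx, rem);
    const std::size_t count = base + (idx < rem ? 1 : 0);
    return {start, count};
}
```

A thread with index ith out of nth threads could, for example, take partition_range(n, nth, ith) as its (n_start, n_count) and cover all rows with m_start = 0 and m_count = m.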

Outputs

Output	Type	Description
`gemm_args->c_ptr`	`float *`	Matrix multiplication result in f32 format.
Buffer type	`ggml_backend_buffer_type_t`	SpacemiT buffer type, or `NULL` if hardware is unsupported.
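
To make the contract concrete, the following scalar reference shows what an int8-by-int4 GEMM with block-wise scaling computes for one output tile. It is a sketch under stated assumptions, not the IME kernel: ref_gemm_i8i4 and its parameter names are hypothetical, the 4-bit weights are kept unpacked (one value per byte), a single zero point is applied to all weights, and bias is omitted.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference for one output tile. Assumed layouts (illustrative only):
//   a_q     [M][gemm_k]            int8 activations
//   a_scale [M][gemm_k / blk_len]  one scale per activation block
//   b_q     [N][gemm_k]            4-bit weights, unpacked, values 0..15
//   b_scale [N][gemm_k / blk_len]  one scale per weight block
// Assumes gemm_k is a multiple of blk_len.
inline void ref_gemm_i8i4(
        std::size_t blk_len, std::size_t gemm_k,
        const std::int8_t * a_q, const float * a_scale,
        const std::uint8_t * b_q, const float * b_scale,
        int b_zero_point,
        float * c, std::size_t ldc,
        std::size_t m_start, std::size_t m_count,
        std::size_t n_start, std::size_t n_count) {
    const std::size_t n_blocks = gemm_k / blk_len;
    for (std::size_t m = m_start; m < m_start + m_count; ++m) {
        for (std::size_t n = n_start; n < n_start + n_count; ++n) {
            float acc = 0.0f;
            for (std::size_t b = 0; b < n_blocks; ++b) {
                // Integer dot product within one quantization block.
                std::int32_t dot = 0;
                for (std::size_t i = 0; i < blk_len; ++i) {
                    const std::size_t k = b * blk_len + i;
                    const std::int32_t av = a_q[m * gemm_k + k];
                    const std::int32_t bv = static_cast<std::int32_t>(b_q[n * gemm_k + k]) - b_zero_point;
                    dot += av * bv;
                }
                // Apply both block scales once per block, then accumulate in f32.
                acc += a_scale[m * n_blocks + b] * b_scale[n * n_blocks + b] * static_cast<float>(dot);
            }
            c[m * ldc + n] = acc;
        }
    }
}
```

Keeping the inner loop in integers and applying the two scales once per block is what makes the block-wise scheme cheap: the floating-point work grows with the number of blocks, not with gemm_k.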

Usage Examples

Automatic SpacemiT IME Activation

#include "ggml-cpu.h"

// SpacemiT IME is automatically enabled when building with
// GGML_USE_CPU_RISCV64_SPACEMIT and running on SpacemiT K1/X60 hardware.

// The CPU backend auto-registers the SpacemiT buffer type:
ggml_backend_t cpu = ggml_backend_cpu_init();

// Quantized tensors (q4_0, q8_0) will automatically use IME-accelerated
// matrix multiplication when weights are allocated through the SpacemiT buffer.

Related Pages

Ggml_org_Ggml_Cpu_spacemit_ime1_kernels -- Low-level IME1 assembly kernels called by this backend.
Ggml_org_Ggml_Cpu_backend_interface -- Registers SpacemiT as an extra buffer type.
Ggml_org_Ggml_Cpu_kleidiai_backend -- ARM KleidiAI: analogous accelerated matmul for ARM.
