Implementation:Ggml org Ggml Cpu spacemit ime
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (SpacemiT IME Backend) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantized_Matrix_Multiplication |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Implements the GGML backend integration for SpacemiT IME (Inference Matrix Engine) on RISC-V processors, providing hardware-accelerated quantized matrix multiplication.
Description
spacemit/ime.cpp enables hardware-accelerated quantized inference on SpacemiT RISC-V processors (e.g., SpacemiT K1/X60) that include a dedicated Inference Matrix Engine. Key components include:
- Build requirements: Requires the RISC-V V extension (__riscv_v), the Zfh extension (__riscv_zfh), and the RISCV64_SPACEMIT_IME1 build flag. Compilation fails with descriptive errors if any requirement is unmet.
- GEMM arguments: qnbitgemm_spacemit_ime_args carries matrix pointers (a_ptr, packed_quant_b_data), strides (lda, ldc), quantization scales and zero points, bias, and the output buffer.
- Int8-by-Int4 GEMM: sqnbitgemm_spacemit_ime_i8i4 performs the int8-activation by int4-weight quantized GEMM with block-wise scaling. It dispatches to sqnbitgemm_spacemit_ime::ime1::gemm_kernel_i8i4 for the hardware-accelerated inner kernel.
- Weight packing: Interleaved block layout for efficient IME access, with block<K, N> template structs storing N quantization blocks with grouped deltas and packed quants.
- Tensor traits: Implements custom tensor_traits_base and tensor_traits_common for work size calculation and compute dispatch.
- Extra buffer type: Provides ggml_backend_cpu_riscv64_spacemit_buffer_type() for registering SpacemiT IME as an accelerated backend.
- AI core detection: Uses std::thread::hardware_concurrency() / 2 to estimate the number of AI cores available.
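To make the block-wise scaling in the Int8-by-Int4 GEMM item concrete, the following is a minimal scalar sketch of one dot product between an int8 activation row and an int4 weight column. The struct names (ref_block_q8, ref_block_q4), the nibble order, and the implicit zero point of 8 are assumptions chosen for illustration; they do not reproduce the interleaved layout or the IME1 kernel used in ime.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical activation block: one f32 scale plus blk_len int8 values.
struct ref_block_q8 {
    float               scale;
    std::vector<int8_t> q;
};

// Hypothetical weight block: one f32 scale plus blk_len 4-bit values packed
// two per byte (low nibble first), with an assumed implicit zero point of 8.
struct ref_block_q4 {
    float                scale;
    std::vector<uint8_t> packed;  // blk_len / 2 bytes
};

// Dot product of one activation row with one weight column, both split into
// blocks of blk_len elements along the K dimension.
float ref_dot_i8i4(const std::vector<ref_block_q8> & a,
                   const std::vector<ref_block_q4> & b,
                   size_t blk_len) {
    float acc = 0.0f;
    for (size_t blk = 0; blk < a.size(); ++blk) {
        int32_t isum = 0;  // integer accumulation inside the block
        for (size_t i = 0; i < blk_len; ++i) {
            const uint8_t byte = b[blk].packed[i / 2];
            const int32_t w    = ((i % 2 == 0) ? (byte & 0x0F) : (byte >> 4)) - 8;
            isum += static_cast<int32_t>(a[blk].q[i]) * w;
        }
        // Block-wise scaling: both scales are applied once per block.
        acc += a[blk].scale * b[blk].scale * static_cast<float>(isum);
    }
    return acc;
}
```

The point of the structure is that the inner loop stays in integer arithmetic and the floating-point scales are applied only once per block, which is the part a matrix engine can accelerate.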
Usage
SpacemiT IME acceleration is activated automatically on supported RISC-V hardware when the build includes GGML_USE_CPU_RISCV64_SPACEMIT. The backend registers itself as an extra buffer type.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/spacemit/ime.cpp (1025 lines).
Signature
// Backend buffer type registration
ggml_backend_buffer_type_t ggml_backend_cpu_riscv64_spacemit_buffer_type(void);
// Internal GEMM function
static void sqnbitgemm_spacemit_ime_i8i4(
const size_t blk_len,
const size_t gemm_k,
const qnbitgemm_spacemit_ime_args * gemm_args,
void * const per_gemm_ws,
const size_t m_start, const size_t m_count,
const size_t n_start, const size_t n_count);
Import
#include "spacemit/ime.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| blk_len | size_t | Yes | Quantization block length (e.g., 32 for q4_0). |
| gemm_k | size_t | Yes | Inner dimension of the matrix multiplication. |
| gemm_args | const qnbitgemm_spacemit_ime_args * | Yes | Matrix pointers, strides, quantization parameters. |
| per_gemm_ws | void * | Yes | Per-GEMM workspace buffer for quantized activations. |
| m_start, m_count, n_start, n_count | size_t | Yes | Tile coordinates for the current thread's work partition. |
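The tile coordinates in the table above describe which rows and columns of the output a given thread computes. As a rough illustration, the sketch below splits the N dimension across threads in aligned chunks and gives every thread all M rows. The tile_range struct, the partition_n helper, and the n_align parameter are hypothetical names introduced here; the partitioning actually used in ime.cpp may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical tile descriptor mirroring the m_start/m_count/n_start/n_count
// parameters of the GEMM function.
struct tile_range {
    size_t m_start, m_count, n_start, n_count;
};

// Split the N dimension of an m x n output across n_threads in chunks that
// are multiples of n_align columns. Every tile covers all m rows.
// Assumes n_threads > 0 and n_align > 0.
tile_range partition_n(size_t m, size_t n, size_t n_threads,
                       size_t thread_idx, size_t n_align) {
    const size_t n_units    = (n + n_align - 1) / n_align;            // aligned column groups
    const size_t per_thread = (n_units + n_threads - 1) / n_threads;  // groups per thread
    const size_t start      = std::min(n, thread_idx * per_thread * n_align);
    const size_t end        = std::min(n, start + per_thread * n_align);
    return { 0, m, start, end - start };
}
```

With this scheme the last threads may receive a short or empty tile (n_count of 0), so a caller would skip the GEMM call when n_count is zero.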
Outputs
| Output | Type | Description |
|---|---|---|
| gemm_args->c_ptr | float * | Matrix multiplication result in f32 format. |
| Buffer type | ggml_backend_buffer_type_t | SpacemiT buffer type, or NULL if hardware is unsupported. |
Usage Examples
Automatic SpacemiT IME Activation
#include "ggml-cpu.h"
// SpacemiT IME is automatically enabled when building with
// GGML_USE_CPU_RISCV64_SPACEMIT and running on SpacemiT K1/X60 hardware.
// The CPU backend auto-registers the SpacemiT buffer type:
ggml_backend_t cpu = ggml_backend_cpu_init();
// Quantized tensors (q4_0, q8_0) will automatically use IME-accelerated
// matrix multiplication when weights are allocated through the SpacemiT buffer.
Related Pages
- Ggml_org_Ggml_Cpu_spacemit_ime1_kernels -- Low-level IME1 assembly kernels called by this backend.
- Ggml_org_Ggml_Cpu_backend_interface -- Registers SpacemiT as an extra buffer type.
- Ggml_org_Ggml_Cpu_kleidiai_backend -- ARM KleidiAI: analogous accelerated matmul for ARM.