Implementation:Ggml org Ggml Cpu riscv repack
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
RISC-V Vector (RVV) optimized GEMV and GEMM kernels for interleaved quantized matrix formats on RISC-V processors with the V extension.
Description
arch/riscv/repack.cpp provides RISC-V Vector extension implementations of quantized matrix-vector (GEMV) and matrix-matrix (GEMM) multiplication kernels operating on interleaved block formats. This is the smallest of the architecture-specific repack files at 342 lines, reflecting the early stage of RISC-V optimization in the GGML codebase.
The file implements two key functions:
ggml_gemv_q4_0_8x8_q8_0 performs matrix-vector multiplication where weights are stored in 8-column interleaved q4_0 blocks and activations are in q8_0 format. The kernel:
- Unpacks 4-bit weights from interleaved blocks using shift and sign-extension operations
- Performs widening multiply-accumulate (
__riscv_vwmul_vv,__riscv_vwmacc_vv) between int8 weights and activations - Reduces partial sums across vector lanes using narrowing shift-and-add operations (
__riscv_vnsrl_wx,__riscv_vadd_vv) - Converts to float and applies scale factors from both weight and activation blocks
ggml_gemm_q4_0_8x8_q8_0 performs the equivalent matrix-matrix multiplication for batched operations.
The kernel uses an inline assembly barrier (__asm__ __volatile__("" ::: "memory")) to prevent GCC from emitting fused vlse64 instructions that would violate alignment constraints. The implementation requires __riscv_vlenb() >= QK4_0 (vector register length at least 32 bytes).
Usage
This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms with the V extension. The GEMV/GEMM kernels are invoked during inference when the scheduler selects the interleaved matrix multiplication path.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/riscv/repack.cpp (342 lines).
Key Signatures
void ggml_gemv_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);
void ggml_gemm_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);
Import
#include "ggml-common.h"
#include "ggml-backend-impl.h"
#include "ggml-cpu.h"
#include "../../repack.h"
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
n |
int |
Inner dimension size (number of elements per dot product). Must be a multiple of QK8_0 (32).
|
vx |
const void * |
Pointer to the interleaved quantized weight matrix (8-column interleaved block_q4_0x8 blocks).
|
vy |
const void * |
Pointer to the quantized activation vector/matrix (block_q8_0 blocks).
|
nr |
int |
Number of rows in the output. |
nc |
int |
Number of columns in the output. Must be a multiple of 8 (the interleave factor). |
Outputs
| Output | Type | Description |
|---|---|---|
s |
float * |
Destination buffer for the floating-point result. For GEMV, contains nc results; for GEMM, contains nr * nc results.
|
Usage Examples
// Perform GEMV with interleaved q4_0 weights on RISC-V
float output[8];
ggml_gemv_q4_0_8x8_q8_0(256, output, sizeof(float),
interleaved_weights_q4_0x8, activations_q8_0, 1, 8);
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV quantization and dot product routines
- Implementation:Ggml_org_Ggml_Cpu_arm_repack -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_repack -- x86 AVX/AVX-512 equivalent