Implementation:Ggml org Ggml Cpu riscv repack

Metadata

Field	Value
Page Type	Implementation (Architecture-Specific SIMD)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated	2025-05-15 12:00 GMT

Overview

RISC-V Vector (RVV) optimized GEMV and GEMM kernels for interleaved quantized matrix formats on RISC-V processors with the V extension.

Description

arch/riscv/repack.cpp provides RISC-V Vector extension implementations of quantized matrix-vector (GEMV) and matrix-matrix (GEMM) multiplication kernels operating on interleaved block formats. This is the smallest of the architecture-specific repack files at 342 lines, reflecting the early stage of RISC-V optimization in the GGML codebase.

The file implements two key functions:

ggml_gemv_q4_0_8x8_q8_0 performs matrix-vector multiplication where weights are stored in 8-column interleaved q4_0 blocks and activations are in q8_0 format. The kernel:

Unpacks 4-bit weights from interleaved blocks using shift and sign-extension operations
Performs widening multiply-accumulate (__riscv_vwmul_vv, __riscv_vwmacc_vv) between int8 weights and activations
Reduces partial sums across vector lanes using narrowing shift-and-add operations (__riscv_vnsrl_wx, __riscv_vadd_vv)
Converts to float and applies scale factors from both weight and activation blocks

ggml_gemm_q4_0_8x8_q8_0 performs the equivalent matrix-matrix multiplication for batched operations.

The kernel uses an inline assembly barrier (__asm__ __volatile__("" ::: "memory")) to prevent GCC from emitting fused vlse64 instructions that would violate alignment constraints. The implementation requires __riscv_vlenb() >= QK4_0 (vector register length at least 32 bytes).

Usage

This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms with the V extension. The GEMV/GEMM kernels are invoked during inference when the scheduler selects the interleaved matrix multiplication path.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/riscv/repack.cpp (342 lines).

Key Signatures

void ggml_gemv_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);
void ggml_gemm_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);

Import

#include "ggml-common.h"
#include "ggml-backend-impl.h"
#include "ggml-cpu.h"
#include "../../repack.h"

I/O Contract

Inputs

Parameter	Type	Description
`n`	`int`	Inner dimension size (number of elements per dot product). Must be a multiple of `QK8_0` (32).
`vx`	`const void *`	Pointer to the interleaved quantized weight matrix (8-column interleaved `block_q4_0x8` blocks).
`vy`	`const void *`	Pointer to the quantized activation vector/matrix (`block_q8_0` blocks).
`nr`	`int`	Number of rows in the output.
`nc`	`int`	Number of columns in the output. Must be a multiple of 8 (the interleave factor).

Outputs

Output	Type	Description
`s`	`float *`	Destination buffer for the floating-point result. For GEMV, contains `nc` results; for GEMM, contains `nr * nc` results.

Usage Examples

// Perform GEMV with interleaved q4_0 weights on RISC-V
float output[8];
ggml_gemv_q4_0_8x8_q8_0(256, output, sizeof(float),
    interleaved_weights_q4_0x8, activations_q8_0, 1, 8);

Related Pages

Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV quantization and dot product routines
Implementation:Ggml_org_Ggml_Cpu_arm_repack -- ARM NEON equivalent
Implementation:Ggml_org_Ggml_Cpu_x86_repack -- x86 AVX/AVX-512 equivalent

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment