Implementation:Ggml org Ggml Cpu riscv repack

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Architecture-Specific SIMD)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated | 2025-05-15 12:00 GMT

Overview

RISC-V Vector (RVV) optimized GEMV and GEMM kernels for interleaved quantized matrix formats on RISC-V processors with the V extension.

Description

arch/riscv/repack.cpp provides RISC-V Vector extension implementations of quantized matrix-vector (GEMV) and matrix-matrix (GEMM) multiplication kernels operating on interleaved block formats. This is the smallest of the architecture-specific repack files at 342 lines, reflecting the early stage of RISC-V optimization in the GGML codebase.

The file implements two key functions:

ggml_gemv_q4_0_8x8_q8_0 performs matrix-vector multiplication where weights are stored in 8-column interleaved q4_0 blocks and activations are in q8_0 format. The kernel:

  1. Unpacks 4-bit weights from interleaved blocks using shift and sign-extension operations
  2. Performs widening multiply-accumulate (__riscv_vwmul_vv, __riscv_vwmacc_vv) between int8 weights and activations
  3. Reduces partial sums across vector lanes using narrowing shift-and-add operations (__riscv_vnsrl_wx, __riscv_vadd_vv)
  4. Converts to float and applies scale factors from both weight and activation blocks
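The four steps above can be illustrated with a scalar sketch. This is not the RVV kernel: it replaces the vector intrinsics with plain loops, so the lane reduction in step 3 collapses into a running integer sum. The function names, the two's-complement nibble encoding, and the pairing of low nibbles with the first half of the activations are assumptions made for illustration, not details taken from the source file.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: split one packed byte into two signed 4-bit values.
 * Assumption: each nibble is stored in two's complement. */
static void unpack_nibbles(uint8_t packed, int8_t *lo, int8_t *hi) {
    /* (x ^ 8) - 8 sign-extends a 4-bit value without relying on
     * implementation-defined right shifts of negative numbers. */
    *lo = (int8_t)(((packed & 0x0F) ^ 8) - 8);
    *hi = (int8_t)(((packed >> 4) ^ 8) - 8);
}

/* Steps 2-4 for one block: widening multiply-accumulate into int32,
 * then conversion to float and scaling by both block scales.
 * weights: n_bytes packed bytes (2 * n_bytes 4-bit weights)
 * acts:    2 * n_bytes int8 activations
 * Assumption: low nibbles pair with acts[0 .. n_bytes-1] and high
 * nibbles with acts[n_bytes .. 2*n_bytes-1]. */
static float block_dot_q4_q8(const uint8_t *weights, const int8_t *acts,
                             size_t n_bytes, float w_scale, float a_scale) {
    int32_t sum = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        int8_t lo, hi;
        unpack_nibbles(weights[i], &lo, &hi);
        sum += (int32_t)lo * acts[i];
        sum += (int32_t)hi * acts[i + n_bytes];
    }
    return (float)sum * w_scale * a_scale;
}
```

Accumulating in int32 before the single float conversion mirrors the widening pattern described above: the products of two int8 values need 16 bits, and their sum over a block needs more.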

ggml_gemm_q4_0_8x8_q8_0 performs the equivalent matrix-matrix multiplication for batched operations.

The kernel inserts an empty inline-assembly statement with a memory clobber (__asm__ __volatile__("" ::: "memory")) as a compiler barrier, to prevent GCC from emitting fused vlse64 instructions that would violate alignment constraints. The RVV path requires __riscv_vlenb() >= QK4_0, that is, vector registers of at least 32 bytes (VLEN >= 256 bits).
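A minimal sketch of these two guards follows. The helper names (compiler_barrier, rvv_kernel_usable, load_block_words) are hypothetical, and the sketch uses memcpy for the scalar loads, which may differ from how the actual file reads the activation bytes. On a build without the RVV intrinsics the check simply reports that the vector path is unavailable.

```c
#include <stdint.h>
#include <string.h>

#if defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

#ifndef QK4_0
#define QK4_0 32 /* elements per q4_0 block; mirrors the ggml constant */
#endif

/* Compiler-only barrier: emits no instruction, but stops the compiler
 * from moving or merging memory accesses across it (GCC/Clang syntax). */
static inline void compiler_barrier(void) {
    __asm__ __volatile__("" ::: "memory");
}

/* Returns 1 when a vector register holds at least QK4_0 bytes, else 0.
 * On non-RVV builds it always returns 0, so callers take another path. */
static int rvv_kernel_usable(void) {
#if defined(__riscv_v_intrinsic)
    return __riscv_vlenb() >= (unsigned long)QK4_0;
#else
    return 0;
#endif
}

/* Reads the 32 activation bytes of one block as four 64-bit scalars and
 * places the barrier before any vector code would consume them. */
static void load_block_words(const int8_t *qs, int64_t out[4]) {
    for (int i = 0; i < 4; i++) {
        memcpy(&out[i], qs + 8 * i, sizeof(int64_t)); /* alignment-safe */
    }
    compiler_barrier();
}
```

The barrier costs nothing at run time; it only constrains what the compiler may reorder or combine around the scalar loads.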

Usage

This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms with the V extension. The GEMV/GEMM kernels are invoked during inference when the scheduler selects the interleaved matrix multiplication path.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/riscv/repack.cpp (342 lines).

Key Signatures

void ggml_gemv_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);
void ggml_gemm_q4_0_8x8_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);

Import

#include "ggml-common.h"
#include "ggml-backend-impl.h"
#include "ggml-cpu.h"
#include "../../repack.h"

I/O Contract

Inputs

Parameter | Type | Description
n | int | Inner dimension size (number of elements per dot product). Must be a multiple of QK8_0 (32).
vx | const void * | Pointer to the interleaved quantized weight matrix (8-column interleaved block_q4_0x8 blocks).
vy | const void * | Pointer to the quantized activation vector/matrix (block_q8_0 blocks).
nr | int | Number of rows in the output.
nc | int | Number of columns in the output. Must be a multiple of 8 (the interleave factor).

Outputs

Output | Type | Description
s | float * | Destination buffer for the floating-point result. For GEMV, contains nc results; for GEMM, contains nr * nc results.
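The constraints in this contract can be checked before calling either kernel. The sketch below is a hypothetical helper, not part of GGML: the kernel_shape struct, the describe_shape function, and the SKETCH_ constants are names introduced here. It only encodes the rules stated above (n a multiple of 32, nc a multiple of 8) and the resulting output sizes.

```c
#include <stddef.h>

enum {
    SKETCH_QK8_0 = 32,     /* elements per q8_0 block, as stated above */
    SKETCH_INTERLEAVE = 8  /* column interleave factor, as stated above */
};

typedef struct {
    int valid;           /* 1 if the arguments satisfy the contract */
    int blocks_per_row;  /* n / 32 quantized blocks along the inner dim */
    int column_groups;   /* nc / 8 groups of interleaved columns */
    size_t gemv_outputs; /* floats written by GEMV: nc */
    size_t gemm_outputs; /* floats written by GEMM: nr * nc */
} kernel_shape;

/* Validates (n, nr, nc) against the I/O contract and derives sizes. */
static kernel_shape describe_shape(int n, int nr, int nc) {
    kernel_shape s = {0, 0, 0, 0, 0};
    if (n <= 0 || nr <= 0 || nc <= 0) {
        return s;
    }
    if (n % SKETCH_QK8_0 != 0 || nc % SKETCH_INTERLEAVE != 0) {
        return s;
    }
    s.valid = 1;
    s.blocks_per_row = n / SKETCH_QK8_0;
    s.column_groups = nc / SKETCH_INTERLEAVE;
    s.gemv_outputs = (size_t)nc;
    s.gemm_outputs = (size_t)nr * (size_t)nc;
    return s;
}
```

For example, n = 256 and nc = 8 give eight blocks per row and one column group, so a GEMV call writes eight floats.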

Usage Examples

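// Perform GEMV with interleaved q4_0 weights on RISC-V.
// interleaved_weights_q4_0x8 and activations_q8_0 are assumed to be
// buffers already quantized to block_q4_0x8 and block_q8_0.
// bs is the output row stride in float elements, not bytes; the GEMV
// path does not use it, so nc is passed here.
float output[8];
ggml_gemv_q4_0_8x8_q8_0(/*n=*/256, output, /*bs=*/8,
    interleaved_weights_q4_0x8, activations_q8_0, /*nr=*/1, /*nc=*/8);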
