Implementation:Ggml org Ggml Cpu riscv quants
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Quantization, dequantization, and dot product routines for GGML quantized tensor formats on RISC-V processors, optimized with the RISC-V Vector (RVV) extension.
Description
arch/riscv/quants.c implements RISC-V-specific SIMD acceleration for GGML quantization operations, leveraging the RISC-V Vector (RVV) extension's scalable vector length model.
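The RVV implementation differs architecturally from fixed-width SIMD implementations (NEON, SSE, etc.) by using scalable vector registers combined into register groups (the m8 suffix denotes LMUL=8, i.e. eight registers treated as one operand). With a vector length of at least 128 bits, one m8 group holds all 32 floats of a quantization block, so each step operates on the entire block with a single intrinsic: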
- __riscv_vle32_v_f32m8 -- load a full block of 32 floats into an m8 register group
- __riscv_vfabs_v_f32m8 -- compute absolute values across the entire vector
- __riscv_vfredmax_vs_f32m8_f32m1 -- vector reduction to find the maximum element
- __riscv_vfmul_vf_f32m8 -- scalar-vector multiply for scaling
- __riscv_vfncvt_x_f_w_i16m4 -- narrowing float-to-int16 conversion
- __riscv_vncvt_x_x_w_i8m2 -- narrowing int16-to-int8 conversion
- __riscv_vse8_v_i8m2 -- store packed int8 results
For quantize_row_q8_0, the entire quantization of a block is expressed in just a few vector instructions: load, absolute max reduction, scale, convert float-to-int16-to-int8 via two narrowing operations, and store. This is notably more concise than other architectures due to RVV's wide register groups.
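The per-block steps described above (absolute maximum, scale, convert, store) can be sketched in portable scalar C. This is a minimal illustration and not the code in quants.c: the type sketch_block_q8_0, the helper sketch_round, and the function sketch_quantize_row_q8_0 are hypothetical names, the scale is kept as a plain float rather than a half-precision value, and rounding is half-away-from-zero, which may differ from the rounding mode the RVV narrowing conversion uses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* block size assumed to match QK8_0 */

/* Hypothetical simplified block: the scale is a float here for readability. */
typedef struct {
    float  d;              /* per-block scale */
    int8_t qs[SKETCH_QK];  /* quantized values in [-127, 127] */
} sketch_block_q8_0;

/* Round half away from zero without libm; valid for the small |v| used here. */
static int sketch_round(float v) {
    int q = (int) v;               /* truncates toward zero */
    const float r = v - (float) q; /* remaining fractional part */
    if (r >= 0.5f)  q++;
    if (r <= -0.5f) q--;
    return q;
}

/* Scalar walk-through of one row: k must be a multiple of the block size. */
static void sketch_quantize_row_q8_0(const float *x, sketch_block_q8_0 *y, int64_t k) {
    assert(k % SKETCH_QK == 0);
    const int64_t nb = k / SKETCH_QK;
    for (int64_t i = 0; i < nb; i++) {
        /* Step 1: absolute maximum of the block (the vector reduction step). */
        float amax = 0.0f;
        for (int j = 0; j < SKETCH_QK; j++) {
            const float v = x[i * SKETCH_QK + j];
            const float a = v < 0.0f ? -v : v;
            if (a > amax) amax = a;
        }
        /* Step 2: scale so that amax maps to 127; guard the all-zero block. */
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        /* Step 3: scale, round, and narrow each element to int8. */
        for (int j = 0; j < SKETCH_QK; j++) {
            y[i].qs[j] = (int8_t) sketch_round(x[i * SKETCH_QK + j] * id);
        }
    }
}
```

In the vectorized version each of the inner loops collapses into one or two intrinsics operating on the whole block.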
For quantize_row_q8_1, the function additionally computes a block sum (y[i].s) using __riscv_vwcvt_x_x_v_i16m4 and __riscv_vwredsum_vs_i16m4_i32m1 to support quantization formats that store both scale and sum.
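As a rough scalar illustration of the extra q8_1 step, the sketch below widens each int8 value before accumulating and then stores the scaled sum. It assumes the stored field is the scale multiplied by the integer sum of the quantized values, and it uses float fields for readability; sketch_block_q8_1 and sketch_fill_block_sum are hypothetical names, not GGML symbols.

```c
#include <stdint.h>

#define SKETCH_QK8_1 32  /* block size assumed to match q8_1 */

/* Hypothetical simplified q8_1-style block with float fields. */
typedef struct {
    float  d;                 /* per-block scale */
    float  s;                 /* assumed to hold d * sum(qs) */
    int8_t qs[SKETCH_QK8_1];  /* quantized values */
} sketch_block_q8_1;

/* Compute the block sum from already-quantized values and store it scaled. */
static void sketch_fill_block_sum(sketch_block_q8_1 *b) {
    int32_t sum = 0;
    for (int j = 0; j < SKETCH_QK8_1; j++) {
        sum += (int32_t) b->qs[j];  /* widen first so the total cannot overflow int8 */
    }
    b->s = b->d * (float) sum;
}
```

Widening matters because 32 values of up to 127 in magnitude can total 4064, well outside the int8 range.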
All SIMD paths are guarded by #if defined(__riscv_v) and fall back to scalar reference implementations when the V extension is not available.
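The guard pattern might look like the following sketch, shown for the absolute-maximum step only. The function name sketch_block_amax is hypothetical; the vector path uses the intrinsics listed earlier plus two scalar-move intrinsics from the RVV intrinsics API, and it assumes a vector length of at least 128 bits so that 32 floats fit in one m8 group. Only the scalar branch is exercised on non-RISC-V hosts.

```c
#include <stddef.h>

#if defined(__riscv_v)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: absolute maximum of one 32-float block. */
static float sketch_block_amax(const float *x) {
#if defined(__riscv_v)
    /* Vector path: assumes VLEN >= 128 so vl = 32 is granted for e32/m8. */
    const size_t vl = 32;
    vfloat32m8_t v    = __riscv_vle32_v_f32m8(x, vl);
    vfloat32m8_t vabs = __riscv_vfabs_v_f32m8(v, vl);
    /* Reductions read only element 0 of the scalar operand. */
    vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1);
    vfloat32m1_t vmax = __riscv_vfredmax_vs_f32m8_f32m1(vabs, zero, vl);
    return __riscv_vfmv_f_s_f32m1_f32(vmax);
#else
    /* Scalar fallback: same result, one element at a time. */
    float amax = 0.0f;
    for (size_t j = 0; j < 32; j++) {
        const float a = x[j] < 0.0f ? -x[j] : x[j];
        if (a > amax) amax = a;
    }
    return amax;
#endif
}
```

Keeping both branches behind one function signature means callers do not need to know which path was compiled in.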
Usage
This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms with the V (vector) extension enabled. It supports emerging RISC-V hardware such as SiFive boards and RISC-V development platforms.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/riscv/quants.c (1956 lines).
Key Signatures
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
                            const void * GGML_RESTRICT vx, size_t bx,
                            const void * GGML_RESTRICT vy, size_t by, int nrc);
Import
#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"
I/O Contract
Inputs (Quantization)
| Parameter | Type | Description |
|---|---|---|
| x | const float * | Source array of floating-point values to be quantized. |
| k | int64_t | Number of elements to quantize. Must be a multiple of the block size (32 for q8_0/q8_1). |
Outputs (Quantization)
| Output | Type | Description |
|---|---|---|
| vy | void * | Destination buffer for the quantized block data. |
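To size the destination buffer, a caller needs the per-block layout. The sketch below assumes a q8_0 block consists of a 16-bit half-precision scale followed by 32 int8 values; sketch_block_q8_0_layout and sketch_q8_0_row_bytes are hypothetical names, and uint16_t stands in for the library's half-precision type.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK8_0 32  /* elements per block */

/* Assumed layout: raw fp16 bits for the scale, then 32 quantized values. */
typedef struct {
    uint16_t d;                 /* scale as raw half-precision bits (stand-in type) */
    int8_t   qs[SKETCH_QK8_0];  /* quantized values */
} sketch_block_q8_0_layout;

/* Bytes needed for a row of k elements; returns 0 if k is not a valid length. */
static size_t sketch_q8_0_row_bytes(int64_t k) {
    if (k < 0 || k % SKETCH_QK8_0 != 0) {
        return 0;
    }
    return (size_t) (k / SKETCH_QK8_0) * sizeof(sketch_block_q8_0_layout);
}
```

Under this assumed layout a 256-element row occupies 8 blocks of 34 bytes each.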
Inputs (Dot Product)
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of elements in each input vector. |
| vx | const void * | Pointer to quantized weight data. |
| vy | const void * | Pointer to quantized activation data. |
| nrc | int | Number of rows to compute simultaneously. |
Outputs (Dot Product)
| Output | Type | Description |
|---|---|---|
| s | float * | Destination for the computed dot product result(s). |
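The arithmetic behind a q4_0 by q8_0 dot product can be sketched in scalar C as follows. This is an illustration under stated assumptions, not the RVV kernel: it assumes each q4_0 byte packs element j in its low nibble and element j+16 in its high nibble with an offset of 8, it keeps scales as floats, it handles a single row, and it omits the stride parameters. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* elements per block for both formats */

/* Hypothetical simplified blocks with float scales. */
typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_block_q4_0;
typedef struct { float d; int8_t  qs[SKETCH_QK];     } sketch_block_q8_0_f;

/* Dot product of n elements; writes one float result to *s. */
static void sketch_vec_dot_q4_0_q8_0(int n, float *s,
                                     const sketch_block_q4_0 *x,
                                     const sketch_block_q8_0_f *y) {
    assert(n % SKETCH_QK == 0);
    const int nb = n / SKETCH_QK;
    float sumf = 0.0f;
    for (int i = 0; i < nb; i++) {
        int32_t sumi = 0;  /* integer accumulator for one block */
        for (int j = 0; j < SKETCH_QK / 2; j++) {
            /* Unpack two 4-bit weights and remove the offset of 8. */
            const int v0 = (x[i].qs[j] & 0x0F) - 8;
            const int v1 = (x[i].qs[j] >> 4) - 8;
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + SKETCH_QK / 2];
        }
        /* Apply both block scales once per block. */
        sumf += (float) sumi * x[i].d * y[i].d;
    }
    *s = sumf;
}
```

Accumulating in integers and applying the two scales once per block is what makes the quantized dot product cheaper than a float one.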
Usage Examples
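// Quantize a row of 256 floats into 256 / QK8_0 = 8 q8_0 blocks.
// With the RISC-V Vector extension enabled, each 32-element block
// is processed by a single short sequence of vector operations.
float input[256];
for (int i = 0; i < 256; i++) {
    input[i] = (float) i / 256.0f;  // example data
}
block_q8_0 output[256 / QK8_0];
quantize_row_q8_0(input, output, 256);  // k must be a multiple of QK8_0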
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_riscv_repack -- RISC-V RVV matrix repacking and GEMV/GEMM kernels
- Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
- Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
- Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent