Implementation:Ggml org Ggml Cpu riscv quants

Metadata

Field	Value
Page Type	Implementation (Architecture-Specific SIMD)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated	2025-05-15 12:00 GMT

Overview

Quantization, dequantization, and dot product routines for GGML quantized tensor formats, optimized for the RISC-V Vector (RVV) extension.

Description

arch/riscv/quants.c implements RISC-V-specific SIMD acceleration for GGML quantization operations, leveraging the RISC-V Vector (RVV) extension's scalable vector length model.

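The RVV implementation differs architecturally from fixed-width SIMD implementations (NEON, SSE, etc.) by using scalable vector registers combined into register groups. With LMUL=8 grouping (the m8 suffix), eight vector registers act as one operand; at a vector length (VLEN) of at least 128 bits, such a group holds 8 x 128 = 1024 bits, i.e. an entire 32-element block of 32-bit floats, so each step of the quantization can be issued as a single intrinsic call: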

__riscv_vle32_v_f32m8 -- load a full block of 32 floats into an m8 register group
__riscv_vfabs_v_f32m8 -- compute absolute values across the entire vector
__riscv_vfredmax_vs_f32m8_f32m1 -- vector reduction to find the maximum element
__riscv_vfmul_vf_f32m8 -- scalar-vector multiply for scaling
__riscv_vfncvt_x_f_w_i16m4 -- narrowing float-to-int16 conversion
__riscv_vncvt_x_x_w_i8m2 -- narrowing int16-to-int8 conversion
__riscv_vse8_v_i8m2 -- store packed int8 results

For quantize_row_q8_0, the entire quantization of a block is expressed in a handful of vector instructions: load, absolute-value max reduction, scale, float-to-int16-to-int8 conversion via two narrowing operations, and store. This is more concise than the fixed-width implementations, which must split each block across several registers, because an RVV m8 register group can hold the whole block.
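The per-block steps described above can be sketched in portable scalar C. This is an illustrative reference, not the code in quants.c: the names demo_block_q8_0 and demo_quantize_row_q8_0 are hypothetical, the scale is kept as a float for simplicity (GGML stores it in half precision), and rounding is done by hand to avoid a dependency on the math library.

```c
#include <stdint.h>

#define DEMO_QK8_0 32  /* elements per block, matching GGML's q8_0 block size */

/* Hypothetical simplified block: GGML stores the scale as fp16, a float is used here. */
typedef struct {
    float  d;                 /* per-block scale */
    int8_t qs[DEMO_QK8_0];    /* quantized values */
} demo_block_q8_0;

/* Scalar sketch of q8_0 quantization: k must be a multiple of DEMO_QK8_0. */
static void demo_quantize_row_q8_0(const float *x, demo_block_q8_0 *y, int64_t k) {
    const int64_t nb = k / DEMO_QK8_0;
    for (int64_t i = 0; i < nb; i++) {
        const float *xb = x + i * DEMO_QK8_0;

        /* Step 1: absolute maximum of the block (the vector reduction). */
        float amax = 0.0f;
        for (int j = 0; j < DEMO_QK8_0; j++) {
            const float a = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (a > amax) amax = a;
        }

        /* Step 2: scale so that amax maps to 127; guard against an all-zero block. */
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;

        /* Step 3: scale, round to nearest (half away from zero), narrow to int8. */
        for (int j = 0; j < DEMO_QK8_0; j++) {
            const float v = xb[j] * id;
            y[i].qs[j] = (int8_t) (int) (v + (v >= 0.0f ? 0.5f : -0.5f));
        }
    }
}
```

In the RVV path each of the three inner loops collapses into one or two intrinsic calls on an m8 register group.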

For quantize_row_q8_1, the function additionally computes a block sum (y[i].s): the int8 quants are widened to int16 with __riscv_vwcvt_x_x_v_i16m4 and then reduced into a 32-bit accumulator with __riscv_vwredsum_vs_i16m4_i32m1. The q8_1 format stores this sum alongside the scale.
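A minimal scalar sketch of the block-sum step follows. The helper name demo_q8_1_block_sum is hypothetical, and the scale is passed as a float rather than GGML's half-precision value. Accumulating in a 32-bit integer mirrors the widening reduction: 32 quants of magnitude up to 127 can sum to 4064, which does not fit in int8.

```c
#include <stdint.h>

#define DEMO_QK8_1 32  /* elements per block, matching GGML's q8_1 block size */

/* Scalar sketch: sum the quantized values in a wide accumulator,
 * then multiply by the block scale to obtain the stored sum term. */
static float demo_q8_1_block_sum(const int8_t *qs, float d) {
    int32_t sum = 0;  /* wide accumulator, like the i32m1 reduction result */
    for (int j = 0; j < DEMO_QK8_1; j++) {
        sum += qs[j];
    }
    return d * (float) sum;
}
```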

All SIMD paths are guarded by #if defined(__riscv_v) and fall back to scalar reference implementations when the V extension is not available.
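The guard pattern can be illustrated with a small absolute-maximum helper. This is a sketch under assumptions, not an excerpt from quants.c: the function name demo_absmax is hypothetical, and the vector branch uses a strip-mined loop with __riscv_vsetvl_e32m8 so that it does not depend on a particular VLEN. On a build without the V extension only the scalar branch is compiled.

```c
#include <stddef.h>

#if defined(__riscv_v)
#include <riscv_vector.h>
#endif

/* Returns max(|x[i]|) over n floats; 0.0f for n == 0. */
static float demo_absmax(const float *x, size_t n) {
#if defined(__riscv_v)
    /* Vector path: element 0 of vmax carries the running maximum. */
    vfloat32m1_t vmax = __riscv_vfmv_v_f_f32m1(0.0f, 1);
    for (size_t i = 0; i < n; ) {
        const size_t vl = __riscv_vsetvl_e32m8(n - i);  /* elements this pass */
        vfloat32m8_t v  = __riscv_vle32_v_f32m8(x + i, vl);
        v    = __riscv_vfabs_v_f32m8(v, vl);
        vmax = __riscv_vfredmax_vs_f32m8_f32m1(v, vmax, vl);
        i += vl;
    }
    return __riscv_vfmv_f_s_f32m1_f32(vmax);
#else
    /* Scalar fallback: same result, one element at a time. */
    float amax = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    return amax;
#endif
}
```

Keeping both branches behind one function signature means callers do not need to know which path was compiled.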

Usage

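This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms. The vector code paths are used when the V (vector) extension is enabled at build time; otherwise the scalar fallbacks are compiled. It targets RISC-V hardware that implements the V extension, such as SiFive boards and other RISC-V development platforms.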

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/riscv/quants.c (1956 lines).

Key Signatures

void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, size_t bx,
    const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter	Type	Description
`x`	`const float *`	Source array of floating-point values to be quantized.
`k`	`int64_t`	Number of elements to quantize. Must be a multiple of the block size (32 for q8_0/q8_1).

Outputs (Quantization)

Output	Type	Description
`vy`	`void *`	Destination buffer for the quantized block data.

Inputs (Dot Product)

Parameter	Type	Description
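`n`	`int`	Number of elements in each input vector. Must be a multiple of the block size.
`bs`	`size_t`	Stride between results in `s`; only relevant when `nrc` is greater than 1.
`vx`	`const void *`	Pointer to quantized weight data.
`bx`	`size_t`	Stride between rows of `vx`; only relevant when `nrc` is greater than 1.
`vy`	`const void *`	Pointer to quantized activation data.
`by`	`size_t`	Stride between rows of `vy`; only relevant when `nrc` is greater than 1.
`nrc`	`int`	Number of result rows to compute per call (typically 1).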

Outputs (Dot Product)

Output	Type	Description
`s`	`float *`	Destination for the computed dot product result(s).
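To make the dot product contract concrete, here is a scalar sketch of a q4_0 by q8_0 dot product for a single row. The type and function names (demo_q4_block, demo_q8_block, demo_vec_dot_q4_0_q8_0) are hypothetical, the scales are kept as floats instead of GGML's half-precision values, and the stride and nrc parameters are omitted. It assumes the q4_0 layout in which each byte packs two 4-bit values with an offset of 8, the low nibble holding element j and the high nibble holding element j + 16.

```c
#include <stdint.h>

#define DEMO_QK 32  /* elements per block for both formats */

/* Hypothetical simplified blocks with float scales. */
typedef struct { float d; uint8_t qs[DEMO_QK / 2]; } demo_q4_block;  /* 4-bit weights */
typedef struct { float d; int8_t  qs[DEMO_QK];     } demo_q8_block;  /* 8-bit activations */

/* Scalar sketch: n must be a multiple of DEMO_QK. */
static float demo_vec_dot_q4_0_q8_0(int n, const demo_q4_block *x, const demo_q8_block *y) {
    const int nb = n / DEMO_QK;
    float sumf = 0.0f;
    for (int i = 0; i < nb; i++) {
        int32_t sumi = 0;  /* integer accumulator for one block */
        for (int j = 0; j < DEMO_QK / 2; j++) {
            const int v0 = (x[i].qs[j] & 0x0F) - 8;  /* low nibble: element j */
            const int v1 = (x[i].qs[j] >> 4)   - 8;  /* high nibble: element j + 16 */
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + DEMO_QK / 2];
        }
        /* Apply both block scales once per block. */
        sumf += (float) sumi * x[i].d * y[i].d;
    }
    return sumf;
}
```

Doing the inner products in integers and applying the two scales once per block is what makes the quantized formats cheap to multiply.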

Usage Examples

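// Quantize a row of 256 floats into 8 q8_0 blocks (QK8_0 == 32).
// With the V extension enabled, each 32-element block is handled
// by one sequence of vector operations.
float input[256];
for (int i = 0; i < 256; i++) {
    input[i] = (float) i / 256.0f;  // example data; any finite values work
}
block_q8_0 output[256 / QK8_0];

// k (256) must be a multiple of the block size
quantize_row_q8_0(input, output, 256);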

Related Pages

Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
Implementation:Ggml_org_Ggml_Cpu_riscv_repack -- RISC-V RVV matrix repacking and GEMV/GEMM kernels
Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent
