Implementation:Ggml org Ggml Cpu riscv quants

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Architecture-Specific SIMD)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated | 2025-05-15 12:00 GMT

Overview

Quantization, dequantization, and dot product routines for GGML quantized tensor formats, optimized for the RISC-V Vector (RVV) extension.

Description

arch/riscv/quants.c implements RISC-V-specific SIMD acceleration for GGML quantization operations, leveraging the RISC-V Vector (RVV) extension's scalable vector length model.

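The RVV implementation differs architecturally from fixed-width SIMD implementations (NEON, SSE, etc.): it uses register groups under RVV's scalable vector length model, so a single instruction can operate on an entire 32-element quantization block. With LMUL=8 (the m8 suffix in the intrinsics below) and the minimum VLEN of 128 bits required by the V extension, one register group holds 8 x 128 = 1024 bits, which is exactly 32 single-precision floats. The intrinsics used are: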

  • __riscv_vle32_v_f32m8 -- load a full block of 32 floats into an m8 register group
  • __riscv_vfabs_v_f32m8 -- compute absolute values across the entire vector
  • __riscv_vfredmax_vs_f32m8_f32m1 -- vector reduction to find the maximum element
  • __riscv_vfmul_vf_f32m8 -- scalar-vector multiply for scaling
  • __riscv_vfncvt_x_f_w_i16m4 -- narrowing float-to-int16 conversion
  • __riscv_vncvt_x_x_w_i8m2 -- narrowing int16-to-int8 conversion
  • __riscv_vse8_v_i8m2 -- store packed int8 results

For quantize_row_q8_0, quantizing one block takes only a handful of vector instructions: load, absolute-value max reduction, scale, float-to-int16-to-int8 conversion via two narrowing operations, and store. Fixed-width SIMD paths need several registers to cover the same 32 floats (eight 128-bit registers, for example), so the RVV version is noticeably shorter.
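The steps above can be sketched in portable scalar C. This is a minimal illustration, not the code in quants.c: the struct sketch_q8_0, the function names, and the choice to keep the scale as a plain float (GGML stores a half-precision value) are assumptions made for the example. Rounding is done by hand (ties away from zero) so the sketch needs no math library; a vector conversion may break exact ties differently depending on the active rounding mode.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* block size, matching the 32-element block in the text */

/* Hypothetical stand-in for block_q8_0: scale kept as float for simplicity. */
typedef struct {
    float  d;               /* per-block scale */
    int8_t qs[SKETCH_QK];   /* quantized values */
} sketch_q8_0;

/* Round to nearest, ties away from zero, without libm. */
static int8_t sketch_round_to_i8(float v) {
    return (int8_t)(int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

static void sketch_quantize_row_q8_0(const float *x, sketch_q8_0 *y, int64_t k) {
    assert(k % SKETCH_QK == 0);
    const int64_t nb = k / SKETCH_QK;

    for (int64_t i = 0; i < nb; i++) {
        const float *xb = x + i * SKETCH_QK;

        /* load + absolute value + max reduction */
        float amax = 0.0f;
        for (int j = 0; j < SKETCH_QK; j++) {
            const float a = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (a > amax) amax = a;
        }

        /* scale so that the largest magnitude maps to 127 */
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;

        /* scale, convert to int8, store */
        for (int j = 0; j < SKETCH_QK; j++) {
            y[i].qs[j] = sketch_round_to_i8(xb[j] * id);
        }
    }
}
```

Calling sketch_quantize_row_q8_0(x, y, 64) on 64 floats fills two blocks; within each block the largest-magnitude input maps to +127 or -127.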

For quantize_row_q8_1, the function additionally computes a per-block sum (y[i].s). It widens the int8 quants to int16 with __riscv_vwcvt_x_x_v_i16m4 and then accumulates them into an int32 with the widening reduction __riscv_vwredsum_vs_i16m4_i32m1. The q8_1 format stores this sum alongside the scale.
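A scalar sketch of the block sum may help make the widening steps concrete. It assumes the stored value is the scale multiplied by the integer sum of the quants, kept here as a plain float; the function name is hypothetical and the real code stores a half-precision value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK 32  /* block size used in the text */

/* Sum the int8 quants of one block in a wider accumulator, then apply the
 * block scale. The accumulator is int32 so 32 values in [-128, 127] cannot
 * overflow, mirroring the widen-then-reduce sequence described above. */
static float sketch_q8_1_block_sum(const int8_t *qs, float d) {
    assert(qs != NULL);
    int32_t sum = 0;
    for (int j = 0; j < SKETCH_QK; j++) {
        sum += (int16_t) qs[j];  /* widen before accumulating */
    }
    return d * (float) sum;
}
```

Precomputing this value at quantization time means a later dot product against a format with a per-block offset can account for that offset with one multiply instead of re-summing the quants.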

All SIMD paths are guarded by #if defined(__riscv_v) and fall back to scalar reference implementations when the V extension is not available.
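The guard pattern can be illustrated with a small absolute-maximum helper. This is a sketch under assumptions, not code from quants.c: the function name is hypothetical, and the vector branch strip-mines with __riscv_vsetvl_e32m8 so it works for any length, which is a choice made for this example. The vector branch only compiles with a toolchain that provides the RVV v1.0 intrinsics; on other targets the scalar branch is used.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_v)
#include <riscv_vector.h>
#endif

/* Returns max(|x[i]|) over n floats, or 0 for an empty input. */
static float sketch_abs_max(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    float amax = 0.0f;
#if defined(__riscv_v)
    /* Vector path: process up to one m8 register group per iteration. */
    for (size_t i = 0; i < n; ) {
        const size_t vl = __riscv_vsetvl_e32m8(n - i);
        vfloat32m8_t v = __riscv_vle32_v_f32m8(x + i, vl);
        v = __riscv_vfabs_v_f32m8(v, vl);
        /* Seed the reduction with the running maximum. */
        vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(amax, 1);
        acc = __riscv_vfredmax_vs_f32m8_f32m1(v, acc, vl);
        amax = __riscv_vfmv_f_s_f32m1_f32(acc);
        i += vl;
    }
#else
    /* Scalar fallback when the V extension is not available. */
    for (size_t i = 0; i < n; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
#endif
    return amax;
}
```

Both branches are meant to return the same value, so callers do not need to know which one was compiled in.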

Usage

This file is compiled as part of the GGML CPU backend when targeting RISC-V platforms with the V (vector) extension enabled. It targets RISC-V hardware that implements the V extension, such as SiFive boards and other RISC-V development platforms.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/riscv/quants.c (1956 lines).

Key Signatures

void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, size_t bx,
    const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter | Type | Description
x | const float * | Source array of floating-point values to be quantized.
k | int64_t | Number of elements to quantize. Must be a multiple of the block size (32 for q8_0/q8_1).

Outputs (Quantization)

Output | Type | Description
vy | void * | Destination buffer for the quantized block data.
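Because k must be a multiple of the block size, the size of the destination buffer follows directly from the block count. The sketch below assumes a q8_0 block is a 2-byte half-precision scale followed by 32 int8 values; the struct and function names are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK8_0 32  /* elements per block, as stated in the table above */

/* Assumed layout of one q8_0 block: 16-bit scale + 32 quantized values. */
typedef struct {
    uint16_t d;                  /* raw bits of the half-precision scale */
    int8_t   qs[SKETCH_QK8_0];   /* quantized values */
} sketch_block_q8_0;

/* Number of bytes the destination buffer needs for a row of k floats. */
static size_t sketch_q8_0_row_bytes(int64_t k) {
    assert(k >= 0 && k % SKETCH_QK8_0 == 0);
    return (size_t)(k / SKETCH_QK8_0) * sizeof(sketch_block_q8_0);
}
```

Under these assumptions a row of 256 floats (1024 bytes) quantizes into 8 blocks of 34 bytes each.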

Inputs (Dot Product)

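Parameter | Type | Description
n | int | Number of elements in each input vector. Must be a multiple of the block size.
vx | const void * | Pointer to quantized weight data.
vy | const void * | Pointer to quantized activation data.
bs, bx, by | size_t | Strides for s, vx, and vy; only relevant when more than one row is computed per call (nrc > 1).
nrc | int | Number of rows to compute simultaneously.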

Outputs (Dot Product)

Output | Type | Description
s | float * | Destination for the computed dot product result(s).
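What the dot product computes can be sketched in scalar C for the q4_0 by q8_0 case. This is an illustration under stated assumptions rather than the RVV code: it assumes each q4_0 byte packs element j in its low nibble and element j + 16 in its high nibble, both stored with an offset of 8, and it keeps the scales as plain floats. The struct and function names are hypothetical, and the stride and nrc parameters are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* elements per block */

typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_q4_0;  /* 4-bit weights */
typedef struct { float d; int8_t  qs[SKETCH_QK];     } sketch_q8_0;  /* 8-bit activations */

static float sketch_vec_dot_q4_0_q8_0(int n, const sketch_q4_0 *x, const sketch_q8_0 *y) {
    assert(n % SKETCH_QK == 0);
    const int nb = n / SKETCH_QK;
    float sumf = 0.0f;

    for (int i = 0; i < nb; i++) {
        int32_t sumi = 0;  /* integer accumulator for one block */
        for (int j = 0; j < SKETCH_QK / 2; j++) {
            /* unpack two 4-bit values and remove the offset of 8 */
            const int v0 = (x[i].qs[j] & 0x0F) - 8;
            const int v1 = (x[i].qs[j] >> 4) - 8;
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + SKETCH_QK / 2];
        }
        /* apply both block scales once per block */
        sumf += (float) sumi * x[i].d * y[i].d;
    }
    return sumf;
}
```

Keeping the inner loop in integers and applying the two scales once per block is the property that makes the quantized formats cheap to multiply.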

Usage Examples

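// Quantize one row of 256 floats into 256 / QK8_0 = 8 q8_0 blocks.
// With the V extension enabled, each 32-element block is handled by
// one short sequence of vector operations.
float input[256] = {0};          // replace with the row to quantize
block_q8_0 output[256 / QK8_0];  // the element count must be a multiple of QK8_0

quantize_row_q8_0(input, output, 256);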
