Implementation:Ggml org Ggml Cpu wasm quants
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
WebAssembly SIMD128-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats, enabling fast inference in web browsers and WASM runtimes.
Description
arch/wasm/quants.c implements WebAssembly SIMD128-specific acceleration for GGML quantization operations, targeting web browsers and WASM runtimes that support the SIMD128 proposal.
The implementation uses the wasm_simd128.h intrinsics API, working with 128-bit v128_t vectors (four float32 lanes in the quantization paths):
- wasm_v128_load -- load 128 bits from memory
- wasm_f32x4_abs / wasm_f32x4_max -- absolute value and element-wise maximum
- wasm_f32x4_mul / wasm_f32x4_splat -- multiply and broadcast scalar
- wasm_i32x4_trunc_sat_f32x4 -- saturating float-to-integer truncation
- wasm_f32x4_extract_lane / wasm_i32x4_extract_lane -- extract scalar from lane
The quantization pattern follows the approach used across the other architecture backends: load eight groups of four floats (one 32-element block), find the block's maximum absolute value via tree reduction, compute a scale factor, multiply by the inverse scale and convert to integer, and store the packed results. Notably, the WASM implementation converts float to int with wasm_i32x4_trunc_sat_f32x4, which truncates toward zero rather than rounding to nearest, requiring the scale to be pre-adjusted.
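The steps described above can be sketched for a single 32-element block as follows. This is a minimal illustration, not the code from quants.c: `demo_block_q8`, `demo_quantize_block_q8`, and `DEMO_QK` are hypothetical names, the scale is kept as a plain float (GGML stores it as fp16), and no extra scale adjustment is applied, so the sketch only shows the tree reduction and the truncating conversion. The scalar branch is an assumed fallback so the block also compiles without SIMD128.

```c
#include <stdint.h>
#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

#define DEMO_QK 32  /* elements per block, as in Q8_0 */

/* Hypothetical block layout for illustration only. */
typedef struct {
    float  d;            /* scale (plain float here) */
    int8_t qs[DEMO_QK];  /* quantized values */
} demo_block_q8;

/* Quantize one block of 32 floats; the conversion truncates toward zero. */
void demo_quantize_block_q8(const float *x, demo_block_q8 *y) {
    float amax = 0.0f;
#if defined(__wasm_simd128__)
    v128_t srcv[8], amaxv[8];
    for (int j = 0; j < 8; j++) {
        srcv[j]  = wasm_v128_load(x + 4 * j);   /* eight groups of four floats */
        amaxv[j] = wasm_f32x4_abs(srcv[j]);
    }
    /* Tree reduction: 8 -> 4 -> 2 -> 1 vectors of lane-wise maxima. */
    for (int j = 0; j < 4; j++) amaxv[2 * j] = wasm_f32x4_max(amaxv[2 * j], amaxv[2 * j + 1]);
    for (int j = 0; j < 2; j++) amaxv[4 * j] = wasm_f32x4_max(amaxv[4 * j], amaxv[4 * j + 2]);
    amaxv[0] = wasm_f32x4_max(amaxv[0], amaxv[4]);
    /* Horizontal maximum over the four remaining lanes. */
    const float l0 = wasm_f32x4_extract_lane(amaxv[0], 0);
    const float l1 = wasm_f32x4_extract_lane(amaxv[0], 1);
    const float l2 = wasm_f32x4_extract_lane(amaxv[0], 2);
    const float l3 = wasm_f32x4_extract_lane(amaxv[0], 3);
    const float m01 = l0 > l1 ? l0 : l1;
    const float m23 = l2 > l3 ? l2 : l3;
    amax = m01 > m23 ? m01 : m23;
#else
    for (int j = 0; j < DEMO_QK; j++) {
        const float a = x[j] < 0.0f ? -x[j] : x[j];
        if (a > amax) amax = a;
    }
#endif
    const float d  = amax / 127.0f;               /* scale */
    const float id = d != 0.0f ? 1.0f / d : 0.0f; /* inverse scale */
    y->d = d;
#if defined(__wasm_simd128__)
    const v128_t idv = wasm_f32x4_splat(id);
    for (int j = 0; j < 8; j++) {
        const v128_t vi = wasm_i32x4_trunc_sat_f32x4(wasm_f32x4_mul(srcv[j], idv));
        y->qs[4 * j + 0] = (int8_t) wasm_i32x4_extract_lane(vi, 0);
        y->qs[4 * j + 1] = (int8_t) wasm_i32x4_extract_lane(vi, 1);
        y->qs[4 * j + 2] = (int8_t) wasm_i32x4_extract_lane(vi, 2);
        y->qs[4 * j + 3] = (int8_t) wasm_i32x4_extract_lane(vi, 3);
    }
#else
    /* A C float-to-int cast also truncates toward zero. */
    for (int j = 0; j < DEMO_QK; j++) y->qs[j] = (int8_t) (int) (x[j] * id);
#endif
}
```

With truncation, a scaled value such as 7.9375 becomes 7, where round-to-nearest would give 8; this is the difference the paragraph above refers to.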
Precomputed bit-expansion tables (table_b2b_0, table_b2b_1) support sub-byte format unpacking. All SIMD paths are guarded by #if defined(__wasm_simd128__) and fall back to scalar reference implementations otherwise.
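To illustrate what a bit-expansion table of this kind can look like, here is a hedged sketch that builds one at runtime. It assumes each of the 256 entries expands the 8 bits of its index into 8 bytes of a 64-bit word, with bit j controlling byte j. The function name `demo_build_b2b` and the `on`/`off` parameters are hypothetical; the actual tables in GGML are precomputed constants, and their exact byte values are not stated on this page.

```c
#include <stdint.h>

/* Hypothetical runtime construction of a byte-to-bytes expansion table:
 * byte j of table[b] is `on` if bit j of b is set, otherwise `off`. */
void demo_build_b2b(uint64_t table[256], uint8_t on, uint8_t off) {
    for (int b = 0; b < 256; b++) {
        uint64_t v = 0;
        for (int j = 0; j < 8; j++) {
            const uint8_t byte = ((b >> j) & 1) ? on : off;
            v |= (uint64_t) byte << (8 * j);
        }
        table[b] = v;
    }
}
```

A lookup then turns one packed byte of flag bits into eight bytes in a single load, which avoids per-bit shifting and masking in the inner loop when unpacking sub-byte formats.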
Usage
This file is compiled when GGML is built as a WebAssembly module (e.g., using Emscripten) with SIMD128 support enabled. It powers web-based LLM inference applications running in browsers or Node.js environments.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/wasm/quants.c (1221 lines).
Key Signatures
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
const void * GGML_RESTRICT vx, size_t bx,
const void * GGML_RESTRICT vy, size_t by, int nrc);
Import
#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"
I/O Contract
Inputs (Quantization)
| Parameter | Type | Description |
|---|---|---|
| x | const float * | Source array of floating-point values to be quantized. |
| k | int64_t | Number of elements to quantize. Must be a multiple of the block size. |
Outputs (Quantization)
| Output | Type | Description |
|---|---|---|
| vy | void * | Destination buffer for the quantized block data. |
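Because k must be a multiple of the block size, the destination buffer size follows directly from the block count. The sketch below shows that arithmetic for a Q8_0-style layout; `demo_block_q8_0`, `DEMO_QK8_0`, and `demo_q8_0_row_bytes` are hypothetical stand-ins (a 16-bit scale plus 32 signed bytes), not the GGML definitions themselves.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QK8_0 32  /* assumed block size for the Q8_0-style layout */

/* Hypothetical stand-in for a Q8_0 block: fp16 scale bits + 32 int8 values. */
typedef struct {
    uint16_t d;
    int8_t   qs[DEMO_QK8_0];
} demo_block_q8_0;

/* Bytes needed to hold k quantized elements, or 0 if k is not a
 * positive multiple of the block size. */
size_t demo_q8_0_row_bytes(int64_t k) {
    if (k <= 0 || k % DEMO_QK8_0 != 0) {
        return 0;
    }
    return (size_t) (k / DEMO_QK8_0) * sizeof(demo_block_q8_0);
}
```

Under these assumptions, a 256-element row occupies 8 blocks of 34 bytes each.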
Inputs (Dot Product)
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of elements in each input vector. |
| vx | const void * | Pointer to quantized weight data. |
| vy | const void * | Pointer to quantized activation data. |
| nrc | int | Number of rows to compute simultaneously. |
Outputs (Dot Product)
| Output | Type | Description |
|---|---|---|
| s | float * | Destination for the computed dot product result(s). |
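As a rough scalar picture of what a 4-bit by 8-bit block dot product computes, consider the sketch below. It is an assumption-laden illustration rather than the SIMD code: the `demo_dot_q4`/`demo_dot_q8` structs use float scales, the function takes typed pointers and omits the stride and nrc parameters, and the nibble layout (low nibbles hold elements 0..15, high nibbles hold elements 16..31, each offset by 8) is assumed.

```c
#include <stdint.h>

#define DEMO_DOT_QK 32  /* elements per block */

/* Hypothetical block layouts with float scales, for illustration only. */
typedef struct { float d; uint8_t qs[DEMO_DOT_QK / 2]; } demo_dot_q4; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[DEMO_DOT_QK];     } demo_dot_q8;

/* Scalar dot product over n elements (n assumed to be a multiple of 32). */
float demo_vec_dot_q4_q8(int n, const demo_dot_q4 *x, const demo_dot_q8 *y) {
    const int nb = n / DEMO_DOT_QK;
    float sum = 0.0f;
    for (int i = 0; i < nb; i++) {
        int32_t isum = 0;  /* integer accumulator for one block */
        for (int j = 0; j < DEMO_DOT_QK / 2; j++) {
            const int lo = (x[i].qs[j] & 0x0F) - 8;  /* element j */
            const int hi = (x[i].qs[j] >> 4)   - 8;  /* element j + 16 */
            isum += lo * y[i].qs[j] + hi * y[i].qs[j + DEMO_DOT_QK / 2];
        }
        /* Apply both block scales once per block. */
        sum += (float) isum * x[i].d * y[i].d;
    }
    return sum;
}
```

Accumulating in integers per block and applying the two scales once keeps the inner loop free of floating-point work, which is the part a SIMD path would vectorize.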
Usage Examples
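// Quantize a row using WebAssembly SIMD128
// (compiled via Emscripten with -msimd128)
float input[256];               // fill with the values to quantize
block_q8_0 output[256 / QK8_0]; // one block per QK8_0 input elements
quantize_row_q8_0(input, output, 256);
// Compute quantized dot product in a WASM runtime
// weight_blocks (block_q4_0) and activation_blocks (block_q8_0) are assumed
// to each hold 256 previously quantized elements
float result;
ggml_vec_dot_q4_0_q8_0(256, &result, sizeof(result),
                       weight_blocks, sizeof(block_q4_0),
                       activation_blocks, sizeof(block_q8_0), 1);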
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
- Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
- Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent