Implementation:Ggml org Ggml Cpu wasm quants

Metadata

Field	Value
Page Type	Implementation (Architecture-Specific SIMD)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated	2025-05-15 12:00 GMT

Overview

WebAssembly SIMD128-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats, enabling fast inference in web browsers and WASM runtimes.

Description

arch/wasm/quants.c implements WebAssembly SIMD128-specific acceleration for GGML quantization operations, targeting web browsers and WASM runtimes that support the SIMD128 proposal.

The implementation uses the wasm_simd128.h intrinsics API. It works with 128-bit v128_t vectors, which are untyped and are interpreted as four float32 or four int32 lanes depending on the intrinsic applied:

wasm_v128_load -- load 128 bits from memory
wasm_f32x4_abs / wasm_f32x4_max -- absolute value and element-wise maximum
wasm_f32x4_mul / wasm_f32x4_splat -- multiply and broadcast scalar
wasm_i32x4_trunc_sat_f32x4 -- saturating float-to-integer truncation
wasm_f32x4_extract_lane / wasm_i32x4_extract_lane -- extract scalar from lane

The quantization pattern follows the same approach as the other architecture-specific implementations: load a 32-float block as eight vectors of four floats, find the block's absolute maximum via tree reduction, compute a scale factor, multiply by the inverse scale, convert to integer, and store the packed results. The WASM implementation notably uses wasm_i32x4_trunc_sat_f32x4 for the float-to-int conversion, which truncates toward zero rather than rounding to nearest, requiring the scale to be pre-adjusted.
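The steps above can be sketched as follows. This is a minimal illustration, not the code in quants.c: the names sketch_block_q8, sketch_block_amax, sketch_block_convert, and sketch_quantize_row are hypothetical, the scale is kept as a plain float (the real block_q8_0 stores it in half precision), and no adjustment for truncation is attempted. The scalar branch uses a truncating C cast so that both branches behave the same way.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

#define SKETCH_QK 32 /* elements per block, as in Q8_0 */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical layout for illustration; the real block_q8_0 stores d as fp16. */
typedef struct {
    float  d;             /* scale */
    int8_t qs[SKETCH_QK]; /* quantized values */
} sketch_block_q8;

/* Absolute maximum of one 32-float block. */
static float sketch_block_amax(const float *x) {
#if defined(__wasm_simd128__)
    v128_t m[8];
    /* load eight groups of four floats and take absolute values */
    for (int j = 0; j < 8; j++) m[j] = wasm_f32x4_abs(wasm_v128_load(x + 4 * j));
    /* tree reduction: 8 -> 4 -> 2 -> 1 vectors */
    for (int j = 0; j < 4; j++) m[2 * j] = wasm_f32x4_max(m[2 * j], m[2 * j + 1]);
    for (int j = 0; j < 2; j++) m[4 * j] = wasm_f32x4_max(m[4 * j], m[4 * j + 2]);
    m[0] = wasm_f32x4_max(m[0], m[4]);
    /* horizontal maximum across the four remaining lanes */
    const float a01 = SKETCH_MAX(wasm_f32x4_extract_lane(m[0], 0), wasm_f32x4_extract_lane(m[0], 1));
    const float a23 = SKETCH_MAX(wasm_f32x4_extract_lane(m[0], 2), wasm_f32x4_extract_lane(m[0], 3));
    return SKETCH_MAX(a01, a23);
#else
    float amax = 0.0f;
    for (int j = 0; j < SKETCH_QK; j++) {
        const float a = x[j] < 0.0f ? -x[j] : x[j];
        amax = SKETCH_MAX(amax, a);
    }
    return amax;
#endif
}

/* Multiply by the inverse scale and convert to int8 by truncation. */
static void sketch_block_convert(const float *x, float id, int8_t *qs) {
#if defined(__wasm_simd128__)
    const v128_t vid = wasm_f32x4_splat(id);
    for (int j = 0; j < 8; j++) {
        const v128_t v  = wasm_f32x4_mul(wasm_v128_load(x + 4 * j), vid);
        const v128_t vi = wasm_i32x4_trunc_sat_f32x4(v); /* truncates toward zero */
        qs[4 * j + 0] = (int8_t) wasm_i32x4_extract_lane(vi, 0);
        qs[4 * j + 1] = (int8_t) wasm_i32x4_extract_lane(vi, 1);
        qs[4 * j + 2] = (int8_t) wasm_i32x4_extract_lane(vi, 2);
        qs[4 * j + 3] = (int8_t) wasm_i32x4_extract_lane(vi, 3);
    }
#else
    /* a float-to-integer cast in C also truncates toward zero */
    for (int j = 0; j < SKETCH_QK; j++) qs[j] = (int8_t) (x[j] * id);
#endif
}

/* Quantize k floats (k must be a multiple of SKETCH_QK) into k / SKETCH_QK blocks. */
static void sketch_quantize_row(const float *x, sketch_block_q8 *y, int64_t k) {
    assert(k % SKETCH_QK == 0);
    for (int64_t i = 0; i < k / SKETCH_QK; i++) {
        const float *xb  = x + i * SKETCH_QK;
        const float amax = sketch_block_amax(xb);
        const float d    = amax / 127.0f;                 /* scale */
        const float id   = d != 0.0f ? 1.0f / d : 0.0f;   /* inverse scale */
        y[i].d = d;
        sketch_block_convert(xb, id, y[i].qs);
    }
}
```

With truncation, each reconstructed value qs[j] * d can differ from the input by up to about one quantization step d, whereas round-to-nearest would bound the error by roughly d / 2.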

Precomputed bit-expansion tables (table_b2b_0, table_b2b_1) support sub-byte format unpacking. All SIMD paths are guarded by #if defined(__wasm_simd128__) and fall back to scalar reference implementations otherwise.
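To illustrate what a bit-expansion table does, the sketch below builds one at run time. It assumes that each entry expands the eight bits of its index into eight bytes, with a set bit becoming 0x10 (that is, the bit shifted into position 4, just above a 4-bit nibble). That layout, and the names sketch_b2b and sketch_build_b2b, are assumptions made for illustration; the tables in the source file are precomputed constants rather than being filled by a loop like this.

```c
#include <stdint.h>

/* Hypothetical run-time equivalent of a 256-entry bit-to-byte table. */
static uint64_t sketch_b2b[256];

static void sketch_build_b2b(void) {
    for (int b = 0; b < 256; b++) {
        uint64_t v = 0;
        for (int j = 0; j < 8; j++) {
            /* bit j of the index becomes byte j of the entry, with value 0x10 */
            if ((b >> j) & 1) v |= (uint64_t) 0x10 << (8 * j);
        }
        sketch_b2b[b] = v;
    }
}
```

A single lookup then replaces eight shift-and-mask operations: one byte of packed high bits yields eight bytes that can be OR-ed directly onto eight unpacked 4-bit values.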

Usage

This file is compiled when GGML is built as a WebAssembly module (e.g., using Emscripten) with SIMD128 support enabled. It powers web-based LLM inference applications running in browsers or Node.js environments.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/wasm/quants.c (1221 lines).

Key Signatures

void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, size_t bx,
    const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter	Type	Description
`x`	`const float *`	Source array of floating-point values to be quantized.
`k`	`int64_t`	Number of elements to quantize. Must be a multiple of the block size (QK8_0 or QK8_1, both 32).

Outputs (Quantization)

Output	Type	Description
`vy`	`void *`	Destination buffer for the quantized block data.

Inputs (Dot Product)

Parameter	Type	Description
`n`	`int`	Number of elements in each input vector.
`vx`	`const void *`	Pointer to quantized weight data.
`vy`	`const void *`	Pointer to quantized activation data.
`nrc`	`int`	Number of rows to compute simultaneously.

Outputs (Dot Product)

Output	Type	Description
`s`	`float *`	Destination for the computed dot product result(s).
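As a rough picture of what the Q4_0 x Q8_0 dot product computes, here is a scalar sketch without SIMD. The struct and function names are hypothetical and the scales are plain floats, whereas the real block_q4_0 and block_q8_0 store them in half precision. It assumes the usual Q4_0 packing, in which byte j holds element j in its low nibble and element j + 16 in its high nibble, each stored with an offset of 8.

```c
#include <stdint.h>

#define SKETCH_QK 32 /* elements per block */

/* Hypothetical layouts for illustration only. */
typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_block_q4; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[SKETCH_QK];     } sketch_block_q8;

/* Dot product of n quantized elements; n must be a multiple of SKETCH_QK. */
static float sketch_vec_dot_q4_q8(int n, const sketch_block_q4 *x, const sketch_block_q8 *y) {
    float sum = 0.0f;
    for (int i = 0; i < n / SKETCH_QK; i++) {
        int32_t isum = 0; /* integer accumulator for one block */
        for (int j = 0; j < SKETCH_QK / 2; j++) {
            const int lo = (x[i].qs[j] & 0x0F) - 8; /* element j      */
            const int hi = (x[i].qs[j] >> 4)   - 8; /* element j + 16 */
            isum += lo * y[i].qs[j] + hi * y[i].qs[j + SKETCH_QK / 2];
        }
        /* apply both block scales once per block */
        sum += x[i].d * y[i].d * (float) isum;
    }
    return sum;
}
```

Accumulating in integers within a block and applying the two scales once per block is what makes the operation a good fit for integer SIMD lanes.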

Usage Examples

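// Quantize a row using WebAssembly SIMD128
// (compiled via Emscripten with -msimd128)
float input[256] = {0};            // replace with the values to quantize
block_q8_0 output[256 / QK8_0];    // one block per QK8_0 (32) floats

quantize_row_q8_0(input, output, 256);

// Compute quantized dot product in a WASM runtime.
// Both operands must cover the same 256 elements.
block_q4_0 weight_blocks[256 / QK4_0];      // quantized weights, filled elsewhere
block_q8_0 activation_blocks[256 / QK8_0];  // e.g. produced by quantize_row_q8_0
float result;
ggml_vec_dot_q4_0_q8_0(256, &result, sizeof(result),
    weight_blocks, sizeof(block_q4_0),
    activation_blocks, sizeof(block_q8_0), 1);  // nrc = 1: a single row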

Related Pages

Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
