Implementation:Ggml org Ggml Cpu loongarch quants
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
LoongArch LSX/LASX SIMD-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on Loongson processors.
Description
arch/loongarch/quants.c implements LoongArch-specific SIMD acceleration for GGML quantization operations, targeting processors that support the LSX (Loongson SIMD Extension) vector instruction set.
The file takes a distinctive approach compared to other architecture ports: rather than writing each quantization function directly in LoongArch intrinsics, it first defines a compatibility layer of helper functions that map x86-style SSE operations onto LoongArch LSX intrinsics. These wrappers include:
- lsx_packs_w / lsx_packs_h / lsx_packus_h -- saturating pack operations (equivalent to SSE _mm_packs_epi32, etc.)
- lsx_maddubs_h -- unsigned/signed byte multiply-add (equivalent to _mm_maddubs_epi16)
- lsx_madd_h -- signed halfword multiply-add (equivalent to _mm_madd_epi16)
- lsx_shuffle_b -- byte shuffle (equivalent to _mm_shuffle_epi8)
- lsx_hadd_h / lsx_hadd_w / lsx_hadd_s -- horizontal add operations
- lsx_set_w -- set 4 integer elements
- hsum_float_4x4 -- horizontal sum of four 128-bit float vectors
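To make the semantics behind these wrappers concrete, here is a minimal scalar sketch of three of the x86 operations they mirror: _mm_maddubs_epi16, _mm_packs_epi32, and a horizontal sum over four 4-float vectors. The ref_* names are hypothetical and do not appear in quants.c; the real wrappers operate on __m128i / __m128 registers through LSX intrinsics rather than on plain arrays.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (signed saturation). */
static int16_t sat_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Scalar model of _mm_maddubs_epi16: a holds unsigned bytes, b holds
 * signed bytes. Adjacent products are added and saturated to 16 bits. */
static void ref_maddubs_h(const uint8_t a[16], const int8_t b[16], int16_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        const int32_t lo = (int32_t) a[2*i]     * (int32_t) b[2*i];
        const int32_t hi = (int32_t) a[2*i + 1] * (int32_t) b[2*i + 1];
        out[i] = sat_i16(lo + hi);
    }
}

/* Scalar model of _mm_packs_epi32: eight 32-bit values are narrowed to
 * 16 bits with signed saturation; a fills the low half, b the high half. */
static void ref_packs_w(const int32_t a[4], const int32_t b[4], int16_t out[8]) {
    for (int i = 0; i < 4; ++i) {
        out[i]     = sat_i16(a[i]);
        out[i + 4] = sat_i16(b[i]);
    }
}

/* Scalar model of a 4x4 horizontal sum: adds all 16 float lanes. */
static float ref_hsum_4x4(const float a[4], const float b[4],
                          const float c[4], const float d[4]) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) {
        sum += a[i] + b[i] + c[i] + d[i];
    }
    return sum;
}
```

A scalar model like this can serve as a reference when checking that a vectorized wrapper produces the same lanes, in particular at the saturation boundaries.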
These wrappers enable the quantization and dot product functions to follow the same algorithmic structure as the x86 implementation, with the underlying SIMD operations mapped to LoongArch equivalents. All paths are guarded by #if defined(__loongarch_sx).
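The guard pattern can be sketched as follows. This is an illustration only: simd_path_name is a hypothetical helper that is not part of GGML, and the __loongarch_asx check is an assumption based on the LASX mention in the overview. It is tested first on the assumption that LASX-enabled builds also define the LSX macro.

```c
#include <stddef.h>

/* Hypothetical helper (not in GGML): reports which code path the
 * preprocessor selected for this translation unit. */
static const char * simd_path_name(void) {
#if defined(__loongarch_asx)
    return "lasx";    /* 256-bit LASX path (assumed to imply LSX) */
#elif defined(__loongarch_sx)
    return "lsx";     /* 128-bit LSX path */
#else
    return "scalar";  /* no LoongArch SIMD available: generic fallback */
#endif
}
```

On a non-LoongArch host, or when the compiler is not given the corresponding vector flags, the scalar branch is the one that gets compiled.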
Usage
This file is compiled as part of the GGML CPU backend when targeting LoongArch platforms with LSX support (e.g., Loongson 3A5000, 3A6000 processors). The build system automatically selects this implementation when the target architecture is detected.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/loongarch/quants.c (2159 lines).
Key Signatures
// Compatibility layer helpers (LoongArch LSX wrappers)
static __m128i lsx_packs_w(__m128i a, __m128i b);
static __m128i lsx_maddubs_h(__m128i a, __m128i b);
static __m128i lsx_shuffle_b(__m128i a, __m128i b);
static inline float hsum_float_4x4(const __m128 a, const __m128 b, const __m128 c, const __m128 d);
// Standard quantization interface
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
Import
#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"
I/O Contract
Inputs (Quantization)
| Parameter | Type | Description |
|---|---|---|
| x | const float * | Source array of floating-point values to be quantized. |
| k | int64_t | Number of elements to quantize. Must be a multiple of the block size. |
Outputs (Quantization)
| Output | Type | Description |
|---|---|---|
| vy | void * | Destination buffer for the quantized block data. |
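As a rough illustration of this contract, the sketch below quantizes a row into 32-element blocks in plain scalar C, following the general Q8_0 scheme (one scale per block, values mapped to the range -127..127). It makes several assumptions: ref_block_q8 and ref_quantize_row_q8 are hypothetical names, the scale is stored as a float rather than the 16-bit half float GGML uses, and rounding is done by hand (half away from zero) to avoid a math library dependency. It is not the LSX implementation.

```c
#include <assert.h>
#include <stdint.h>

#define REF_QK 32  /* elements per block; QK8_0 has this value in GGML */

/* Simplified stand-in for block_q8_0 (assumption: float scale). */
typedef struct {
    float  d;            /* per-block scale */
    int8_t qs[REF_QK];   /* quantized values */
} ref_block_q8;

static void ref_quantize_row_q8(const float * x, ref_block_q8 * y, int64_t k) {
    assert(k % REF_QK == 0);  /* k must be a multiple of the block size */
    const int64_t nb = k / REF_QK;

    for (int64_t i = 0; i < nb; ++i) {
        /* Largest absolute value in the block determines the scale. */
        float amax = 0.0f;
        for (int j = 0; j < REF_QK; ++j) {
            const float v  = x[i*REF_QK + j];
            const float av = v < 0.0f ? -v : v;
            if (av > amax) amax = av;
        }

        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;

        for (int j = 0; j < REF_QK; ++j) {
            const float v = x[i*REF_QK + j] * id;
            /* Round half away from zero, then narrow to 8 bits. */
            y[i].qs[j] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}
```

The caller provides k / REF_QK output blocks, which is why k has to be a multiple of the block size.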
Inputs (Dot Product)
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of elements in each input vector. |
| vx | const void * | Pointer to quantized weight data. |
| vy | const void * | Pointer to quantized activation data. |
| nrc | int | Number of rows to compute simultaneously. |
Outputs (Dot Product)
| Output | Type | Description |
|---|---|---|
| s | float * | Destination for the computed dot product result(s). |
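The dot product contract can be illustrated with a scalar sketch over two block-quantized vectors: integer products are accumulated per block and then scaled by both block scales. The names dot_block_q8 and ref_vec_dot_q8 are hypothetical, the block layout is simplified (float scale), the parameter list covers only the parameters named in the tables above, and the sketch handles nrc == 1 only. The SIMD routines in quants.c perform the inner accumulation with vector multiply-add operations instead of the inner loop shown here.

```c
#include <assert.h>
#include <stdint.h>

#define DOT_QK 32  /* elements per block (assumption: Q8_0-style blocks) */

/* Simplified 8-bit block: one float scale plus 32 signed values. */
typedef struct {
    float  d;
    int8_t qs[DOT_QK];
} dot_block_q8;

static void ref_vec_dot_q8(int n, float * s, const void * vx, const void * vy, int nrc) {
    assert(n % DOT_QK == 0);
    assert(nrc == 1);  /* this sketch computes a single row only */

    const dot_block_q8 * x = (const dot_block_q8 *) vx;
    const dot_block_q8 * y = (const dot_block_q8 *) vy;
    const int nb = n / DOT_QK;

    float sum = 0.0f;
    for (int i = 0; i < nb; ++i) {
        /* Integer accumulation within one block. */
        int32_t isum = 0;
        for (int j = 0; j < DOT_QK; ++j) {
            isum += (int32_t) x[i].qs[j] * (int32_t) y[i].qs[j];
        }
        /* Apply both block scales once per block. */
        sum += x[i].d * y[i].d * (float) isum;
    }
    *s = sum;
}
```

Keeping the inner accumulation in integers and applying the scales once per block is what makes the byte-wise multiply-add wrappers listed earlier a natural fit for this loop.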
Usage Examples
// Quantize a row using LoongArch LSX SIMD
// (same API as all arch-specific implementations)
float input[256] = {0};            // replace with the values to quantize
block_q8_0 output[256 / QK8_0];    // one output block per QK8_0 input elements
quantize_row_q8_0(input, output, 256);
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
- Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
- Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
- Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent