Implementation:Ggml org Ggml Cpu loongarch quants

Metadata

Field	Value
Page Type	Implementation (Architecture-Specific SIMD)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated	2025-05-15 12:00 GMT

Overview

LoongArch LSX/LASX SIMD-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on Loongson processors.

Description

arch/loongarch/quants.c implements LoongArch-specific SIMD acceleration for GGML quantization operations. It targets processors that support LSX (Loongson SIMD Extension, 128-bit vectors) and, as the overview notes, LASX (the 256-bit extension).

The file takes a distinctive approach compared to other architecture ports: rather than writing each quantization function directly in LoongArch intrinsics, it first defines a compatibility layer of helper functions that map x86-style SSE operations onto LoongArch LSX intrinsics. These wrappers include:

lsx_packs_w / lsx_packs_h / lsx_packus_h -- saturating pack operations (equivalent to SSE _mm_packs_epi32, etc.)
lsx_maddubs_h -- unsigned/signed byte multiply-add (equivalent to _mm_maddubs_epi16)
lsx_madd_h -- signed halfword multiply-add (equivalent to _mm_madd_epi16)
lsx_shuffle_b -- byte shuffle (equivalent to _mm_shuffle_epi8)
lsx_hadd_h / lsx_hadd_w / lsx_hadd_s -- horizontal add operations
lsx_set_w -- set 4 integer elements
hsum_float_4x4 -- horizontal sum of four 128-bit float vectors

These wrappers let the quantization and dot product functions follow the same algorithmic structure as the x86 implementation, with each SIMD operation mapped to its LoongArch equivalent. The LSX wrappers and the code paths built on them are guarded by #if defined(__loongarch_sx).
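To make the semantics of the emulated operations concrete, the following is a minimal scalar sketch of what the SSE operations named above compute. It models the documented behavior of _mm_packs_epi32, _mm_maddubs_epi16 and _mm_madd_epi16 on plain arrays. It is not taken from quants.c, and the ref_* names are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (signed saturation). */
static int16_t ref_sat_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Scalar model of _mm_packs_epi32: the four 32-bit lanes of a fill
 * output lanes 0..3 and those of b fill lanes 4..7, each saturated. */
static void ref_packs_w(const int32_t a[4], const int32_t b[4], int16_t out[8]) {
    for (int i = 0; i < 4; ++i) {
        out[i]     = ref_sat_i16(a[i]);
        out[i + 4] = ref_sat_i16(b[i]);
    }
}

/* Scalar model of _mm_maddubs_epi16: a holds unsigned bytes, b holds
 * signed bytes; adjacent products are added with signed saturation. */
static void ref_maddubs_h(const uint8_t a[16], const int8_t b[16], int16_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        const int32_t lo = (int32_t) a[2 * i]     * b[2 * i];
        const int32_t hi = (int32_t) a[2 * i + 1] * b[2 * i + 1];
        out[i] = ref_sat_i16(lo + hi);
    }
}

/* Scalar model of _mm_madd_epi16: adjacent 16-bit products are added
 * into 32-bit lanes. The sum is formed in unsigned arithmetic so the
 * single overflowing case wraps instead of being undefined in C. */
static void ref_madd_h(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        const uint32_t lo = (uint32_t) ((int32_t) a[2 * i]     * b[2 * i]);
        const uint32_t hi = (uint32_t) ((int32_t) a[2 * i + 1] * b[2 * i + 1]);
        out[i] = (int32_t) (lo + hi);
    }
}
```

A scalar model like this can serve as a reference when checking a vector wrapper lane by lane, since it runs on any host.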

Usage

This file is compiled as part of the GGML CPU backend when targeting LoongArch platforms with LSX support (e.g., Loongson 3A5000, 3A6000 processors). The build system automatically selects this implementation when the target architecture is detected.
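Within a source file, path selection of this kind is usually done with preprocessor guards on compiler-defined macros. The sketch below shows the general pattern. It assumes the compiler defines __loongarch_asx when LASX is enabled and __loongarch_sx when LSX is enabled; the function name quants_simd_path is hypothetical and does not exist in GGML.

```c
/* Report which code path the preprocessor selects for this build.
 * Assumption: __loongarch_asx / __loongarch_sx are defined by the
 * compiler when the corresponding extension is enabled. */
static const char * quants_simd_path(void) {
#if defined(__loongarch_asx)
    return "lasx";    /* 256-bit path */
#elif defined(__loongarch_sx)
    return "lsx";     /* 128-bit path */
#else
    return "scalar";  /* generic fallback on any other target */
#endif
}
```

On a non-LoongArch host the function returns "scalar", which is a quick way to confirm that the guards compile out cleanly.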

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/loongarch/quants.c (2159 lines).

Key Signatures

// Compatibility layer helpers (LoongArch LSX wrappers)
static __m128i lsx_packs_w(__m128i a, __m128i b);
static __m128i lsx_maddubs_h(__m128i a, __m128i b);
static __m128i lsx_shuffle_b(__m128i a, __m128i b);
static inline float hsum_float_4x4(const __m128 a, const __m128 b, const __m128 c, const __m128 d);

// Standard quantization interface
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

// Dot product interface (one function per format pair, e.g. q8_0 x q8_0)
void ggml_vec_dot_q8_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter	Type	Description
`x`	`const float *`	Source array of floating-point values to be quantized.
`k`	`int64_t`	Number of elements to quantize. Must be a multiple of the block size.

Outputs (Quantization)

Output	Type	Description
`vy`	`void *`	Destination buffer for the quantized block data.
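
The quantization contract above can be illustrated with a scalar reference. This is a minimal sketch, assuming a block size of 32 and a per-block scale of max|x| / 127. The struct ref_block_q8 and the REF_QK constant are hypothetical stand-ins; in particular the scale is kept as a float here for simplicity, which may differ from the storage type GGML uses.

```c
#include <assert.h>
#include <stdint.h>

#define REF_QK 32  /* assumed block size (hypothetical name) */

typedef struct {
    float  d;           /* per-block scale (float here for simplicity) */
    int8_t qs[REF_QK];  /* quantized values in [-127, 127] */
} ref_block_q8;

/* Quantize k floats from x into k / REF_QK blocks at y. */
static void ref_quantize_row_q8(const float * x, ref_block_q8 * y, int64_t k) {
    assert(k % REF_QK == 0);  /* k must be a multiple of the block size */
    const int64_t nb = k / REF_QK;

    for (int64_t i = 0; i < nb; ++i) {
        /* Largest magnitude in the block determines the scale. */
        float amax = 0.0f;
        for (int j = 0; j < REF_QK; ++j) {
            const float v = x[i * REF_QK + j];
            const float a = v < 0.0f ? -v : v;
            if (a > amax) amax = a;
        }

        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;  /* avoid dividing by zero */
        y[i].d = d;

        for (int j = 0; j < REF_QK; ++j) {
            const float v = x[i * REF_QK + j] * id;
            /* Round to nearest, ties away from zero, then truncate. */
            y[i].qs[j] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}
```

A SIMD path is expected to produce the same block layout; it vectorizes the max reduction and the scaling loop rather than changing the result format.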

Inputs (Dot Product)

Parameter	Type	Description
`n`	`int`	Number of elements in each input vector.
`vx`	`const void *`	Pointer to quantized weight data.
`vy`	`const void *`	Pointer to quantized activation data.
`nrc`	`int`	Number of rows to compute simultaneously.

Outputs (Dot Product)

Output	Type	Description
`s`	`float *`	Destination for the computed dot product result(s).
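
As a rough illustration of the dot product contract, the sketch below computes a single result from two rows of quantized blocks: integer products are accumulated per block and then scaled by the two block scales. The block layout (ref_dot_block, REF_DOT_QK) is a hypothetical simplification with float scales, and the function handles only one row, so it has no nrc parameter.

```c
#include <stdint.h>

#define REF_DOT_QK 32  /* assumed block size (hypothetical name) */

typedef struct {
    float  d;               /* per-block scale (float here for simplicity) */
    int8_t qs[REF_DOT_QK];  /* quantized values */
} ref_dot_block;

/* Dot product of two quantized rows of n elements; writes one float to s.
 * n is assumed to be a multiple of REF_DOT_QK. */
static void ref_vec_dot_q8_q8(int n, float * s, const void * vx, const void * vy) {
    const ref_dot_block * x = (const ref_dot_block *) vx;
    const ref_dot_block * y = (const ref_dot_block *) vy;
    const int nb = n / REF_DOT_QK;

    float sum = 0.0f;
    for (int i = 0; i < nb; ++i) {
        /* Integer accumulation inside the block. */
        int32_t isum = 0;
        for (int j = 0; j < REF_DOT_QK; ++j) {
            isum += (int32_t) x[i].qs[j] * y[i].qs[j];
        }
        /* Apply both block scales once per block. */
        sum += x[i].d * y[i].d * (float) isum;
    }
    *s = sum;
}
```

Keeping the inner accumulation in integers is what makes multiply-add operations such as the maddubs and madd wrappers applicable in a vectorized version.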

Usage Examples

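// Quantize a row using LoongArch LSX SIMD
// (same API as all arch-specific implementations)
float input[256];
for (int i = 0; i < 256; ++i) {
    input[i] = (float) i / 256.0f;  // sample data so the input is initialized
}
block_q8_0 output[256 / QK8_0];     // one block per QK8_0 input values

quantize_row_q8_0(input, output, 256);  // k = 256 is a multiple of QK8_0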

Related Pages

Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent
