Implementation:Ggml org Ggml Cpu loongarch quants

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Architecture-Specific SIMD)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated | 2025-05-15 12:00 GMT

Overview

LoongArch LSX/LASX SIMD-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on Loongson processors.

Description

arch/loongarch/quants.c implements LoongArch-specific SIMD acceleration for GGML quantization operations, targeting processors that support the 128-bit LSX (Loongson SIMD Extension) vector instruction set and, as noted in the Overview, its 256-bit counterpart LASX.

The file takes a distinctive approach compared to other architecture ports: rather than writing each quantization function directly in LoongArch intrinsics, it first defines a compatibility layer of helper functions that map x86-style SSE operations onto LoongArch LSX intrinsics. These wrappers include:

  • lsx_packs_w / lsx_packs_h / lsx_packus_h -- saturating pack operations (equivalent to SSE _mm_packs_epi32, _mm_packs_epi16, and _mm_packus_epi16, respectively)
  • lsx_maddubs_h -- multiplies unsigned bytes by signed bytes and adds adjacent products with saturation (equivalent to _mm_maddubs_epi16)
  • lsx_madd_h -- signed halfword multiply-add (equivalent to _mm_madd_epi16)
  • lsx_shuffle_b -- byte shuffle (equivalent to _mm_shuffle_epi8)
  • lsx_hadd_h / lsx_hadd_w / lsx_hadd_s -- horizontal add operations
  • lsx_set_w -- set 4 integer elements
  • hsum_float_4x4 -- horizontal sum of four 128-bit float vectors

These wrappers enable the quantization and dot product functions to follow the same algorithmic structure as the x86 implementation, with the underlying SIMD operations mapped to LoongArch equivalents. The LSX helper paths are guarded by #if defined(__loongarch_sx).
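To make the wrapper semantics concrete, the following is a minimal scalar sketch of what three of these helpers are meant to compute, based on the x86 instructions they are described as mirroring. The `ref_*` names and the array-based signatures are hypothetical and chosen for illustration; the actual helpers operate on `__m128i` / `__m128` registers and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value into the int16_t range (signed saturation). */
static int16_t sat_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of _mm_maddubs_epi16: a holds unsigned bytes, b signed bytes.
 * Each pair of adjacent products is summed and saturated to 16 bits. */
static void ref_maddubs_h(const uint8_t a[16], const int8_t b[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        int32_t sum = (int32_t)a[2 * i] * b[2 * i]
                    + (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = sat_i16(sum);
    }
}

/* Scalar model of _mm_packs_epi32: the four lanes of a, then the four
 * lanes of b, are narrowed to int16_t with signed saturation. */
static void ref_packs_w(const int32_t a[4], const int32_t b[4], int16_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_i16(a[i]);
        out[i + 4] = sat_i16(b[i]);
    }
}

/* Scalar model of hsum_float_4x4: the sum of all 16 floats held in
 * four 4-lane vectors. */
static float ref_hsum_float_4x4(const float a[4], const float b[4],
                                const float c[4], const float d[4]) {
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        sum += (a[i] + b[i]) + (c[i] + d[i]);
    }
    return sum;
}

/* Small sanity check of the multiply-add model. */
static void ref_wrappers_selftest(void) {
    const uint8_t a[16] = {2, 4, 255, 255};
    const int8_t  b[16] = {3, -5, 127, 127};
    int16_t out[8];
    ref_maddubs_h(a, b, out);
    assert(out[0] == -14);        /* 2*3 + 4*(-5) */
    assert(out[1] == INT16_MAX);  /* 255*127*2 exceeds the int16 range */
}
```

A scalar model like this can serve as a reference when checking a vectorized wrapper lane by lane, since saturation behavior is the detail most likely to differ between instruction sets.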

Usage

This file is compiled as part of the GGML CPU backend when targeting LoongArch platforms with LSX support (e.g., Loongson 3A5000, 3A6000 processors). The build system automatically selects this implementation when the target architecture is detected.
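As a rough illustration of how such a compile-time selection can look, the sketch below keys off the compiler's predefined `__loongarch_sx` and `__loongarch_asx` macros. The function names are hypothetical and not part of GGML; the build system's own detection logic is not shown on this page and may differ.

```c
#include <assert.h>

/* Width in bytes of the widest vector register this build may use,
 * judged only from the compiler's predefined feature macros.
 * Hypothetical helper for illustration. */
static int simd_vector_bytes(void) {
#if defined(__loongarch_asx)
    return 32;   /* 256-bit LASX */
#elif defined(__loongarch_sx)
    return 16;   /* 128-bit LSX */
#else
    return 0;    /* no LoongArch SIMD: scalar fallback */
#endif
}

/* Number of float lanes per vector; 1 when no SIMD path is compiled in. */
static int simd_float_lanes(void) {
    int bytes = simd_vector_bytes();
    assert(bytes == 0 || bytes == 16 || bytes == 32);
    return bytes ? bytes / (int)sizeof(float) : 1;
}
```

On a non-LoongArch host, or a LoongArch build without the vector extensions enabled, both macros are undefined and the scalar branch is taken.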

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/loongarch/quants.c (2159 lines).

Key Signatures

// Compatibility layer helpers (LoongArch LSX wrappers)
static __m128i lsx_packs_w(__m128i a, __m128i b);
static __m128i lsx_maddubs_h(__m128i a, __m128i b);
static __m128i lsx_shuffle_b(__m128i a, __m128i b);
static inline float hsum_float_4x4(const __m128 a, const __m128 b, const __m128 c, const __m128 d);

// Standard quantization interface
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter | Type | Description
x | const float * | Source array of floating-point values to be quantized.
k | int64_t | Number of elements to quantize. Must be a multiple of the block size (QK8_0 for quantize_row_q8_0, QK8_1 for quantize_row_q8_1).

Outputs (Quantization)

Output | Type | Description
vy | void * | Destination buffer for the quantized block data.
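The quantization contract can be sketched in scalar C as follows. This is an illustrative model of 8-bit block quantization, not the SIMD code in the file: `demo_block_q8` and `demo_quantize_row_q8` are hypothetical names, the scale is stored as a plain `float` for simplicity (an assumption; GGML's own block type is not reproduced here), and the block size of 32 is assumed to match `QK8_0`.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32  /* elements per block; assumed to match QK8_0 */

/* Hypothetical simplified block layout used only for this sketch. */
typedef struct {
    float  d;        /* per-block scale */
    int8_t qs[QK];   /* quantized values in [-127, 127] */
} demo_block_q8;

static void demo_quantize_row_q8(const float *x, demo_block_q8 *y, int64_t k) {
    assert(k % QK == 0);   /* k must be a multiple of the block size */
    const int64_t nb = k / QK;

    for (int64_t i = 0; i < nb; i++) {
        /* Find the largest magnitude in this block. */
        float amax = 0.0f;
        for (int j = 0; j < QK; j++) {
            float v = x[i * QK + j];
            float a = v < 0.0f ? -v : v;
            if (a > amax) amax = a;
        }

        /* Scale so that amax maps to 127; guard against an all-zero block. */
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;

        for (int j = 0; j < QK; j++) {
            float s = x[i * QK + j] * id;
            /* Round half away from zero, then truncate to int8_t. */
            y[i].qs[j] = (int8_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
        }
    }
}
```

The destination buffer must hold `k / QK` blocks, which is why `k` has to be a multiple of the block size.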

Inputs (Dot Product)

Parameter | Type | Description
n | int | Number of elements in each input vector.
vx | const void * | Pointer to quantized weight data.
vy | const void * | Pointer to quantized activation data.
nrc | int | Number of rows to compute simultaneously.

Outputs (Dot Product)

Output | Type | Description
s | float * | Destination for the computed dot product result(s).
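A scalar sketch of the dot product contract is given below for the single-row case. It is an illustrative model only: `demo_block_q8` and `demo_vec_dot_q8` are hypothetical names, both operands are assumed to use the same simplified 8-bit block layout with a `float` scale, and the real routines may take further parameters that are not listed in the tables above.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32  /* elements per block; assumed to match QK8_0 */

/* Hypothetical simplified block layout used only for this sketch. */
typedef struct {
    float  d;        /* per-block scale */
    int8_t qs[QK];   /* quantized values */
} demo_block_q8;

/* Dot product of two quantized rows of n elements (single row only). */
static void demo_vec_dot_q8(int n, float *s, const void *vx, const void *vy) {
    assert(n % QK == 0);   /* rows are stored as whole blocks */
    const demo_block_q8 *x = (const demo_block_q8 *)vx;
    const demo_block_q8 *y = (const demo_block_q8 *)vy;
    const int nb = n / QK;

    float sumf = 0.0f;
    for (int i = 0; i < nb; i++) {
        /* Integer dot product inside the block... */
        int32_t sumi = 0;
        for (int j = 0; j < QK; j++) {
            sumi += (int32_t)x[i].qs[j] * (int32_t)y[i].qs[j];
        }
        /* ...rescaled once per block by the product of both scales. */
        sumf += (float)sumi * (x[i].d * y[i].d);
    }
    *s = sumf;
}
```

Keeping the inner accumulation in integers and applying the scales once per block is the structure that SIMD multiply-add helpers such as `lsx_maddubs_h` and `lsx_madd_h` are suited to accelerate.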

Usage Examples

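// Quantize a row using LoongArch LSX SIMD
// (same API as all arch-specific implementations)
float input[256] = {0};           // fill with real data before quantizing
block_q8_0 output[256 / QK8_0];   // one output block per QK8_0 input values

quantize_row_q8_0(input, output, 256);   // 256 is a multiple of QK8_0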
