Implementation:Ggml org Ggml Cpu powerpc quants
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
POWER9 VSX SIMD-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on IBM PowerPC processors.
Description
arch/powerpc/quants.c implements PowerPC-specific SIMD acceleration for GGML quantization operations, targeting POWER9 and later processors with the VSX (Vector Scalar Extension) instruction set.
The implementation uses the PowerPC AltiVec/VSX intrinsics API together with the vector type keyword (for example, vector float and vector signed int):
- vec_xl -- aligned/unaligned vector loads
- vec_abs / vec_max -- absolute value and maximum operations
- vec_round / vec_cts -- rounding and float-to-integer conversion
- vec_pack -- narrowing pack from wider to narrower types (truncating; vec_packs is the saturating variant)
- vec_xst -- vector store
The quantization pattern (e.g., quantize_row_q8_0) processes one 32-element block at a time: it loads eight groups of four floats into vector float registers, computes the block's maximum absolute value via tree reduction, derives a scale factor from that maximum, multiplies by the inverse scale, rounds and converts to vector signed int, then chains vec_pack calls to narrow from int32 to int16 to int8 before storing.
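The steps above can be sketched in portable scalar C. This is a minimal illustration, not the code in quants.c: the sketch_ names are hypothetical, the scale is kept as a plain float for simplicity (the real block_q8_0 stores it in half precision), and the maximum is found with a simple loop rather than a vector tree reduction.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* elements per block, mirroring QK8_0 */

/* Hypothetical block layout used only for this sketch. */
typedef struct {
    float  d;              /* per-block scale factor */
    int8_t qs[SKETCH_QK];  /* quantized values */
} sketch_block_q8;

static void sketch_quantize_row_q8(const float *x, sketch_block_q8 *y, int64_t k) {
    assert(k % SKETCH_QK == 0);
    const int64_t nb = k / SKETCH_QK;

    for (int64_t i = 0; i < nb; i++) {
        /* Step 1: maximum absolute value of the block. */
        float amax = 0.0f;
        for (int j = 0; j < SKETCH_QK; j++) {
            const float v = x[i * SKETCH_QK + j];
            const float a = v < 0.0f ? -v : v;
            if (a > amax) amax = a;
        }

        /* Step 2: scale so that amax maps to 127; guard against an all-zero block. */
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;

        /* Step 3: multiply by the inverse scale, round half away from zero, narrow to int8. */
        for (int j = 0; j < SKETCH_QK; j++) {
            const float v = x[i * SKETCH_QK + j] * id;
            y[i].qs[j] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}
```

Because the scaled values always lie in [-127, 127], the final narrowing to int8 cannot overflow, which is why a truncating pack is sufficient in the vector version.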
Precomputed bit-expansion tables (table_b2b_0, table_b2b_1) support efficient unpacking of sub-byte quantized formats. All SIMD paths are guarded by #if defined(__POWER9_VECTOR__) and fall back to scalar reference implementations otherwise.
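As a rough illustration of what a bit-expansion table holds, the sketch below builds two 256-entry tables at run time. It assumes that each entry spreads the eight bits of the index across the eight bytes of a 64-bit word, with each bit shifted to position 4 of its byte, and that the second table holds the inverted bits. The sketch_ names and the run-time construction are hypothetical; the source only states that the tables are precomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Spread the 8 bits of b over the 8 bytes of a 64-bit word:
 * byte i of the result holds (bit i of b) << shift. */
static uint64_t sketch_expand_bits(uint8_t b, int shift) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        const uint64_t bit = (uint64_t)((b >> i) & 1u);
        out |= (bit << shift) << (8 * i);
    }
    return out;
}

/* Fill a 256-entry table; when invert is non-zero, each bit is negated first. */
static void sketch_build_b2b(uint64_t *table, int shift, int invert) {
    assert(table != NULL && shift >= 0 && shift < 8);
    for (int b = 0; b < 256; b++) {
        const uint8_t bits = (uint8_t)(invert ? ~b : b);
        table[b] = sketch_expand_bits(bits, shift);
    }
}
```

With such a table, one lookup turns a packed byte of single-bit flags into eight byte-wide lanes that can be OR-ed directly into unpacked nibbles, avoiding eight separate shift-and-mask steps.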
Usage
This file is compiled as part of the GGML CPU backend when targeting PowerPC platforms with POWER9+ vector support. It enables efficient ML inference on IBM POWER server hardware.
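The compile-time guard described above can be sketched as follows. The helper name sketch_block_amax32 is hypothetical, and the vector branch is a simplified illustration of the guard pattern (it reduces the maximum linearly rather than as a tree); on compilers that do not define __POWER9_VECTOR__, only the scalar branch is built.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__POWER9_VECTOR__)
#include <altivec.h>
#endif

/* Hypothetical helper: maximum absolute value of a 32-float block. */
static float sketch_block_amax32(const float *x) {
    assert(x != NULL);
#if defined(__POWER9_VECTOR__)
    /* VSX path: eight loads of four floats each, reduced with vec_max. */
    vector float m = vec_abs(vec_xl(0, x));
    for (int j = 1; j < 8; j++) {
        m = vec_max(m, vec_abs(vec_xl(0, x + 4 * j)));
    }
    /* Horizontal reduction of the four remaining lanes. */
    float r = vec_extract(m, 0);
    for (int l = 1; l < 4; l++) {
        const float v = vec_extract(m, l);
        if (v > r) r = v;
    }
    return r;
#else
    /* Scalar fallback used on every other target. */
    float r = 0.0f;
    for (int j = 0; j < 32; j++) {
        const float a = x[j] < 0.0f ? -x[j] : x[j];
        if (a > r) r = a;
    }
    return r;
#endif
}
```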
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/powerpc/quants.c (2305 lines).
Key Signatures
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
const void * GGML_RESTRICT vx, size_t bx,
const void * GGML_RESTRICT vy, size_t by, int nrc);
Import
#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"
I/O Contract
Inputs (Quantization)
| Parameter | Type | Description |
|---|---|---|
| x | const float * | Source array of floating-point values to be quantized. |
| k | int64_t | Number of elements to quantize. Must be a multiple of the block size. |
Outputs (Quantization)
| Output | Type | Description |
|---|---|---|
| vy | void * | Destination buffer for the quantized block data. |
Inputs (Dot Product)
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of elements in each input vector. |
| vx | const void * | Pointer to quantized weight data. |
| vy | const void * | Pointer to quantized activation data. |
| nrc | int | Number of rows to compute simultaneously. |
Outputs (Dot Product)
| Output | Type | Description |
|---|---|---|
| s | float * | Destination for the computed dot product result(s). |
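To make the dot product contract concrete, here is a minimal scalar sketch of a 4-bit by 8-bit block dot product. It assumes the usual Q4_0 packing (two 4-bit values per byte, low nibbles holding the first half of the block, values offset by 8) and uses plain float scales. The sketch_ types and function are hypothetical stand-ins, not the real block_q4_0 / block_q8_0 definitions or the VSX kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32  /* elements per block */

/* Hypothetical block layouts with float scales (the real blocks use half precision). */
typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_block_q4;
typedef struct { float d; int8_t  qs[SKETCH_QK];     } sketch_block_q8;

static void sketch_vec_dot_q4_q8(int n, float *s,
                                 const sketch_block_q4 *x,
                                 const sketch_block_q8 *y) {
    assert(n % SKETCH_QK == 0);
    const int nb = n / SKETCH_QK;
    float sumf = 0.0f;

    for (int ib = 0; ib < nb; ib++) {
        int sumi = 0;
        for (int j = 0; j < SKETCH_QK / 2; j++) {
            /* Unpack two 4-bit weights and recentre them from [0, 15] to [-8, 7]. */
            const int v0 = (x[ib].qs[j] & 0x0F) - 8;
            const int v1 = (x[ib].qs[j] >> 4) - 8;
            /* Integer multiply-accumulate against the 8-bit activations. */
            sumi += v0 * y[ib].qs[j] + v1 * y[ib].qs[j + SKETCH_QK / 2];
        }
        /* Apply both block scales once per block. */
        sumf += (float)sumi * x[ib].d * y[ib].d;
    }
    *s = sumf;
}
```

Keeping the inner accumulation in integers and applying the two scales once per block is the property that SIMD kernels exploit: the inner loop maps onto integer multiply-add instructions, and only one floating-point multiply-add is needed per block.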
Usage Examples
// Quantize a row using PowerPC VSX SIMD
float input[256];
block_q8_0 output[256 / QK8_0];
quantize_row_q8_0(input, output, 256);
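// Compute quantized dot product on POWER9
// weight_blocks and activation_blocks are assumed to have been filled by a prior quantization step
block_q4_0 weight_blocks[256 / QK4_0];
block_q8_0 activation_blocks[256 / QK8_0];
float result;
ggml_vec_dot_q4_0_q8_0(256, &result, sizeof(result),
weight_blocks, sizeof(block_q4_0),
activation_blocks, sizeof(block_q8_0), 1);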
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
- Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
- Implementation:Ggml_org_Ggml_Cpu_s390_quants -- s390x VXE equivalent
- Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent