Implementation:Ggml org Ggml Cpu powerpc quants

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (Architecture-Specific SIMD)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization
Last Updated | 2025-05-15 12:00 GMT

Overview

POWER9 VSX SIMD-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on IBM PowerPC processors.

Description

arch/powerpc/quants.c implements PowerPC-specific SIMD acceleration for GGML quantization operations, targeting POWER9 and later processors with the VSX (Vector Scalar Extension) instruction set.

The implementation uses the PowerPC AltiVec/VSX intrinsics API and its vector type keyword (for example vector float and vector signed int). Representative intrinsics:

  • vec_xl -- aligned/unaligned vector loads
  • vec_abs / vec_max -- absolute value and maximum operations
  • vec_round / vec_cts -- rounding and float-to-integer conversion
  • vec_pack -- packing from wider to narrower element types by truncation (the saturating variant is vec_packs)
  • vec_xst -- vector store
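As a rough illustration of how a few of these intrinsics combine, the sketch below computes the maximum absolute value of a float array. The helper name sketch_max_abs_f32 is hypothetical and does not appear in quants.c; the VSX branch is only compiled when __POWER9_VECTOR__ is defined, and a plain scalar loop is used otherwise. It assumes n is a multiple of 4.

```c
#include <stddef.h>

#if defined(__POWER9_VECTOR__)
#include <altivec.h>
#endif

/* Hypothetical helper (not part of GGML): returns max |x[i]| for i in [0, n).
 * Assumes n is a multiple of 4 so the vector path needs no tail handling. */
static float sketch_max_abs_f32(const float *x, size_t n) {
    float amax = 0.0f;
#if defined(__POWER9_VECTOR__)
    vector float vmax = vec_splats(0.0f);
    for (size_t i = 0; i < n; i += 4) {
        vector float v = vec_xl(0, x + i);      /* load four floats */
        vmax = vec_max(vmax, vec_abs(v));       /* running per-lane maximum of |v| */
    }
    /* Horizontal step: reduce the four lanes to one scalar. */
    for (int lane = 0; lane < 4; lane++) {
        const float m = vec_extract(vmax, lane);
        if (m > amax) amax = m;
    }
#else
    /* Scalar fallback used on every other target. */
    for (size_t i = 0; i < n; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
#endif
    return amax;
}
```

Both branches are meant to return the same value, so the scalar branch doubles as a reference when checking the vector one.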

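The quantization pattern (e.g., quantize_row_q8_0) processes one 32-element block at a time. It loads eight groups of four floats into vector float registers, takes their absolute values, and computes the block's maximum absolute value via tree reduction. It then derives a scale factor (the maximum divided by 127), multiplies each element by the inverse scale, rounds, and converts to vector signed int. Finally, chained vec_pack calls narrow the results from int32 to int16 to int8 before storing.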
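The steps above can be sketched in portable scalar C. This is an illustrative reference under stated assumptions, not the code in quants.c: the struct sketch_block_q8_0 and the function sketch_quantize_block_q8_0 are hypothetical names, the scale is kept as a float (GGML stores it as fp16), and rounding is done by hand (ties away from zero) rather than with vec_round.

```c
#include <stdint.h>

#define QK 32  /* elements per block; corresponds to QK8_0 in GGML */

/* Hypothetical simplified block: scale kept as float for readability. */
typedef struct {
    float  d;        /* scale factor */
    int8_t qs[QK];   /* quantized values */
} sketch_block_q8_0;

static float sketch_absf(float v)          { return v < 0.0f ? -v : v; }
static float sketch_maxf(float a, float b) { return a > b ? a : b; }

static void sketch_quantize_block_q8_0(const float *x, sketch_block_q8_0 *y) {
    float amax[8][4];  /* stands in for eight 4-lane vector registers */

    /* Load eight groups of four floats and take absolute values. */
    for (int j = 0; j < 8; j++)
        for (int l = 0; l < 4; l++)
            amax[j][l] = sketch_absf(x[4 * j + l]);

    /* Tree reduction across registers: 8 -> 4 -> 2 -> 1. */
    for (int stride = 1; stride < 8; stride *= 2)
        for (int j = 0; j + stride < 8; j += 2 * stride)
            for (int l = 0; l < 4; l++)
                amax[j][l] = sketch_maxf(amax[j][l], amax[j + stride][l]);

    /* Horizontal maximum over the four lanes of the surviving register. */
    const float m = sketch_maxf(sketch_maxf(amax[0][0], amax[0][1]),
                                sketch_maxf(amax[0][2], amax[0][3]));

    /* Scale maps the largest magnitude to 127; guard against an all-zero block. */
    const float d  = m / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    y->d = d;

    /* Multiply, round to nearest, then narrow int32 -> int16 -> int8. */
    for (int i = 0; i < QK; i++) {
        const float   v   = x[i] * id;
        const int32_t q32 = (int32_t)(v + (v >= 0.0f ? 0.5f : -0.5f));
        const int16_t q16 = (int16_t)q32;
        y->qs[i] = (int8_t)q16;
    }
}
```

Because the scaled values always lie in [-127, 127], the two narrowing casts never lose information, which is why a truncating pack is sufficient in this pattern.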

Precomputed bit-expansion tables (table_b2b_0, table_b2b_1) map each 8-bit value to eight bytes, one byte per bit, which supports efficient unpacking of sub-byte quantized formats. All SIMD paths are guarded by #if defined(__POWER9_VECTOR__) and fall back to scalar reference implementations otherwise.
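To make the idea of a bit-expansion table concrete, the sketch below computes at run time the kind of value such a table could hold for one index. It assumes that bit i of the input controls byte i of a 64-bit word and that a set bit is written as 0x10 (the bit shifted into the high nibble), with an inverted variant for the second table. The function name sketch_expand_bits is hypothetical; the real tables are static constants.

```c
#include <stdint.h>

/* Hypothetical helper: expand the 8 bits of b into 8 bytes of a uint64_t.
 * Byte i of the result is 0x10 when bit i of b is set (or clear, if invert
 * is nonzero) and 0x00 otherwise. */
static uint64_t sketch_expand_bits(uint8_t b, int invert) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t bit = (uint64_t)((b >> i) & 1u);
        if (invert) bit ^= 1u;
        out |= (bit << 4) << (8 * i);  /* place the bit in the high nibble of byte i */
    }
    return out;
}
```

A 256-entry table filled with these values trades 2 KiB of read-only memory for a single lookup per input byte, instead of eight shift-and-mask steps.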

Usage

This file is compiled as part of the GGML CPU backend when targeting PowerPC platforms with POWER9+ vector support. It enables efficient ML inference on IBM POWER server hardware.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/arch/powerpc/quants.c (2305 lines).

Key Signatures

void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, size_t bx,
    const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"

I/O Contract

Inputs (Quantization)

Parameter | Type | Description
x | const float * | Source array of floating-point values to be quantized.
k | int64_t | Number of elements to quantize. Must be a multiple of the block size (QK8_0 = 32 for the Q8 formats).

Outputs (Quantization)

Output | Type | Description
vy | void * | Destination buffer for the quantized block data.
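The block-size requirement on k also determines how large the destination buffer must be. The sketch below shows the arithmetic for Q8_0, assuming a block of one 16-bit scale followed by 32 signed bytes. The names sketch_q8_0_layout and sketch_q8_0_row_bytes are hypothetical, and in real code sizeof(block_q8_0) should be used instead of a hand-written layout.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK8_0 32  /* elements per Q8_0 block */

/* Hypothetical stand-in for the Q8_0 block layout: fp16 scale + 32 int8 values. */
typedef struct {
    uint16_t d;                  /* scale, stored as raw fp16 bits */
    int8_t   qs[SKETCH_QK8_0];   /* quantized values */
} sketch_q8_0_layout;

/* Returns the number of bytes needed to hold k quantized elements,
 * or 0 when k is negative or not a multiple of the block size. */
static size_t sketch_q8_0_row_bytes(int64_t k) {
    if (k < 0 || k % SKETCH_QK8_0 != 0) {
        return 0;
    }
    return (size_t)(k / SKETCH_QK8_0) * sizeof(sketch_q8_0_layout);
}
```

For a 256-element row this gives 8 blocks of 34 bytes each.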

Inputs (Dot Product)

Parameter | Type | Description
n | int | Number of elements in each input vector.
vx | const void * | Pointer to quantized weight data.
vy | const void * | Pointer to quantized activation data.
nrc | int | Number of rows to compute simultaneously.

Outputs (Dot Product)

Output | Type | Description
s | float * | Destination for the computed dot product result(s).
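A scalar sketch may help clarify what the Q4_0 x Q8_0 dot product computes. It assumes the usual GGML packing for Q4_0, where each byte holds two 4-bit values (low nibble for element j, high nibble for element j + 16) stored with an offset of 8. The struct and function names are hypothetical, the scales are kept as floats instead of fp16, and the stride and nrc parameters of the real signature are omitted.

```c
#include <stdint.h>

#define QK 32  /* elements per block for both formats */

/* Hypothetical simplified blocks: scales kept as float for readability. */
typedef struct { float d; uint8_t qs[QK / 2]; } demo_block_q4_0;  /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[QK];     } demo_block_q8_0;

/* Scalar reference: writes sum over all n elements of weight * activation to *s.
 * Assumes n is a multiple of QK. */
static void demo_vec_dot_q4_0_q8_0(int n, float *s,
                                   const demo_block_q4_0 *x,
                                   const demo_block_q8_0 *y) {
    const int nb = n / QK;
    float sumf = 0.0f;

    for (int ib = 0; ib < nb; ib++) {
        int32_t sumi = 0;
        for (int j = 0; j < QK / 2; j++) {
            /* Unpack two 4-bit weights and remove the +8 storage offset. */
            const int v0 = (x[ib].qs[j] & 0x0F) - 8;
            const int v1 = (x[ib].qs[j] >> 4) - 8;
            sumi += v0 * y[ib].qs[j] + v1 * y[ib].qs[j + QK / 2];
        }
        /* Integer partial sum is rescaled once per block by both scales. */
        sumf += (float)sumi * x[ib].d * y[ib].d;
    }
    *s = sumf;
}
```

Accumulating in integers inside each block and applying the two scales once per block is what makes the SIMD version efficient: the inner loop needs only integer multiply-add operations.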

Usage Examples

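// Quantize a row of 256 activations to Q8_0
// (the VSX path is used when the file is built with POWER9 vector support)
float input[256] = {0};   // placeholder data; fill with real activations
block_q8_0 activation_blocks[256 / QK8_0];

quantize_row_q8_0(input, activation_blocks, 256);   // 256 is a multiple of QK8_0

// Weights are assumed to have been quantized to Q4_0 elsewhere;
// weight_blocks is a placeholder for 256 / QK4_0 caller-provided blocks
extern const block_q4_0 weight_blocks[256 / QK4_0];

// Compute the quantized dot product of one weight row with one activation row
float result;
ggml_vec_dot_q4_0_q8_0(256, &result, sizeof(result),
    weight_blocks, sizeof(block_q4_0),
    activation_blocks, sizeof(block_q8_0), 1);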
