Implementation:Ggml org Ggml Cpu s390 quants
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (Architecture-Specific SIMD) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, SIMD_Optimization |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
IBM s390x VXE/VXE2 vector-optimized quantization, dequantization, and dot product routines for GGML quantized tensor formats on IBM Z mainframe processors.
Description
arch/s390/quants.c implements IBM s390x-specific SIMD acceleration for GGML quantization operations, targeting IBM Z mainframes with the Vector Extension (VXE) or Vector Extension 2 (VXE2) facilities.
The implementation uses the s390x vector intrinsics of the z/Architecture vector language extension, whose vec_* naming follows the PowerPC AltiVec style:
- vec_xl -- vector load
- vec_abs / vec_max -- absolute value and maximum
- vec_mul / vec_splats -- multiply and broadcast scalar
- vec_signed -- float-to-integer conversion
- __builtin_s390_vfisb -- s390-specific round-to-integral instruction with an explicit rounding mode (used to select a non-default rounding mode before float-to-int conversion)
- vec_extract -- extract scalar from vector lane
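The following sketch shows how several of these intrinsics might fit together. It computes the absolute maximum of a run of floats, which is the first step of q8_0/q8_1 quantization. The code uses vec_splats, vec_xl, vec_abs, vec_max and vec_extract. It sits behind the same #if defined(__VXE__) || defined(__VXE2__) guard that the file uses, and it falls back to plain C on other targets. It is an illustration, not code taken from quants.c. The function name sketch_abs_max is hypothetical. The vector branch assumes GCC or Clang on s390x with the vector language extension enabled (for example -march=z14 -mzvector).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__VXE__) || defined(__VXE2__)
#include <vecintrin.h>
#endif

/* Hypothetical helper: absolute maximum of n floats (n a multiple of 4),
   the first step of q8_0/q8_1 block quantization. */
static float sketch_abs_max(const float *x, size_t n) {
    assert(x != NULL && n % 4 == 0);
    float amax = 0.0f;
#if defined(__VXE__) || defined(__VXE2__)
    __vector float acc = vec_splats(0.0f);            /* broadcast 0.0f to all 4 lanes */
    for (size_t i = 0; i < n; i += 4) {
        __vector float v = vec_xl(0, x + i);          /* load 4 floats (byte offset 0) */
        acc = vec_max(acc, vec_abs(v));               /* lane-wise running max of |x| */
    }
    for (int lane = 0; lane < 4; ++lane) {
        float a = vec_extract(acc, lane);             /* read one lane as a scalar */
        amax = a > amax ? a : amax;
    }
#else
    for (size_t i = 0; i < n; ++i) {                  /* portable scalar fallback */
        float a = x[i] < 0.0f ? -x[i] : x[i];
        amax = a > amax ? a : amax;
    }
#endif
    return amax;
}
```

On x86 or ARM only the scalar loop is compiled. This is the behavior the preprocessor guard is meant to give.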
Notable s390x-specific features include:
- Big-endian data handling: a byteswap permute mask (v_kperm) is defined to handle byte-order differences. s390x is big-endian, whereas x86, ARM and most other GGML targets run little-endian.
- VXE/VXE2 dual guard: all SIMD paths are guarded by #if defined(__VXE__) || defined(__VXE2__). The code is therefore enabled with either VXE (vector-enhancements facility 1, introduced with z14) or VXE2 (vector-enhancements facility 2, introduced with z15).
- Precomputed tables: the bit-expansion tables (table_b2b_0, table_b2b_1) are declared with explicit 16-byte alignment (__attribute__((aligned(16)))). This matches the 16-byte vector register width, so vector loads from the tables are aligned.
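The exact table contents are not given here. The sketch below assumes the convention used by GGML's other CPU backends: table_b2b_0[b] sets byte j of a 64-bit entry to 0x10 when bit j of b is set, and table_b2b_1[b] does the same for clear bits. In those backends the tables help unpack the high bit of Q5_0/Q5_1 quants. All names prefixed with sketch_ are hypothetical. On a big-endian host such as s390x, the bytes of each 64-bit entry are stored in reverse order in memory. Any code that reinterprets these tables as byte vectors therefore has to account for endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical re-creation of the assumed table contents: bit j of the
   index sets byte j (counted from the least significant byte) to 0x10. */
static uint64_t sketch_b2b_0(unsigned b) {             /* ( b) << 4 per bit */
    uint64_t v = 0;
    for (int j = 0; j < 8; ++j)
        if ((b >> j) & 1u) v |= (uint64_t) 0x10 << (8 * j);
    return v;
}

static uint64_t sketch_b2b_1(unsigned b) {             /* (!b) << 4 per bit */
    return sketch_b2b_0(~b & 0xFFu);
}

/* 16-byte aligned tables, mirroring the alignment used in the s390x file. */
static uint64_t sketch_table_b2b_0[256] __attribute__((aligned(16)));
static uint64_t sketch_table_b2b_1[256] __attribute__((aligned(16)));

static void sketch_build_b2b_tables(void) {
    assert((uintptr_t) sketch_table_b2b_0 % 16 == 0);
    assert((uintptr_t) sketch_table_b2b_1 % 16 == 0);
    for (unsigned b = 0; b < 256; ++b) {
        sketch_table_b2b_0[b] = sketch_b2b_0(b);
        sketch_table_b2b_1[b] = sketch_b2b_1(b);
    }
}
```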
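The quantization functions follow the same algorithmic structure as on the other architectures. The main difference is the float-to-integer step. Each scaled lane is first rounded with __builtin_s390_vfisb(v, 4, 1) and then converted with vec_signed. In the encoding of the underlying VFI instruction, mask 4 suppresses the IEEE inexact exception. Rounding mode 1 selects round-to-nearest with ties away from zero, which matches roundf in GGML's scalar reference. vec_round, by contrast, rounds ties to even.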
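The vector path is meant to reproduce GGML's scalar q8_0 algorithm, which the following sketch models. For each 32-element block it finds the absolute maximum amax, uses d = amax / 127 as the scale, and rounds x / d half away from zero. This is a sketch under stated assumptions. GGML stores d as FP16, but a float is used here to keep the code self-contained. All sketch_ names are hypothetical. The rounding helper spells out the ties-away-from-zero mode so that no libm call is needed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK8_0 32   /* elements per block, as QK8_0 in GGML */

/* Simplified block: GGML stores d as FP16; a float keeps this self-contained. */
typedef struct {
    float  d;                   /* per-block scale */
    int8_t qs[SKETCH_QK8_0];    /* quantized values in [-127, 127] */
} sketch_block_q8_0;

/* Round to nearest, ties away from zero: the behavior rounding mode 1 of
   __builtin_s390_vfisb selects (and what C roundf does). Assumes |v| < 2^31. */
static int sketch_round_half_away(float v) {
    int t = (int) v;                 /* C conversion truncates toward zero */
    float frac = v - (float) t;      /* exact: removes the integer part */
    if (frac >= 0.5f)  t += 1;
    if (frac <= -0.5f) t -= 1;
    return t;
}

static void sketch_quantize_row_q8_0(const float *x, sketch_block_q8_0 *y, int64_t k) {
    assert(k % SKETCH_QK8_0 == 0);   /* k must be a multiple of the block size */
    for (int64_t i = 0; i < k / SKETCH_QK8_0; ++i) {
        const float *xb = x + i * SKETCH_QK8_0;
        float amax = 0.0f;           /* absolute maximum of the block */
        for (int j = 0; j < SKETCH_QK8_0; ++j) {
            const float a = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (a > amax) amax = a;
        }
        const float d  = amax / 127.0f;              /* maps amax to 127 */
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < SKETCH_QK8_0; ++j)
            y[i].qs[j] = (int8_t) sketch_round_half_away(xb[j] * id);
    }
}
```

With a ties-to-even mode (what vec_round would give), 2.5 would quantize to 2 instead of 3. Matching the scalar reference therefore requires the explicit rounding mode.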
Usage
This file is compiled as part of the GGML CPU backend when targeting IBM s390x platforms with VXE or VXE2 support (IBM z14 and later). It enables ML inference workloads on enterprise mainframe hardware.
Code Reference
Source Location
GGML repo, file: src/ggml-cpu/arch/s390/quants.c (1468 lines).
Key Signatures
```c
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs,
                            const void * GGML_RESTRICT vx, size_t bx,
                            const void * GGML_RESTRICT vy, size_t by, int nrc);
```
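The following scalar model shows what ggml_vec_dot_q4_0_q8_0 computes for a single row. Each Q4_0 byte packs two 4-bit weights stored with an offset of 8: the low nibble is element j and the high nibble is element j + 16. These weights are multiplied against the Q8_0 activations, and each block's integer sum is scaled by both block scales. This sketch assumes GGML's Q4_0/Q8_0 layout with 32-element blocks. It replaces the FP16 scales with float and drops the stride parameters (bs, bx, by) and multi-row support (nrc). The sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32   /* QK4_0 == QK8_0 == 32 in GGML */

/* Simplified blocks: GGML stores d as FP16; floats are used here. */
typedef struct { float d; uint8_t qs[SKETCH_QK / 2]; } sketch_q4_0; /* two 4-bit weights per byte */
typedef struct { float d; int8_t  qs[SKETCH_QK];     } sketch_q8_0;

/* Scalar model of the q4_0 x q8_0 dot product for one row (nrc == 1). */
static float sketch_vec_dot_q4_0_q8_0(int n, const sketch_q4_0 *x, const sketch_q8_0 *y) {
    assert(n % SKETCH_QK == 0);
    float sumf = 0.0f;
    for (int ib = 0; ib < n / SKETCH_QK; ++ib) {
        int sumi = 0;
        for (int j = 0; j < SKETCH_QK / 2; ++j) {
            const int v0 = (x[ib].qs[j] & 0x0F) - 8;   /* low nibble  -> element j      */
            const int v1 = (x[ib].qs[j] >> 4) - 8;     /* high nibble -> element j + 16 */
            sumi += v0 * y[ib].qs[j] + v1 * y[ib].qs[j + SKETCH_QK / 2];
        }
        sumf += (float) sumi * x[ib].d * y[ib].d;      /* apply both block scales */
    }
    return sumf;
}
```

The integer accumulation happens per block and is followed by a single float multiply. Vector implementations keep the inner loop in integer lanes for the same reason.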
Import
```c
#include "ggml-quants.h"
#include "ggml-cpu.h"
#include "simd-mappings.h"
```
I/O Contract
Inputs (Quantization)
| Parameter | Type | Description |
|---|---|---|
| x | const float * | Source array of floating-point values to be quantized. |
| k | int64_t | Number of elements to quantize. Must be a multiple of the block size (QK8_0 = QK8_1 = 32). |
Outputs (Quantization)
| Output | Type | Description |
|---|---|---|
| vy | void * | Destination buffer for the quantized block data. |
Inputs (Dot Product)
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of elements in each input vector. |
| vx | const void * | Pointer to quantized weight data. |
| vy | const void * | Pointer to quantized activation data. |
| nrc | int | Number of rows to compute simultaneously. |
Outputs (Dot Product)
| Output | Type | Description |
|---|---|---|
| s | float * | Destination for the computed dot product result(s). |
Usage Examples
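```c
// Quantize a row using IBM s390x VXE vector instructions.
// s390x is big-endian, but the quantization API is identical to other platforms.
float input[256];
for (int i = 0; i < 256; ++i) {
    input[i] = (float) (i - 128) / 16.0f;  // sample data; any finite values work
}
block_q8_0 output[256 / QK8_0];  // QK8_0 = 32, so 8 blocks
quantize_row_q8_0(input, output, 256);
```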
Related Pages
- Principle:Ggml_org_Ggml_Architecture_Specific_SIMD_Quantization
- Implementation:Ggml_org_Ggml_Cpu_arm_quants -- ARM NEON equivalent
- Implementation:Ggml_org_Ggml_Cpu_x86_quants -- x86 SSE/AVX equivalent
- Implementation:Ggml_org_Ggml_Cpu_loongarch_quants -- LoongArch LSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_powerpc_quants -- PowerPC VSX equivalent
- Implementation:Ggml_org_Ggml_Cpu_riscv_quants -- RISC-V RVV equivalent
- Implementation:Ggml_org_Ggml_Cpu_wasm_quants -- WebAssembly SIMD128 equivalent