Implementation:Ggml org Ggml Cpu quantization

Metadata

Field	Value
Page Type	Implementation (Quantization)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantization
Last Updated	2025-05-15 12:00 GMT

Overview

CPU-specific quantization row functions and quantized dot product implementations, providing generic fallback implementations that can be overridden by architecture-specific versions.

Description

quants.c implements the quantization and dequantization primitives essential for running quantized models on CPU. It provides:

Quantization row functions: Convert float arrays to quantized block formats. Covered formats are q4_0, q4_1, q5_0, q5_1, q8_0, q8_1, q2_K through q6_K, q8_K, tq1_0 (ternary), tq2_0, iq4_nl, iq4_xs, and mxfp4. Most delegate to the *_ref reference implementations declared in ggml-quants.h.
Quantized dot products: Generic ggml_vec_dot_q*_generic functions that compute the dot product between two quantized vectors. For example, ggml_vec_dot_q4_0_q8_0_generic unpacks 4-bit nibbles, multiplies them by the corresponding q8_0 values, and accumulates each block's integer sum scaled by the scale factors of both blocks.
Architecture override mechanism: Functions are named with a _generic suffix. The arch-fallback.h header renames them to their canonical names when no architecture-specific optimized version exists. Optimized versions for ARM NEON, x86 AVX/AVX2, and other targets live in arch/arm/quants.c, arch/x86/quants.c, and similar files.
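
The renaming step can be illustrated with a minimal sketch. The names sk_vec_dot, sk_vec_dot_generic, and SK_HAVE_ARCH_DOT are hypothetical and do not appear in GGML; the sketch only assumes that the fallback header applies a #define of this kind to the real function names.

```c
#include <assert.h>

/* Hypothetical stand-in for the fallback header: when no optimized kernel
 * is available, rename the _generic symbol to the canonical name BEFORE
 * the generic definition is compiled. */
#if !defined(SK_HAVE_ARCH_DOT)
#define sk_vec_dot_generic sk_vec_dot
#endif

/* Portable fallback. After the rename above, this definition is emitted
 * under the canonical name sk_vec_dot. */
float sk_vec_dot_generic(int n, const float *x, const float *y) {
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

#if defined(SK_HAVE_ARCH_DOT)
/* An architecture-specific file would define the canonical name itself.
 * A real build would use SIMD intrinsics here; this placeholder simply
 * defers to the fallback. */
float sk_vec_dot(int n, const float *x, const float *y) {
    return sk_vec_dot_generic(n, x, y);
}
#endif
```

Callers always use the canonical name, so they do not need to know whether the optimized or the generic version was linked in.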

Usage

These functions are called internally by the matrix multiplication and tensor copy operations. They are not typically called directly by user code.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/quants.c (1193 lines).

Signature

// Quantization row functions
void quantize_row_q4_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_0_generic(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q2_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);
void quantize_row_q4_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT vy, int64_t k);

// Quantized dot product functions
void ggml_vec_dot_q4_0_q8_0_generic(int n, float * GGML_RESTRICT s, size_t bs,
    const void * GGML_RESTRICT vx, size_t bx,
    const void * GGML_RESTRICT vy, size_t by, int nrc);

Import

#include "quants.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`x`	`const float *`	Yes	Source float array to quantize.
`y` / `vy`	`void *`	Yes	Destination buffer for quantized blocks.
`k`	`int64_t`	Yes	Number of elements (must be a multiple of the block size, e.g., 32 for q4_0).
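`n`	`int`	Yes (dot)	Number of elements in each dot product vector (a multiple of the block size).
`vx`, `vy`	`const void *`	Yes (dot)	Quantized input vectors for the dot product (for example, q4_0 blocks in `vx` and q8_0 blocks in `vy`).
`bs`, `bx`, `by`	`size_t`	Yes (dot)	Strides for the output and the two inputs; marked as unused in the usage example below, which passes 0 for each.
`nrc`	`int`	Yes (dot)	Number of rows to compute; 1 in the usage example below.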
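
Because k must be a multiple of the block size, the destination buffer size follows directly from the block count. The sketch below assumes the q4_0 layout is a 16-bit scale followed by 16 bytes of packed nibbles (18 bytes per 32 elements). The names sk_block_q4_0_layout and sk_q4_0_row_bytes are hypothetical; real code should use sizeof(block_q4_0) from the GGML headers rather than this mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_QK4_0 32  /* elements per q4_0 block */

/* Hypothetical mirror of the q4_0 block layout: an fp16 scale stored as
 * raw 16 bits, then 32 four-bit values packed two per byte. */
typedef struct {
    uint16_t d;
    uint8_t  qs[SK_QK4_0 / 2];
} sk_block_q4_0_layout;

/* Bytes needed to hold k quantized elements; k must be a whole number
 * of blocks, mirroring the constraint on the real row functions. */
size_t sk_q4_0_row_bytes(int64_t k) {
    assert(k >= 0 && k % SK_QK4_0 == 0);
    return (size_t)(k / SK_QK4_0) * sizeof(sk_block_q4_0_layout);
}
```

For k = 256 this gives 8 blocks of 18 bytes each, compared with 1024 bytes for the same row stored as 32-bit floats.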

Outputs

Output	Type	Description
`y`	`void *`	Quantized block data written to the output buffer.
`s`	`float *`	Scalar dot product result (for dot product functions).

Usage Examples

Quantizing a Float Vector to Q4_0

#include "quants.h"

float data[256] = { /* ... */ };
block_q4_0 quantized[256 / QK4_0];

// Quantize 256 floats into q4_0 blocks
quantize_row_q4_0(data, quantized, 256);

Computing a Quantized Dot Product

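#include "quants.h"

// Both operands must already be quantized: one to q4_0, the other to q8_0.
float a[256] = { /* ... */ };
float b[256] = { /* ... */ };
block_q4_0 q4_data[256 / QK4_0];
block_q8_0 q8_data[256 / QK8_0];
quantize_row_q4_0(a, q4_data, 256);
quantize_row_q8_0_generic(b, q8_data, 256);

float result;
ggml_vec_dot_q4_0_q8_0_generic(
    256,          // n elements (a multiple of the block size)
    &result,      // output scalar
    0,            // bs (unused)
    q4_data,      // quantized q4_0 vector
    0,            // bx (unused)
    q8_data,      // quantized q8_0 vector
    0,            // by (unused)
    1             // nrc (number of rows to compute)
);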
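
The per-block arithmetic behind such a call can be sketched in plain C. This is a simplified illustration, not the GGML source: the sk_ names are hypothetical, the block structs store the scale as a float rather than fp16, and the stride and nrc parameters are dropped.

```c
#include <assert.h>
#include <stdint.h>

#define SK_QK 32  /* elements per block for both formats */

/* Hypothetical simplified blocks: float scale instead of an fp16 scale. */
typedef struct { float d; uint8_t qs[SK_QK / 2]; } sk_block_q4_0;
typedef struct { float d; int8_t  qs[SK_QK];     } sk_block_q8_0;

/* 4-bit quantization: scale so the largest-magnitude value maps to -8,
 * then store each value as an unsigned nibble with an offset of 8. */
void sk_quantize_row_q4_0(const float *x, sk_block_q4_0 *y, int64_t k) {
    assert(k % SK_QK == 0);
    for (int64_t i = 0; i < k / SK_QK; i++) {
        float amax = 0.0f, max = 0.0f;
        for (int j = 0; j < SK_QK; j++) {
            const float v  = x[i * SK_QK + j];
            const float av = v < 0.0f ? -v : v;
            if (av > amax) { amax = av; max = v; }
        }
        const float d  = max / -8.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < SK_QK / 2; j++) {
            /* Low nibble holds element j, high nibble holds element j + 16. */
            int q0 = (int)(x[i * SK_QK + j] * id + 8.5f);
            int q1 = (int)(x[i * SK_QK + SK_QK / 2 + j] * id + 8.5f);
            if (q0 > 15) q0 = 15;
            if (q1 > 15) q1 = 15;
            y[i].qs[j] = (uint8_t)(q0 | (q1 << 4));
        }
    }
}

/* 8-bit quantization: scale so the largest magnitude maps to 127. */
void sk_quantize_row_q8_0(const float *x, sk_block_q8_0 *y, int64_t k) {
    assert(k % SK_QK == 0);
    for (int64_t i = 0; i < k / SK_QK; i++) {
        float amax = 0.0f;
        for (int j = 0; j < SK_QK; j++) {
            const float v  = x[i * SK_QK + j];
            const float av = v < 0.0f ? -v : v;
            if (av > amax) amax = av;
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < SK_QK; j++) {
            const float v = x[i * SK_QK + j] * id;
            /* Round half away from zero. */
            y[i].qs[j] = (int8_t)(int)(v + (v >= 0.0f ? 0.5f : -0.5f));
        }
    }
}

/* Dot product: integer multiply-accumulate inside each block, then one
 * float multiply by the product of the two block scales. */
float sk_vec_dot_q4_0_q8_0(int n, const sk_block_q4_0 *x, const sk_block_q8_0 *y) {
    assert(n % SK_QK == 0);
    float sumf = 0.0f;
    for (int ib = 0; ib < n / SK_QK; ib++) {
        int sumi = 0;
        for (int j = 0; j < SK_QK / 2; j++) {
            const int v0 = (x[ib].qs[j] & 0x0F) - 8;  /* element j      */
            const int v1 = (x[ib].qs[j] >> 4) - 8;    /* element j + 16 */
            sumi += v0 * y[ib].qs[j] + v1 * y[ib].qs[j + SK_QK / 2];
        }
        sumf += (float)sumi * x[ib].d * y[ib].d;
    }
    return sumf;
}
```

Keeping the inner loop in integers and applying the scales once per block is what lets architecture-specific versions replace that loop with SIMD integer instructions.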

Related Pages

Ggml_org_Ggml_Cpu_tensor_ops -- Matrix multiply operations that use quantized dot products.
Ggml_org_Ggml_Cpu_vec_api -- Vectorized math primitives at the float level.
Ggml_org_Ggml_Cpu_simd_mappings -- SIMD macros used by optimized quantization implementations.
Ggml_org_Ggml_Cpu_amx_mmq -- AMX-accelerated quantized matmul that uses these quantization types.
