Principle: Symmetric Quantization
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Compression, Inference |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A model compression technique that maps floating-point weight values to lower-precision integers using a linear scale factor; in the symmetric case the zero point is fixed at zero.
Description
Symmetric Quantization reduces the model memory footprint by representing float32 weights as lower-precision integers (typically int8). The general affine scheme uses two parameters, a scale factor (S) and a zero point (Z), to map between the float and integer domains; in the symmetric variant the float range is centered on zero, so Z = 0 and only S must be stored. Quantized values are clamped to the representable range of the target bit width.
Dequantization reverses the process to approximate the original float values, introducing a small quantization error.
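The quantize/dequantize round trip described above can be sketched as follows. This is a minimal illustration, not the source's implementation; the function names (`quantize_symmetric`, `dequantize_symmetric`) are made up for this example.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int = 8):
    """Symmetric quantization: zero point is 0, scale taken from max |x|."""
    qmax = 2 ** (bits - 1) - 1           # 127 for int8
    scale = float(np.max(np.abs(x))) / qmax
    x_q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return x_q, scale

def dequantize_symmetric(x_q: np.ndarray, scale: float) -> np.ndarray:
    """Reverse mapping: x ≈ scale * x_q, with a small rounding error."""
    return x_q.astype(np.float32) * scale

w = np.array([-1.5, -0.2, 0.0, 0.7, 1.5], dtype=np.float32)
w_q, s = quantize_symmetric(w)
w_hat = dequantize_symmetric(w_q, s)
```

The reconstruction `w_hat` differs from `w` by at most half a quantization step (`s / 2`), which is the quantization error mentioned above.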
Usage
Use this principle when you need to reduce model size for deployment on memory-constrained devices. 8-bit quantization typically preserves model quality with minimal degradation while cutting memory usage to a quarter of float32 (or half of float16).
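The memory saving is easy to verify directly: each float32 weight takes 4 bytes while its int8 counterpart takes 1. A quick sketch (array shape and names are arbitrary):

```python
import numpy as np

# A toy float32 weight matrix.
w = np.random.randn(1000, 1000).astype(np.float32)

# Symmetric int8 quantization: scale from max |w|, zero point implicitly 0.
scale = float(np.max(np.abs(w))) / 127
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# float32 stores 4 bytes per element, int8 stores 1: a 4x reduction.
print(w.nbytes, w_q.nbytes)
```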
Theoretical Basis
Given a float range [f_min, f_max] and integer range [q_min, q_max]:
Scale factor: S = (f_max - f_min) / (q_max - q_min)
Zero point: Z = round(q_min - f_min / S)
Quantization: x_q = clamp(round(x / S) + Z, q_min, q_max)
Dequantization: x ≈ S · (x_q - Z)
In the symmetric case the float range is centered on zero (f_max = -f_min = max|x|), so Z = 0 and the scale simplifies to S = max|x| / q_max, with q_max = 2^(b-1) - 1 (127 for int8).
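The formulas above can be checked numerically. The sketch below implements the general affine mapping and shows that for a zero-centered range the zero point comes out as 0, which is exactly the symmetric case (function names `affine_params`, `quantize`, `dequantize` are illustrative):

```python
def affine_params(f_min: float, f_max: float, q_min: int, q_max: int):
    """Compute scale and zero point from the float and integer ranges."""
    scale = (f_max - f_min) / (q_max - q_min)
    zero_point = round(q_min - f_min / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int,
             q_min: int, q_max: int) -> int:
    """x_q = clamp(round(x / S) + Z, q_min, q_max)"""
    return max(q_min, min(q_max, round(x / scale) + zero_point))

def dequantize(x_q: int, scale: float, zero_point: int) -> float:
    """x ≈ S * (x_q - Z)"""
    return scale * (x_q - zero_point)

# Zero-centered float range -> zero point is 0 (the symmetric case).
s, z = affine_params(-1.5, 1.5, -127, 127)
x_q = quantize(0.7, s, z, -127, 127)
x_hat = dequantize(x_q, s, z)
```

With `s = 3.0 / 254`, the round trip recovers 0.7 to within half a quantization step, matching the error bound stated in the Description.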