
Implementation: LLMBook-zh/LLMBook-zh.github.io Quantize Func

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Model_Compression, Inference
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for 8-bit affine (asymmetric, zero-point-based) quantization and dequantization of tensors, provided by the LLMBook repository.

Description

The quantize_func function maps float32 tensors to int8 by dividing by the scale factor S, adding the zero point Z, rounding, and clamping to the integer range [alpha_q, beta_q]. The dequantize_func function reverses the process, computing S * (x_q - Z) to reconstruct approximate float values. Together they demonstrate the fundamental quantization-dequantization round-trip.
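A minimal sketch of the round-trip described above, written against the documented signatures; the actual repository code may differ in details:

```python
import torch
from torch import Tensor


def quantize_func(x: Tensor, scales: float, zero_point: int, n_bits: int = 8) -> Tensor:
    """Map floats onto the signed integer grid: clamp(round(x / S + Z))."""
    alpha_q = -(2 ** (n_bits - 1))      # -128 for 8 bits
    beta_q = 2 ** (n_bits - 1) - 1      # 127 for 8 bits
    x_q = torch.round(x / scales + zero_point)
    return torch.clamp(x_q, alpha_q, beta_q)


def dequantize_func(x_q: Tensor, scales: float, zero_point: int) -> Tensor:
    """Invert the mapping: S * (x_q - Z) reconstructs approximate floats."""
    return scales * (x_q - zero_point)
```

Note that rounding is the only lossy step for in-range values, so the per-element reconstruction error is bounded by S / 2.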

Usage

Use these functions to understand the basic quantization algorithm. For production quantization, use libraries like bitsandbytes or auto-gptq.

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/9.2 量化示例.py
  • Lines: 4-13

Signature

def quantize_func(x: Tensor, scales: float, zero_point: int, n_bits: int = 8) -> Tensor:
    """
    Quantizes a float tensor to integer representation.

    Args:
        x: Input float32 tensor.
        scales: Scale factor S = (beta - alpha) / (beta_q - alpha_q).
        zero_point: Zero point offset Z.
        n_bits: Bit width (default 8).

    Returns:
        Clamped integer tensor in [alpha_q, beta_q].
    """

def dequantize_func(x_q: Tensor, scales: float, zero_point: int) -> Tensor:
    """
    Dequantizes an integer tensor back to float32.

    Args:
        x_q: Quantized integer tensor.
        scales: Scale factor.
        zero_point: Zero point offset.

    Returns:
        Reconstructed float32 tensor.
    """

Import

from quantization import quantize_func, dequantize_func

I/O Contract

Inputs

Name Type Required Description
x Tensor Yes Float32 tensor to quantize
scales float Yes Quantization scale factor
zero_point int Yes Zero point offset
n_bits int No Bit width (default 8)

Outputs

Name Type Description
quantize_func returns Tensor Integer tensor clamped to [alpha_q, beta_q]
dequantize_func returns Tensor Reconstructed float32 tensor

Usage Examples

import torch

from quantization import quantize_func, dequantize_func

# Configuration
alpha, beta = -100.0, 80.0
n_bits = 8
alpha_q, beta_q = -128, 127

# Compute quantization parameters
S = (beta - alpha) / (beta_q - alpha_q)
Z = int((beta * alpha_q - alpha * beta_q) / (beta - alpha))

# Quantize
float_x = torch.tensor([[-1.2136, 28.7341, 8.4974],
                         [-1.9210, -23.7421, 16.2609]])
x_q = quantize_func(float_x, S, Z)
print(f"Quantized: {x_q}")

# Dequantize
x_re = dequantize_func(x_q, S, Z)
print(f"Reconstructed: {x_re}")
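The reconstruction quality of the round-trip can be checked directly. The following self-contained sketch repeats the example with the formulas from the Description (the helper calls are inlined here so it runs standalone) and verifies that the per-element error stays within half a scale step:

```python
import torch

# Recreate the example's quantization parameters.
alpha, beta = -100.0, 80.0
alpha_q, beta_q = -128, 127
S = (beta - alpha) / (beta_q - alpha_q)                    # scale step
Z = int((beta * alpha_q - alpha * beta_q) / (beta - alpha))  # zero point

float_x = torch.tensor([[-1.2136, 28.7341, 8.4974],
                        [-1.9210, -23.7421, 16.2609]])

# Inlined round-trip: quantize then dequantize.
x_q = torch.clamp(torch.round(float_x / S + Z), alpha_q, beta_q)
x_re = S * (x_q - Z)

# For inputs inside [alpha, beta], rounding is the only lossy step,
# so the absolute error per element is bounded by S / 2.
err = (float_x - x_re).abs().max().item()
print(f"max abs error = {err:.4f}, bound S/2 = {S / 2:.4f}")
```

Values outside [alpha, beta] saturate at alpha_q or beta_q, so the S / 2 bound applies only to in-range inputs.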

Related Pages

Implements Principle

Requires Environment
