Implementation: Quantize Func (LLMBook-zh)
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Compression, Inference |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for 8-bit symmetric quantization and dequantization of tensors provided by the LLMBook repository.
Description
The quantize_func function maps a float32 tensor to int8 by dividing by the scale factor S, adding the zero point Z, rounding, and clamping to the representable range [alpha_q, beta_q]. The dequantize_func function reverses the mapping (x ≈ S · (x_q − Z)) to reconstruct approximate float values. Together they demonstrate the fundamental quantize-dequantize round trip.
Usage
Use these functions to understand the basic quantization algorithm. For production quantization, use libraries like bitsandbytes or auto-gptq.
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/9.2 量化示例.py
- Lines: 4-13
Signature
def quantize_func(x: Tensor, scales: float, zero_point: int, n_bits: int = 8) -> Tensor:
"""
Quantizes a float tensor to integer representation.
Args:
x: Input float32 tensor.
scales: Scale factor S = (beta - alpha) / (beta_q - alpha_q).
zero_point: Zero point offset Z.
n_bits: Bit width (default 8).
Returns:
Clamped integer tensor in [alpha_q, beta_q].
"""
def dequantize_func(x_q: Tensor, scales: float, zero_point: int) -> Tensor:
"""
Dequantizes an integer tensor back to float32.
Args:
x_q: Quantized integer tensor.
scales: Scale factor.
zero_point: Zero point offset.
Returns:
Reconstructed float32 tensor.
"""
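The docstrings above describe the full algorithm, so the bodies can be sketched directly from them. This is a minimal sketch assuming the standard affine scheme (scale, shift by the zero point, round, clamp); the repository's actual implementation may differ in detail:

```python
import torch
from torch import Tensor

def quantize_func(x: Tensor, scales: float, zero_point: int, n_bits: int = 8) -> Tensor:
    # Affine quantization: x_q = clamp(round(x / S + Z), alpha_q, beta_q)
    alpha_q = -(2 ** (n_bits - 1))   # -128 for 8 bits
    beta_q = 2 ** (n_bits - 1) - 1   # 127 for 8 bits
    return torch.clamp(torch.round(x / scales + zero_point), alpha_q, beta_q)

def dequantize_func(x_q: Tensor, scales: float, zero_point: int) -> Tensor:
    # Inverse mapping: x ≈ S * (x_q - Z)
    return scales * (x_q - zero_point)
```

Note that this quantize_func returns a float-typed tensor holding integer values; a cast such as `.to(torch.int8)` would be needed for actual int8 storage.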
Import
from quantization import quantize_func, dequantize_func
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | Tensor | Yes | Float32 tensor to quantize |
| scales | float | Yes | Quantization scale factor |
| zero_point | int | Yes | Zero point offset |
| n_bits | int | No | Bit width (default 8) |
Outputs
| Name | Type | Description |
|---|---|---|
| quantize_func returns | Tensor | Integer tensor clamped to [alpha_q, beta_q] |
| dequantize_func returns | Tensor | Reconstructed float32 tensor |
Usage Examples
import torch
# Configuration
alpha, beta = -100.0, 80.0
n_bits = 8
alpha_q, beta_q = -128, 127
# Compute quantization parameters
S = (beta - alpha) / (beta_q - alpha_q)
Z = int((beta * alpha_q - alpha * beta_q) / (beta - alpha))
# Quantize
float_x = torch.tensor([[-1.2136, 28.7341, 8.4974],
[-1.9210, -23.7421, 16.2609]])
x_q = quantize_func(float_x, S, Z)
print(f"Quantized: {x_q}")
# Dequantize
x_re = dequantize_func(x_q, S, Z)
print(f"Reconstructed: {x_re}")
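Because the zero point is truncated to an int and out-of-range values are clamped, the round trip is lossy but bounded: for inputs inside [alpha, beta], the reconstruction error stays within one quantization step S. A quick check, using stand-in implementations of the two functions (an assumption, since only their signatures are shown above):

```python
import torch

# Stand-in affine quantize/dequantize (assumed to match the described algorithm)
def quantize_func(x, scales, zero_point, n_bits=8):
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(x / scales + zero_point), lo, hi)

def dequantize_func(x_q, scales, zero_point):
    return scales * (x_q - zero_point)

alpha, beta = -100.0, 80.0
alpha_q, beta_q = -128, 127
S = (beta - alpha) / (beta_q - alpha_q)
Z = int((beta * alpha_q - alpha * beta_q) / (beta - alpha))

x = torch.empty(1000).uniform_(alpha, beta)
x_re = dequantize_func(quantize_func(x, S, Z), S, Z)
max_err = (x_re - x).abs().max().item()
# Rounding contributes at most S/2; the truncated zero point and edge
# clamping can add a little more, but the total stays below one step S.
assert max_err < S
```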