Implementation:InternLM Lmdeploy Smooth Quant
| Knowledge Sources | |
|---|---|
| Domains | Model_Compression, Quantization |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A concrete tool, provided by the LMDeploy library, for applying SmoothQuant W8A8 quantization to language models.
Description
The smooth_quant() function and its CLI wrapper, lmdeploy lite smooth_quant, perform weight-activation co-quantization. They support both INT8 and FP8 (float8_e4m3fn, float8_e5m2) output formats and handle calibration, activation smoothing, and quantized-weight export.
Usage
Use this to quantize a model to W8A8 format for INT8 or FP8 inference. The resulting model requires the PyTorch backend.
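The smoothing step at the heart of SmoothQuant can be illustrated with a minimal NumPy sketch (a generic illustration of the technique, not LMDeploy's actual implementation): per-input-channel scales migrate quantization difficulty from activation outliers into the weights while leaving the matrix product mathematically unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)) * np.array([1, 50, 1, 1, 1, 1, 1, 1.0])  # one outlier channel
W = rng.normal(size=(8, 16))

alpha = 0.5                               # migration strength (SmoothQuant's default)
act_max = np.abs(X).max(axis=0)           # per input-channel activation range
w_max = np.abs(W).max(axis=1)             # per input-channel weight range
s = act_max**alpha / w_max**(1 - alpha)   # smoothing scales

X_s = X / s                # smoothed activations: the outlier channel is shrunk
W_s = W * s[:, None]       # scales folded into the weights

assert np.allclose(X @ W, X_s @ W_s)        # the product is unchanged
assert np.abs(X_s).max() < np.abs(X).max()  # activation outliers are reduced
```

After smoothing, both X_s and W_s have narrower dynamic ranges per channel, which is what makes 8-bit quantization of both tensors viable.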
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/lite/apis/smooth_quant.py
- Lines: L17-131
Signature
def smooth_quant(model: str,
work_dir: str = './work_dir',
calib_dataset: str = 'wikitext2',
calib_samples: int = 128,
calib_seqlen: int = 2048,
search_scale: bool = False,
batch_size: int = 1,
w_bits: int = 8,
dtype: Literal['float16', 'bfloat16', 'auto'] = 'auto',
device: str = 'cuda',
quant_dtype: Literal['int8', 'fp8', 'float8_e4m3fn',
'float8_e5m2'] = 'int8',
revision: str = None,
download_dir: str = None) -> None:
Import
from lmdeploy.lite.apis.smooth_quant import smooth_quant
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | HuggingFace model path or ID |
| work_dir | str | No | Output directory (default: './work_dir') |
| calib_dataset | str | No | Calibration dataset (default: 'wikitext2') |
| calib_samples | int | No | Calibration samples (default: 128) |
| quant_dtype | str | No | Quantization format: 'int8', 'fp8', 'float8_e4m3fn', 'float8_e5m2' (default: 'int8') |
| search_scale | bool | No | Search for optimal AWQ-style scales instead of the default smoothing (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| Quantized model | Files | SmoothQuant model saved to work_dir with quantization_config metadata |
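Numerically, the "W8" half of the exported model amounts to 8-bit weight storage with per-channel scales. The sketch below shows generic symmetric per-output-channel INT8 quantization and the bounded round-trip error it introduces; it is an illustration of the idea, not LMDeploy's export code.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16)).astype(np.float32)

# Symmetric per-output-channel INT8 quantization: one scale per row.
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

# Dequantize to inspect the rounding error introduced by 8-bit storage.
W_dq = W_q.astype(np.float32) * scale
max_err = np.abs(W - W_dq).max()
assert max_err <= scale.max() / 2 + 1e-6  # error bounded by half a quant step
```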
Usage Examples
CLI Quantization
# INT8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
--work-dir ./internlm2_5-7b-w8a8
# FP8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
--work-dir ./internlm2_5-7b-fp8 \
--quant-dtype float8_e4m3fn
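The two FP8 targets trade range for precision, which is why both are exposed. The largest finite values follow from the standard float8 encodings (independent of LMDeploy):

```python
# float8_e4m3fn: 4 exponent bits (bias 7), 3 mantissa bits. The all-ones
# exponent still encodes normal numbers (only mantissa 111 is NaN), so the
# largest finite value is 1.110_2 * 2^8.
e4m3_max = (1 + 6 / 8) * 2.0**8       # 448.0

# float8_e5m2: 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style
# (all-ones exponent reserved for inf/NaN), so the top exponent is 2^15.
e5m2_max = (1 + 3 / 4) * 2.0**15      # 57344.0

# Relative step size near 1.0: 2^-mantissa_bits.
e4m3_eps = 2.0**-3                    # 0.125 -> finer precision
e5m2_eps = 2.0**-2                    # 0.25  -> wider range, coarser

assert (e4m3_max, e5m2_max) == (448.0, 57344.0)
```

In practice e4m3 is the common choice for weights and activations because its extra mantissa bit matters more than e5m2's extra range.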
Related Pages
Implements Principle
Requires Environment