
Implementation:InternLM Lmdeploy Smooth Quant

From Leeroopedia


Knowledge Sources
Domains Model_Compression, Quantization
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for applying SmoothQuant W8A8 (8-bit weight, 8-bit activation) quantization to language models.

Description

The smooth_quant() function and its CLI wrapper lmdeploy lite smooth_quant perform weight-activation co-quantization. It supports both INT8 and FP8 (float8_e4m3fn, float8_e5m2) output formats. The function handles calibration, activation smoothing, and quantized weight export.

Usage

Use this to quantize a model to W8A8 format for INT8 or FP8 inference. The resulting model requires the PyTorch backend.
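A minimal Python sketch of running the quantized output on the required PyTorch backend. The model path and prompt are illustrative; `pipeline` and `PytorchEngineConfig` are LMDeploy's standard entry points, and lmdeploy plus a CUDA GPU are needed at run time.

```python
# Sketch: loading a SmoothQuant W8A8 model for inference.
# The quantized output only runs on the PyTorch backend, so the engine
# config below is PytorchEngineConfig (TurboMind will not load it).
WORK_DIR = './internlm2_5-7b-w8a8'  # directory produced by smooth_quant()

if __name__ == '__main__':
    from lmdeploy import pipeline, PytorchEngineConfig
    pipe = pipeline(WORK_DIR, backend_config=PytorchEngineConfig())
    print(pipe(['Explain SmoothQuant in one sentence.']))
```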

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/lite/apis/smooth_quant.py
  • Lines: L17-131

Signature

def smooth_quant(model: str,
                 work_dir: str = './work_dir',
                 calib_dataset: str = 'wikitext2',
                 calib_samples: int = 128,
                 calib_seqlen: int = 2048,
                 search_scale: bool = False,
                 batch_size: int = 1,
                 w_bits: int = 8,
                 dtype: Literal['float16', 'bfloat16', 'auto'] = 'auto',
                 device: str = 'cuda',
                 quant_dtype: Literal['int8', 'fp8', 'float8_e4m3fn',
                                      'float8_e5m2'] = 'int8',
                 revision: str = None,
                 download_dir: str = None) -> None:

Import

from lmdeploy.lite.apis.smooth_quant import smooth_quant
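As a sketch, the import above can be combined with the signature's parameters. The model ID and output directory below mirror the CLI examples on this page and are illustrative; actually running the guarded call requires lmdeploy and a CUDA device.

```python
# Sketch: programmatic SmoothQuant quantization, mirroring the CLI examples.
QUANT_KWARGS = dict(
    model='internlm/internlm2_5-7b-chat',  # HF model ID or local path
    work_dir='./internlm2_5-7b-w8a8',      # quantized model is written here
    calib_dataset='wikitext2',             # calibration corpus (default)
    calib_samples=128,                     # number of calibration samples
    calib_seqlen=2048,                     # sequence length per sample
    quant_dtype='int8',                    # or 'fp8' / 'float8_e4m3fn' / 'float8_e5m2'
)

if __name__ == '__main__':
    from lmdeploy.lite.apis.smooth_quant import smooth_quant
    smooth_quant(**QUANT_KWARGS)
```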

I/O Contract

Inputs

Name | Type | Required | Description
model | str | Yes | HuggingFace model path or ID
work_dir | str | No | Output directory (default: './work_dir')
calib_dataset | str | No | Calibration dataset (default: 'wikitext2')
calib_samples | int | No | Calibration samples (default: 128)
quant_dtype | str | No | Quantization format: 'int8', 'fp8', 'float8_e4m3fn', 'float8_e5m2' (default: 'int8')
search_scale | bool | No | Search optimal AWQ-style scaling (default: False)

Outputs

Name | Type | Description
Quantized model | Files | SmoothQuant model saved to work_dir with quantization_config metadata
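The exported directory can be sanity-checked by reading the quantization_config entry that, per the table above, is written into the model's configuration. This is a hedged sketch: the helper name is hypothetical, and the exact keys inside quantization_config are not specified on this page.

```python
# Sketch: verify that smooth_quant() output carries quantization metadata.
# Only the presence of the 'quantization_config' key is checked here; its
# inner keys are implementation-defined and not documented on this page.
import json
import pathlib

def read_quant_config(work_dir: str):
    """Return the quantization_config dict from work_dir/config.json, or None."""
    cfg = json.loads((pathlib.Path(work_dir) / 'config.json').read_text())
    return cfg.get('quantization_config')
```

After a successful quantization run, `read_quant_config('./internlm2_5-7b-w8a8')` would be expected to return a non-empty dict rather than None.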

Usage Examples

CLI Quantization

# INT8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-w8a8

# FP8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-fp8 \
    --quant-dtype float8_e4m3fn

