
Implementation:InternLM Lmdeploy Smooth Quant

From Leeroopedia


Knowledge Sources
Domains Model_Compression, Quantization
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for applying SmoothQuant W8A8 (8-bit weight, 8-bit activation) quantization to language models.

Description

The smooth_quant() function and its CLI wrapper lmdeploy lite smooth_quant perform weight-activation co-quantization. It supports both INT8 and FP8 (float8_e4m3fn, float8_e5m2) output formats. The function handles calibration, activation smoothing, and quantized weight export.

Usage

Use this to quantize a model to W8A8 format for INT8 or FP8 inference. The resulting model requires the PyTorch backend.
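A minimal Python sketch of running the quantized output on the required PyTorch backend. The model path and prompt are illustrative; `pipeline` and `PytorchEngineConfig` are LMDeploy's standard entry points, and lmdeploy plus a CUDA GPU are needed at run time.

```python
# Sketch: loading a SmoothQuant W8A8 model for inference.
# The quantized output only runs on the PyTorch backend, so the engine
# config below is PytorchEngineConfig (TurboMind will not load it).
WORK_DIR = './internlm2_5-7b-w8a8'  # directory produced by smooth_quant()

if __name__ == '__main__':
    from lmdeploy import pipeline, PytorchEngineConfig
    pipe = pipeline(WORK_DIR, backend_config=PytorchEngineConfig())
    print(pipe(['Explain SmoothQuant in one sentence.']))
```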

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/lite/apis/smooth_quant.py
  • Lines: L17-131

Signature

def smooth_quant(model: str,
                 work_dir: str = './work_dir',
                 calib_dataset: str = 'wikitext2',
                 calib_samples: int = 128,
                 calib_seqlen: int = 2048,
                 search_scale: bool = False,
                 batch_size: int = 1,
                 w_bits: int = 8,
                 dtype: Literal['float16', 'bfloat16', 'auto'] = 'auto',
                 device: str = 'cuda',
                 quant_dtype: Literal['int8', 'fp8', 'float8_e4m3fn',
                                      'float8_e5m2'] = 'int8',
                 revision: str = None,
                 download_dir: str = None) -> None:

Import

from lmdeploy.lite.apis.smooth_quant import smooth_quant
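As a sketch, the import above can be combined with the signature's parameters. The model ID and output directory below mirror the CLI examples on this page and are illustrative; actually running the guarded call requires lmdeploy and a CUDA device.

```python
# Sketch: programmatic SmoothQuant quantization, mirroring the CLI examples.
QUANT_KWARGS = dict(
    model='internlm/internlm2_5-7b-chat',  # HF model ID or local path
    work_dir='./internlm2_5-7b-w8a8',      # quantized model is written here
    calib_dataset='wikitext2',             # calibration corpus (default)
    calib_samples=128,                     # number of calibration samples
    calib_seqlen=2048,                     # sequence length per sample
    quant_dtype='int8',                    # or 'fp8' / 'float8_e4m3fn' / 'float8_e5m2'
)

if __name__ == '__main__':
    from lmdeploy.lite.apis.smooth_quant import smooth_quant
    smooth_quant(**QUANT_KWARGS)
```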

I/O Contract

Inputs

Name | Type | Required | Description
model | str | Yes | HuggingFace model path or ID
work_dir | str | No | Output directory (default: './work_dir')
calib_dataset | str | No | Calibration dataset (default: 'wikitext2')
calib_samples | int | No | Calibration samples (default: 128)
quant_dtype | str | No | Quantization format: 'int8', 'fp8', 'float8_e4m3fn', 'float8_e5m2' (default: 'int8')
search_scale | bool | No | Search optimal AWQ-style scaling (default: False)

Outputs

Name | Type | Description
Quantized model | Files | SmoothQuant model saved to work_dir with quantization_config metadata
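The exported directory can be sanity-checked by reading the quantization_config entry that, per the table above, is written into the model's configuration. This is a hedged sketch: the helper name is hypothetical, and the exact keys inside quantization_config are not specified on this page.

```python
# Sketch: verify that smooth_quant() output carries quantization metadata.
# Only the presence of the 'quantization_config' key is checked here; its
# inner keys are implementation-defined and not documented on this page.
import json
import pathlib

def read_quant_config(work_dir: str):
    """Return the quantization_config dict from work_dir/config.json, or None."""
    cfg = json.loads((pathlib.Path(work_dir) / 'config.json').read_text())
    return cfg.get('quantization_config')
```

After a successful quantization run, `read_quant_config('./internlm2_5-7b-w8a8')` would be expected to return a non-empty dict rather than None.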

Usage Examples

CLI Quantization

# INT8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-w8a8

# FP8 SmoothQuant
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-fp8 \
    --quant-dtype float8_e4m3fn

