Implementation: InternLM LMDeploy auto_awq
| Knowledge Sources | |
|---|---|
| Domains | Model_Compression, Quantization |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A concrete tool, provided by the LMDeploy library, for applying AWQ 4-bit weight quantization to language models.
Description
The `auto_awq()` function and its CLI wrapper `lmdeploy lite auto_awq` perform activation-aware weight quantization (AWQ) on a model. The function handles calibration data loading, activation statistic collection, and weight quantization in a single pipeline.
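To make the w_bits / w_group_size parameters below concrete, here is a minimal sketch of asymmetric group-wise 4-bit weight quantization in NumPy. This is an illustrative toy, not LMDeploy's actual implementation: real AWQ additionally rescales weight channels using the collected activation statistics before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=128):
    """Asymmetric group-wise quantization sketch (illustrative only).

    Splits each row of `w` into groups of `group_size` columns and maps
    each group to `n_bits`-bit unsigned integers with a per-group scale
    and zero-point, mirroring the w_bits / w_group_size parameters.
    """
    out_f, in_f = w.shape
    assert in_f % group_size == 0, 'in_features must be divisible by group_size'
    g = w.reshape(out_f, in_f // group_size, group_size)

    # Per-group range -> scale and zero-point.
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    zero = np.round(-w_min / scale)

    # Quantize, clamp to the representable range, then dequantize.
    q = np.clip(np.round(g / scale) + zero, 0, qmax)
    deq = (q - zero) * scale
    return q.astype(np.uint8), deq.reshape(out_f, in_f)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)
q, w_hat = quantize_groupwise(w)
err = np.abs(w - w_hat).max()  # bounded by roughly one quantization step
```

Smaller group sizes give each group a tighter range (lower error) at the cost of storing more scales and zero-points; `w_group_size=128` is the common trade-off.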
Usage
Use this to compress a full-precision model to 4-bit AWQ format. Run the CLI command for simple workflows, or call the function programmatically for automation.
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/lite/apis/auto_awq.py
- Lines: L41-129
Signature
```python
def auto_awq(model: str,
             work_dir: str = './work_dir',
             calib_dataset: str = 'wikitext2',
             calib_samples: int = 128,
             calib_seqlen: int = 2048,
             batch_size: int = 1,
             w_bits: int = 4,
             w_sym: bool = False,
             w_group_size: int = 128,
             search_scale: bool = False,
             device: str = 'cuda',
             revision: str = None,
             dtype: Literal['float16', 'bfloat16', 'auto'] = 'auto',
             download_dir: str = None) -> None:
```
Import
```python
from lmdeploy.lite.apis.auto_awq import auto_awq
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | HuggingFace model path or ID |
| work_dir | str | No | Output directory (default: './work_dir') |
| calib_dataset | str | No | Calibration dataset (default: 'wikitext2') |
| calib_samples | int | No | Number of calibration samples (default: 128) |
| calib_seqlen | int | No | Calibration sequence length (default: 2048) |
| batch_size | int | No | Calibration batch size (default: 1) |
| w_bits | int | No | Quantization bit width (default: 4) |
| w_sym | bool | No | Use symmetric weight quantization (default: False) |
| w_group_size | int | No | Quantization group size (default: 128) |
| search_scale | bool | No | Search optimal scale ratios (default: False) |
| device | str | No | Device used for calibration (default: 'cuda') |
| revision | str | No | Model revision to load (default: None) |
| dtype | str | No | Model dtype: 'float16', 'bfloat16', or 'auto' (default: 'auto') |
| download_dir | str | No | Directory for downloaded model files (default: None) |
Outputs
| Name | Type | Description |
|---|---|---|
| Quantized model | Files | AWQ model saved to work_dir with quantization_config metadata |
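As a post-run sanity check, one can verify that work_dir contains a config.json carrying a quantization_config block. The helper name and the exact field names below are illustrative assumptions, not LMDeploy's documented schema; the demo runs against a synthetic config.json.

```python
import json
import tempfile
from pathlib import Path

def read_quant_config(work_dir: str):
    """Return the quantization_config block from a saved model's config.json,
    or None if the directory does not look like a quantized-model output."""
    cfg_path = Path(work_dir) / 'config.json'
    if not cfg_path.is_file():
        return None
    cfg = json.loads(cfg_path.read_text())
    return cfg.get('quantization_config')

# Demo against a synthetic config.json; field names are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'config.json').write_text(json.dumps({
        'model_type': 'internlm2',
        'quantization_config': {'bits': 4, 'group_size': 128},
    }))
    qcfg = read_quant_config(tmp)
```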
Usage Examples
CLI Quantization
```shell
# Basic AWQ quantization
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-4bit

# With custom calibration
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-4bit \
    --calib-samples 256 \
    --search-scale
```
Python Usage
```python
from lmdeploy.lite.apis.auto_awq import auto_awq

auto_awq(
    model='internlm/internlm2_5-7b-chat',
    work_dir='./quantized_model',
    calib_dataset='wikitext2',
    calib_samples=128,
    w_bits=4,
    w_group_size=128
)
```
Related Pages
- Implements Principle
- Requires Environment