
Implementation:InternLM Lmdeploy Auto Awq

From Leeroopedia


Knowledge Sources
Domains Model_Compression, Quantization
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for applying AWQ 4-bit weight quantization to language models.

Description

The auto_awq() function and its CLI wrapper lmdeploy lite auto_awq perform activation-aware weight quantization on a model. The function handles calibration data loading, activation statistic collection, and weight quantization in a single pipeline.
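To make the "weight quantization" step concrete, the sketch below shows grouped asymmetric 4-bit quantization in NumPy. This is an illustration of the general AWQ-style scheme implied by the defaults (w_bits=4, w_sym=False, w_group_size=128), not LMDeploy's actual implementation, which additionally scales weights using collected activation statistics.

```python
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=128):
    """Illustrative grouped asymmetric quantization (not LMDeploy's code).

    Each contiguous group of `group_size` weights gets its own
    scale and zero point, as with w_group_size=128 in auto_awq.
    """
    orig_shape = w.shape
    g = w.reshape(-1, group_size)            # split weights into groups
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    qmax = 2 ** n_bits - 1                   # 15 for 4-bit
    scale = (w_max - w_min) / qmax           # per-group step size
    zero = np.round(-w_min / scale)          # per-group zero point
    q = np.clip(np.round(g / scale + zero), 0, qmax)
    deq = (q - zero) * scale                 # dequantize to estimate error
    return q.reshape(orig_shape), deq.reshape(orig_shape)

w = np.random.randn(256, 256).astype(np.float32)
q, deq = quantize_groupwise(w)
# Quantization error is bounded by roughly one step size per group.
print(q.min(), q.max(), np.abs(w - deq).max())
```

Smaller group sizes give each group a tighter range (lower error) at the cost of storing more scales and zero points.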

Usage

Use this to compress a full-precision model to 4-bit AWQ format. Run as a CLI command for simple workflows or call programmatically for automation.

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/lite/apis/auto_awq.py
  • Lines: L41-129

Signature

def auto_awq(model: str,
             work_dir: str = './work_dir',
             calib_dataset: str = 'wikitext2',
             calib_samples: int = 128,
             calib_seqlen: int = 2048,
             batch_size: int = 1,
             w_bits: int = 4,
             w_sym: bool = False,
             w_group_size: int = 128,
             search_scale: bool = False,
             device: str = 'cuda',
             revision: str = None,
             dtype: Literal['float16', 'bfloat16', 'auto'] = 'auto',
             download_dir: str = None) -> None:

Import

from lmdeploy.lite.apis.auto_awq import auto_awq

I/O Contract

Inputs

Name | Type | Required | Description
model | str | Yes | HuggingFace model path or ID
work_dir | str | No | Output directory (default: './work_dir')
calib_dataset | str | No | Calibration dataset (default: 'wikitext2')
calib_samples | int | No | Number of calibration samples (default: 128)
calib_seqlen | int | No | Calibration sequence length (default: 2048)
batch_size | int | No | Calibration batch size (default: 1)
w_bits | int | No | Quantization bit width (default: 4)
w_sym | bool | No | Use symmetric quantization (default: False)
w_group_size | int | No | Quantization group size (default: 128)
search_scale | bool | No | Search optimal scale ratios (default: False)
device | str | No | Device used for calibration (default: 'cuda')
revision | str | No | Model revision to load (default: None)
dtype | Literal | No | Model dtype: 'float16', 'bfloat16', or 'auto' (default: 'auto')
download_dir | str | No | Directory for downloaded model files (default: None)

Outputs

Name | Type | Description
Quantized model | Files | AWQ model saved to work_dir with quantization_config metadata
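The quantization settings are recorded in the exported model's config.json. The sketch below shows how to read them back; the field names follow the common HuggingFace quantization_config convention and are an assumption here, so the exact keys LMDeploy writes may differ.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a real work_dir produced by auto_awq: we write
# a sample config.json with HuggingFace-style quantization metadata, then
# read it back the way a consumer of the quantized model would.
work_dir = tempfile.mkdtemp()
sample = {
    "model_type": "internlm2",
    "quantization_config": {"quant_method": "awq", "bits": 4, "group_size": 128},
}
with open(os.path.join(work_dir, "config.json"), "w") as f:
    json.dump(sample, f)

with open(os.path.join(work_dir, "config.json")) as f:
    qcfg = json.load(f)["quantization_config"]
print(qcfg["quant_method"], qcfg["bits"], qcfg["group_size"])
```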

Usage Examples

CLI Quantization

# Basic AWQ quantization
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-4bit

# With custom calibration
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-4bit \
    --calib-samples 256 \
    --search-scale

Python Usage

from lmdeploy.lite.apis.auto_awq import auto_awq

auto_awq(
    model='internlm/internlm2_5-7b-chat',
    work_dir='./quantized_model',
    calib_dataset='wikitext2',
    calib_samples=128,
    w_bits=4,
    w_group_size=128
)
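Once quantization finishes, the saved model can be loaded for inference with LMDeploy's pipeline API by telling the TurboMind backend to expect AWQ weights. A minimal sketch (requires a CUDA GPU and the quantized model produced above):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# model_format='awq' tells the TurboMind engine to load 4-bit AWQ weights.
backend_config = TurbomindEngineConfig(model_format='awq')
pipe = pipeline('./quantized_model', backend_config=backend_config)
response = pipe(['Introduce yourself briefly.'])
print(response)
```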

Related Pages

Implements Principle

Requires Environment
