
Implementation: mit-han-lab/llm-awq real_quantize_model_weight

From Leeroopedia

Overview

A concrete tool from the llm-awq library that converts a model's weights in-place from FP16 to packed INT4 format.

Source

File: awq/quantize/quantizer.py, Lines: 125-165

Signature

@torch.no_grad()
def real_quantize_model_weight(model, w_bit, q_config, init_only=False):

Import

from awq.quantize.quantizer import real_quantize_model_weight

I/O

Inputs:

  • model (nn.Module) - the model to quantize
  • w_bit (int) - bit width, typically 4
  • q_config (dict) - quantization configuration, e.g. {"zero_point": True, "q_group_size": 128}
  • init_only (bool) - if True, creates empty WQLinear shells without quantizing

Output:

  • None (the model is modified in-place; every nn.Linear is replaced with a WQLinear)
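The in-place replacement described above can be sketched with toy stand-ins for nn.Module, nn.Linear, and WQLinear. All class and helper names below are illustrative assumptions, not the library's code; the real function operates on torch.nn modules.

```python
# Toy stand-ins (illustrative only; llm-awq uses torch.nn modules).
class Linear:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

class WQLinear:
    """Placeholder for the packed INT4 linear module."""
    def __init__(self, in_features, out_features, w_bit):
        self.in_features = in_features
        self.out_features = out_features
        self.w_bit = w_bit
        self.qweight = None  # left empty when init_only=True

class Block:
    def __init__(self):
        self.proj = Linear(128, 128)
        self.fc = Linear(128, 512)

class Model:
    def __init__(self):
        self.layer0 = Block()
        self.layer1 = Block()

def real_quantize_model_weight_sketch(module, w_bit, init_only=False):
    """Walk the module tree, replacing every Linear in place."""
    for name, child in list(vars(module).items()):
        if isinstance(child, Linear):
            q = WQLinear(child.in_features, child.out_features, w_bit)
            if not init_only:
                q.qweight = "packed"  # the real code packs INT4 weights here
            setattr(module, name, q)  # in-place swap on the parent module
        elif hasattr(child, "__dict__"):
            real_quantize_model_weight_sketch(child, w_bit, init_only)

model = Model()
real_quantize_model_weight_sketch(model, w_bit=4, init_only=True)
```

With init_only=True, every Linear becomes an empty WQLinear shell (qweight stays None), mirroring how the real function prepares a model to receive pre-quantized checkpoints without running quantization.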

Notes

Internally calls pseudo_quantize_tensor to compute per-group scales and zero points, then WQLinear.from_linear to create the packed INT4 modules.
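The group-wise asymmetric (zero-point) quantization that pseudo_quantize_tensor performs can be sketched in pure Python for a single weight row. The helper name and the tiny group size are ours for illustration; the library defaults to q_group_size=128.

```python
def pseudo_quantize_row(weights, w_bit=4, group_size=4):
    """Quantize a row of weights group by group with an asymmetric
    (zero-point) scheme, then dequantize to show the round trip."""
    qmax = 2 ** w_bit - 1  # 15 for INT4
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        w_min, w_max = min(group), max(group)
        scale = max(w_max - w_min, 1e-8) / qmax  # per-group scale
        zero = round(-w_min / scale)             # zero point maps w_min near 0
        for w in group:
            q = max(0, min(qmax, round(w / scale) + zero))  # clamp to [0, 15]
            out.append((q - zero) * scale)                  # dequantize
    return out

row = [0.12, -0.50, 0.33, 0.07, 1.20, -0.90, 0.00, 0.45]
deq = pseudo_quantize_row(row, w_bit=4, group_size=4)
max_err = max(abs(a - b) for a, b in zip(row, deq))
```

The round-trip error stays bounded by roughly one scale step per group; in the real pipeline these integer values and the per-group scales/zeros are what WQLinear.from_linear packs into the INT4 module.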

Related Pages

Knowledge Sources

Domains

  • Quantization
  • Model_Compression
