Implementation:Mit_han_lab_Llm_awq_Real_quantize_model_weight
Overview
Concrete tool provided by the llm-awq library for converting model weights from FP16 to a packed INT4 format.
Source
File: awq/quantize/quantizer.py, Lines: 125-165
Signature
@torch.no_grad()
def real_quantize_model_weight(model, w_bit, q_config, init_only=False):
Import
from awq.quantize.quantizer import real_quantize_model_weight
I/O
Inputs:
- model (nn.Module) - the model to quantize
- w_bit (int) - bit width, typically 4
- q_config (dict) - quantization configuration, e.g. {"zero_point": True, "q_group_size": 128}
- init_only (bool) - if True, creates empty WQLinear shells without quantizing
Output:
- None (the model is modified in-place; every nn.Linear is replaced with a WQLinear module)
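The in-place replacement behavior can be illustrated with a minimal pure-Python sketch. LinearStub, WQLinearStub, and ModuleStub are hypothetical stand-ins invented here, not the real torch.nn.Linear, WQLinear, or nn.Module types; the sketch mirrors the init_only=True path, where empty shells are created and no weights are quantized:

```python
class LinearStub:
    """Stand-in for torch.nn.Linear (hypothetical, for this sketch only)."""
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

class WQLinearStub:
    """Stand-in for the packed WQLinear module produced by llm-awq."""
    def __init__(self, w_bit, group_size, in_features, out_features):
        self.w_bit = w_bit
        self.group_size = group_size
        self.in_features = in_features
        self.out_features = out_features

class ModuleStub:
    """Stand-in for an nn.Module container with named children."""
    def __init__(self, **children):
        self.children = children

def replace_linears(module, w_bit, q_config):
    # Walk the module tree and swap every LinearStub for an empty
    # WQLinearStub shell -- mirroring the init_only=True path, where
    # no weights are actually quantized yet.
    for name, child in module.children.items():
        if isinstance(child, LinearStub):
            module.children[name] = WQLinearStub(
                w_bit, q_config["q_group_size"],
                child.in_features, child.out_features)
        elif isinstance(child, ModuleStub):
            replace_linears(child, w_bit, q_config)

model = ModuleStub(
    attn=ModuleStub(q_proj=LinearStub(4096, 4096)),
    mlp=ModuleStub(up_proj=LinearStub(4096, 11008)),
)
replace_linears(model, w_bit=4,
                q_config={"zero_point": True, "q_group_size": 128})
```

After the call, every stand-in Linear in the tree has been replaced, which is why the real function returns None: the mutation happens on the model object itself.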
Notes
Internally calls pseudo_quantize_tensor to compute per-group scales and zero points, then WQLinear.from_linear to build the packed INT4 modules.
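The grouped zero-point scheme implied by q_config can be sketched in pure Python. This is a simplified scalar version for one group of q_group_size weights; the real pseudo_quantize_tensor is vectorized in PyTorch and its exact rounding/clamping details may differ:

```python
def quantize_group(weights, w_bit=4):
    # Asymmetric (zero-point) quantization of a single weight group.
    qmax = 2 ** w_bit - 1                       # 15 for INT4
    w_max, w_min = max(weights), min(weights)
    scale = (w_max - w_min) / qmax if w_max != w_min else 1.0
    # Integer offset that maps w_min near code 0, clamped to the INT4 range.
    zero = min(qmax, max(0, round(-w_min / scale)))
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    # Reconstruct approximate FP values from the integer codes.
    return [(c - zero) * scale for c in codes]

group = [-0.5, -0.1, 0.0, 0.2, 0.45, 0.5]
codes, scale, zero = quantize_group(group)
recon = dequantize_group(codes, scale, zero)
```

Every code fits in 4 bits, and the reconstruction error per weight is bounded by about half a quantization step (scale / 2), which is the error that activation-aware scaling in AWQ works to keep small for salient channels.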
Related Pages
- Principle:Mit_han_lab_Llm_awq_INT4_Weight_Packing
- Environment:Mit_han_lab_Llm_awq_Python_Runtime_Environment
- Heuristic:Mit_han_lab_Llm_awq_GPU_Memory_Management_Patterns
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- Quantization
- Model_Compression