Implementation: Mit_han_lab_Llm_awq_Pseudo_quantize_model_weight
Overview
A concrete tool from the llm-awq library that applies simulated (pseudo) quantization to all linear-layer weights of a model, in place.
Source
File: awq/quantize/quantizer.py, Lines: 106-123
Signature
@torch.no_grad()
def pseudo_quantize_model_weight(model, w_bit, q_config):
Import
from awq.quantize.quantizer import pseudo_quantize_model_weight
I/O
Inputs:
- model (nn.Module) - the model to apply simulated quantization to
- w_bit (int) - bit width for quantization
- q_config (dict) - quantization configuration
Output:
- None (model weights modified in-place with simulated quantization noise)
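For illustration, a plausible invocation. The `zero_point` and `q_group_size` keys shown here are the configuration knobs used elsewhere in the llm-awq repository; verify them against the version you install. The function itself needs torch and a loaded model, so only the config dict is shown live:

```python
# Hypothetical usage sketch, not verbatim from the repository.
q_config = {
    "zero_point": True,   # asymmetric quantization with a per-group zero point
    "q_group_size": 128,  # weights quantized in groups of 128 input channels
}

# from awq.quantize.quantizer import pseudo_quantize_model_weight
# pseudo_quantize_model_weight(model, w_bit=4, q_config=q_config)  # modifies model in-place, returns None
```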
Notes
Iterates over all transformer blocks and applies pseudo_quantize_tensor to each linear layer's weights.
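The per-tensor step applied to each linear layer can be sketched as grouped round-to-nearest quantize-then-dequantize. The following is a NumPy illustration of that math under assumed semantics for the `group_size` and `zero_point` knobs, not the library's torch implementation:

```python
import numpy as np

def pseudo_quantize(w, n_bit=4, group_size=128, zero_point=True):
    """Sketch of grouped pseudo-quantization: quantize each group of
    `group_size` weights to n_bit integers, then immediately dequantize,
    so the returned tensor carries quantization noise but stays float."""
    orig_shape = w.shape
    w = w.reshape(-1, group_size)  # one scale (and zero point) per group
    if zero_point:
        # Asymmetric: map [min, max] of each group onto [0, 2^n - 1].
        w_max = w.max(axis=1, keepdims=True)
        w_min = w.min(axis=1, keepdims=True)
        qmax = 2 ** n_bit - 1
        scale = np.clip((w_max - w_min) / qmax, 1e-5, None)
        zero = np.clip(-np.round(w_min / scale), 0, qmax)
        q = np.clip(np.round(w / scale) + zero, 0, qmax)
        w_dq = (q - zero) * scale
    else:
        # Symmetric: map [-|w|max, |w|max] onto the signed integer range.
        w_absmax = np.abs(w).max(axis=1, keepdims=True)
        qmax = 2 ** (n_bit - 1) - 1
        scale = np.clip(w_absmax / qmax, 1e-5, None)
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        w_dq = q * scale
    return w_dq.reshape(orig_shape)
```

Applying this to every linear layer's weight matrix, block by block, reproduces the shape of the loop described above: the model's forward pass then runs in full precision but with the error a real low-bit deployment would introduce.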
Related Pages
- Principle:Mit_han_lab_Llm_awq_Pseudo_Quantization
- Environment:Mit_han_lab_Llm_awq_Python_Runtime_Environment
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- Quantization
- Evaluation