Principle: AWQ Transform Application (mit-han-lab/llm-awq)
Overview
Process of applying precomputed activation-aware scaling and clipping transforms to model weights prior to quantization.
Description
After the AWQ search phase produces optimal per-channel scales and per-group clipping values, these transforms must be applied to the model weights. Each scale is absorbed into the preceding operation (a LayerNorm, Linear, or activation function) and the linear layers that consume its output, so the network's function is unchanged. Clipping directly constrains weight values to the searched ranges. Separating search from application makes it possible to save and load AWQ results without re-running the expensive search.
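The scale-folding step can be sketched as follows. This is an illustrative numpy sketch, not the repo's actual apply_scale/apply_clip implementation; the function names, signatures, and the LayerNorm-into-linear case shown are assumptions:

```python
import numpy as np

def apply_scale(ln_gamma, ln_beta, linear_weights, scales):
    """Fold per-channel scales into a preceding LayerNorm and the linear
    layers that consume its output (names and signature are assumptions).

    The LayerNorm's affine parameters are divided by the scales, so its
    output shrinks by 1/s per channel; each following linear layer's
    input channels are multiplied by s, leaving the product unchanged.
    """
    ln_gamma = ln_gamma / scales
    ln_beta = ln_beta / scales
    linear_weights = [w * scales[None, :] for w in linear_weights]
    return ln_gamma, ln_beta, linear_weights

def apply_clip(weight, clip_max):
    """Clamp each output channel to the bounds found during the search."""
    return np.clip(weight, -clip_max[:, None], clip_max[:, None])
```

Because the division and multiplication cancel channel by channel, the folded model computes exactly the same outputs as the original before quantization is applied.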
Usage
Applied after loading AWQ search results (from a --load_awq checkpoint) and before quantization or evaluation.
Theoretical Basis
Two operations are involved:
- apply_scale - redistributes weight magnitude through mathematically equivalent transforms on adjacent layers
- apply_clip - constrains weights to the optimal per-group ranges found during the search
Scaling preserves the network's output exactly; clipping trades a small perturbation of outlier weights for a lower overall quantization error.
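As a toy illustration of why clipping reduces quantization error (a numpy sketch; the 4-bit round-to-nearest quantizer, the synthetic outlier, and the 4.0 clip threshold are all assumptions standing in for searched values):

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    # Symmetric per-tensor round-to-nearest quantization (illustrative).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
w[0] = 12.0  # a single outlier stretches the quantization range

err_plain = np.mean((w - quantize_rtn(w)) ** 2)
w_clipped = np.clip(w, -4.0, 4.0)  # 4.0 stands in for a searched threshold
err_clipped = np.mean((w - quantize_rtn(w_clipped)) ** 2)
# Clipping the outlier shrinks the step size for every other weight,
# so the total error drops even though the outlier itself is distorted.
```

The searched clip values play the role of the hard-coded 4.0 here: they are chosen per group to minimize exactly this kind of reconstruction error.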
Related Pages
Knowledge Sources
- Paper: AWQ (https://arxiv.org/abs/2306.00978)
Domains
- Quantization
- NLP