Principle:Mit han lab Llm awq Weight Clipping Optimization

Overview

Optimization technique that clips weight outliers per quantization group to minimize post-quantization output error.

Description

After scaling, some weight values may still cause large quantization errors. Weight clipping shrinks the dynamic range of each quantization group by searching for the maximum absolute value that minimizes output error. The search is a grid search: candidate values of max_val are spaced at 1/20 of the group's original maximum absolute value and cover roughly 50% to 100% of it. For each candidate, the weights are clipped to [-max_val, max_val] and quantized, and the MSE between the original output and the quantized-clipped output is computed on calibration inputs. The candidate with the minimum error is selected. Q/K projections are skipped because their outputs feed the QK dot product in attention, which makes precise clipping difficult.
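The Q/K exclusion can be illustrated with a simple name filter. This is only a sketch: the function name `layers_to_clip` and the substring patterns in `SKIP_PATTERNS` are hypothetical, and real models name their attention projections differently, so the patterns would need adjusting per architecture.

```python
# Hypothetical substrings that identify query/key projections by layer name.
# These are assumptions for illustration; naming differs between model families.
SKIP_PATTERNS = ("q_", "k_", "query", "key", "Wqkv")


def layers_to_clip(layer_names):
    """Return the linear layers that should go through the clipping search.

    Query/key projections are left out because their outputs meet in the
    QK dot product, where per-layer clipping error is hard to control.
    """
    return [
        name
        for name in layer_names
        if not any(pattern in name for pattern in SKIP_PATTERNS)
    ]


names = [
    "self_attn.q_proj",
    "self_attn.k_proj",
    "self_attn.v_proj",
    "self_attn.o_proj",
    "mlp.up_proj",
]
selected = layers_to_clip(names)
```

With these example names, only the value, output, and MLP projections remain in `selected`.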

Usage

Weight clipping is used as a sub-step of the AWQ search and is applied after per-channel scaling.

Theoretical Basis

For each quantization group, with calibration input x, group weights W, and quantization function Q:

max_val* = argmin_{max_val} ||x·W - x·Q(clip(W, -max_val, max_val))||^2

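The minimization is a grid search over max_val in [0.5 * org_max, org_max], where org_max is the group's original maximum absolute weight and candidates are spaced org_max / 20 apart.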
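The objective above can be sketched for a single group in plain Python. This is a minimal, unvectorized illustration and not the project's implementation: the helper names (`pseudo_quantize`, `group_output`, `search_clip`), the asymmetric uniform quantizer standing in for Q, and the parameters `n_grid` and `max_shrink` are all assumptions. The grid spacing and the 50% lower bound follow the description; the exact endpoints of the grid are an assumption.

```python
def pseudo_quantize(group, n_bit=4):
    """Assumed quantizer Q: asymmetric uniform fake-quantization of one group."""
    lo, hi = min(group), max(group)
    levels = 2 ** n_bit - 1
    scale = max(hi - lo, 1e-5) / levels
    zero = round(-lo / scale)
    out = []
    for w in group:
        q = min(max(round(w / scale) + zero, 0), levels)  # integer code
        out.append((q - zero) * scale)                    # back to float
    return out


def group_output(xs, group):
    """Partial output x.W of each calibration token, restricted to one group."""
    return [sum(x_i * w_i for x_i, w_i in zip(x, group)) for x in xs]


def search_clip(xs, group, n_bit=4, n_grid=20, max_shrink=0.5):
    """Grid-search the clipping threshold that minimizes output MSE."""
    org_max = max(abs(w) for w in group)
    org_out = group_output(xs, group)
    best_max, best_err = org_max, float("inf")
    # Candidates: org_max * (1 - i / n_grid), starting at 100% and shrinking
    # in steps of 1/n_grid, never going below (1 - max_shrink) * org_max.
    for i in range(int(max_shrink * n_grid)):
        max_val = org_max * (1 - i / n_grid)
        clipped = [min(max(w, -max_val), max_val) for w in group]
        q_out = group_output(xs, pseudo_quantize(clipped, n_bit))
        err = sum((a - b) ** 2 for a, b in zip(q_out, org_out)) / len(xs)
        if err < best_err:
            best_max, best_err = max_val, err
    return best_max, best_err


# Toy data: two calibration tokens and one group of four weights with an outlier.
xs = [[1.0, 0.5, -0.3, 0.2], [0.1, -0.7, 0.9, 0.05]]
group = [0.1, -0.2, 0.15, 3.0]
best_max, best_err = search_clip(xs, group)
```

Because the first candidate is the unclipped range, the selected threshold can never yield a larger error than plain quantization under the same quantizer. A practical version would vectorize over output channels and groups with tensor operations.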

Related Pages

Knowledge Sources

Paper: AWQ (https://arxiv.org/abs/2306.00978)

Domains

Quantization
Optimization
