Principle:Mit han lab Llm awq Weight Clipping Optimization
Overview
An optimization technique that clips weight outliers within each quantization group to minimize the output error introduced by quantization.
Description
Even after per-channel scaling, a few large-magnitude weights can still stretch a group's quantization range and cause large quantization errors. Weight clipping shrinks the dynamic range of each quantization group by finding an optimal maximum absolute value, max_val. The value is found by grid search: for each candidate max_val (from 50% to 100% of the group's original maximum, in 20 steps), the weights are clipped to [-max_val, max_val] and quantized, and the mean squared error (MSE) between the original output and the output of the clipped, quantized weights is computed on calibration inputs. The candidate with the minimum error is selected. The Q/K projections are skipped because their outputs feed the QK dot product, which makes precise clipping difficult.
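The grid search described above can be sketched in NumPy as follows. This is a minimal illustration, not the project's code: the function names `pseudo_quantize` and `search_clip`, the 4-bit asymmetric quantizer, and the array layout are all assumptions made for the example. The grid follows the description in the text (20 linearly spaced ratios between 1.0 and 0.5); an actual implementation may discretize the range differently.

```python
import numpy as np


def pseudo_quantize(w, n_bit=4):
    """Quantize-dequantize each row (one group) with an asymmetric uniform grid.

    Assumption: a simple min/max quantizer stands in for the real one.
    """
    w_max = w.max(axis=-1, keepdims=True)
    w_min = w.min(axis=-1, keepdims=True)
    q_max = 2 ** n_bit - 1
    scale = np.maximum(w_max - w_min, 1e-5) / q_max   # step size per group
    zero = np.round(-w_min / scale)                   # integer zero point
    q = np.clip(np.round(w / scale) + zero, 0, q_max)
    return (q - zero) * scale


def search_clip(w, x, n_bit=4, n_grid=20, min_ratio=0.5):
    """Grid-search a clipping threshold for every quantization group.

    w: (n_groups, group_size) weights, one group per row.
    x: (n_tokens, n_groups, group_size) calibration inputs, grouped like w.
    Returns (best_max_val, best_err), each of shape (n_groups, 1).
    """
    org_max = np.abs(w).max(axis=-1, keepdims=True)   # (n_groups, 1)
    org_out = (x * w).sum(axis=-1)                    # (n_tokens, n_groups)
    best_max = org_max.copy()
    best_err = np.full_like(org_max, np.inf)

    # Candidate thresholds from 100% down to min_ratio of the original max.
    for ratio in np.linspace(1.0, min_ratio, n_grid):
        max_val = org_max * ratio
        w_q = pseudo_quantize(np.clip(w, -max_val, max_val), n_bit)
        out = (x * w_q).sum(axis=-1)
        # Per-group MSE over calibration tokens.
        err = ((out - org_out) ** 2).mean(axis=0)[:, None]
        better = err < best_err
        best_err = np.where(better, err, best_err)
        best_max = np.where(better, max_val, best_max)
    return best_max, best_err
```

Because the first candidate (ratio 1.0) leaves the weights unclipped, the selected threshold can never give a higher calibration error than quantizing without clipping.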
Usage
Use as a sub-step of the AWQ search, after per-channel scaling has been applied.
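When the clipping search runs over the linear layers of a transformer block, the Q/K exclusion mentioned in the description could be expressed as a simple name filter. The sketch below is illustrative only: the substrings in `SKIP_PATTERNS`, the helper names, and the example layer names are assumptions, and real layer names vary by model family.

```python
# Hypothetical substrings that mark query/key projections; adjust per model.
SKIP_PATTERNS = ("q_", "k_", "query", "key", "Wqkv")


def should_clip(layer_name: str) -> bool:
    """Return False for layers whose name looks like a Q/K projection."""
    return not any(p in layer_name for p in SKIP_PATTERNS)


def layers_to_clip(layer_names):
    """Keep only the layers that the clipping search should visit."""
    return [name for name in layer_names if should_clip(name)]


# Example layer names in the style of a LLaMA-like block (assumed).
names = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
    "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
]
print(layers_to_clip(names))
```

Substring matching is coarse, so a filter like this should be checked against the actual module names of the model being quantized.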
Theoretical Basis
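For each quantization group with weights W and calibration inputs x:
max_val* = argmin_{max_val} ||x·W - x·Q(clip(W, -max_val, max_val))||^2
Here Q is the quantize-dequantize function, clip limits every weight to [-max_val, max_val], and org_max = max|W| is the group's original maximum absolute value. The minimizer is found by grid search over max_val in [0.5 * org_max, org_max] in 20 steps.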
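To make the role of max_val concrete, the objective can be written out under one explicit assumption: that Q is an asymmetric uniform b-bit quantizer applied per group. The source does not fix the form of Q, so the step size Δ, the zero point z, and the shorthand m for max_val are introduced here for illustration only.

```latex
% Assumption: asymmetric uniform b-bit quantizer applied per group.
\begin{aligned}
W_c &= \operatorname{clip}(W,\,-m,\,m), \qquad m = \texttt{max\_val} \\
\Delta &= \frac{\max(W_c) - \min(W_c)}{2^{b} - 1}, \qquad
z = \operatorname{round}\!\left(-\frac{\min(W_c)}{\Delta}\right) \\
Q(W_c) &= \Delta \cdot \left(\operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{W_c}{\Delta}\right) + z,\; 0,\; 2^{b}-1\right) - z\right) \\
m^{*} &= \arg\min_{m \,\in\, [0.5\,\texttt{org\_max},\ \texttt{org\_max}]} \left\lVert xW - x\,Q(W_c) \right\rVert^{2}
\end{aligned}
```

Since max(W_c) - min(W_c) is at most 2m, lowering m tightens the bound Δ ≤ 2m / (2^b - 1). Weights that stay inside [-m, m] are then represented on a finer grid, while weights beyond ±m incur clipping error. The search selects the trade-off that minimizes the output error.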
Related Pages
- Implementation:Mit_han_lab_Llm_awq_Auto_clip_block
- Heuristic:Mit_han_lab_Llm_awq_AWQ_Grid_Search_Tuning
- Heuristic:Mit_han_lab_Llm_awq_Skip_QK_Projection_Clipping
Knowledge Sources
- Paper|AWQ|https://arxiv.org/abs/2306.00978
Domains
- Quantization
- Optimization