Principle:Mit han lab Llm awq Weight Clipping Optimization

From Leeroopedia

Overview

Optimization technique that clips weight outliers per quantization group to minimize post-quantization output error.

Description

After scaling, some weight values may still cause large quantization errors. Weight clipping shrinks the dynamic range of each quantization group by finding an optimal maximum absolute value, max_val, and clamping the group's weights to [-max_val, max_val]. The value is found by grid search: for each candidate max_val (from 50% to 100% of the group's original maximum absolute value, in 20 steps), the mean squared error (MSE) between the output of the original weights and the output of the clipped, quantized weights is computed, and the candidate with the minimum error is selected. The Q and K projections are skipped because their outputs feed the attention QK dot product, which makes precise clipping difficult.
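To illustrate why shrinking the dynamic range can help, the following minimal sketch quantizes one group that contains a single outlier, with and without clipping. It is an illustration under stated assumptions, not the reference implementation: the helper `pseudo_quantize`, the 4-bit asymmetric min/max quantizer, the example weights, and the threshold 0.5 are all hypothetical choices made for this example.

```python
import numpy as np

def pseudo_quantize(w, n_bit=4):
    """Quantize-dequantize one group with an asymmetric uniform quantizer.

    Assumption: the source does not specify Q; min/max uniform quantization
    is used here purely for illustration.
    """
    w_min, w_max = w.min(), w.max()
    q_max = 2 ** n_bit - 1                      # 15 levels above zero for 4 bits
    scale = max(w_max - w_min, 1e-8) / q_max    # quantization step size
    zero = np.round(-w_min / scale)             # integer zero point
    q = np.clip(np.round(w / scale) + zero, 0, q_max)
    return (q - zero) * scale, scale

# One group: seven small weights and one outlier (hypothetical values).
group = np.array([0.11, -0.07, 0.05, -0.12, 0.09, -0.03, 0.02, 1.0])

deq_full, step_full = pseudo_quantize(group)
clipped = np.clip(group, -0.5, 0.5)             # clamp to [-max_val, max_val]
deq_clip, step_clip = pseudo_quantize(clipped)

# Rounding error on the non-outlier weights, measured against the originals.
small = np.abs(group) < 0.5
err_full = np.abs(deq_full - group)[small].mean()
err_clip = np.abs(deq_clip - group)[small].mean()

print("step size without clipping:", step_full)
print("step size with clipping:   ", step_clip)
print("mean error on small weights:", err_full, "->", err_clip)
```

Clipping makes the step size finer for the many small weights, at the cost of a large error on the clipped outlier itself. This trade-off is why the search scores each candidate by output error rather than by weight error alone.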

Usage

Weight clipping runs as a sub-step of the AWQ search, after per-channel scaling has been applied.

Theoretical Basis

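For each quantization group, with input activations x, weights W, and quantization function Q:

max_val* = argmin_{max_val} ||x·W - x·Q(clip(W, -max_val, max_val))||^2

The minimization is a grid search over max_val in [0.5 * org_max, org_max] in 20 steps, where org_max is the largest absolute weight value in the group.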
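The objective above can be sketched as a batched grid search over all groups of a weight matrix. This is a hedged NumPy sketch, not the library's code: `pseudo_quantize` and `search_clip` are hypothetical names, the quantizer is an assumed 4-bit asymmetric min/max scheme, and the parameters `n_grid=20` and `max_shrink=0.5` encode one reading of "20 steps", namely a grid spaced at 1/20 of org_max of which only the upper half is visited.

```python
import numpy as np

def pseudo_quantize(w, n_bit=4):
    """Asymmetric uniform quantize-dequantize along the last axis (assumed Q)."""
    w_min = w.min(axis=-1, keepdims=True)
    w_max = w.max(axis=-1, keepdims=True)
    q_max = 2 ** n_bit - 1
    scale = np.maximum(w_max - w_min, 1e-8) / q_max
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, q_max)
    return (q - zero) * scale

def search_clip(w, x, n_bit=4, n_grid=20, max_shrink=0.5):
    """Per-group grid search for the clipping threshold max_val.

    w: (out_features, n_group, group_size) weights
    x: (n_token, n_group, group_size) input activations
    Returns (best_max, min_err), each of shape (out_features, n_group, 1).
    """
    org_max = np.abs(w).max(axis=-1, keepdims=True)
    # Partial output of each group for each token: (out, n_token, n_group).
    org_out = np.einsum("ogk,tgk->otg", w, x)
    best_max = org_max.copy()
    min_err = np.full(org_max.shape, np.inf)
    for i in range(int(max_shrink * n_grid)):
        max_val = org_max * (1 - i / n_grid)          # i = 0 means no clipping
        q_w = pseudo_quantize(np.clip(w, -max_val, max_val), n_bit)
        cur_out = np.einsum("ogk,tgk->otg", q_w, x)
        # MSE over tokens, kept per (output channel, group).
        err = ((cur_out - org_out) ** 2).mean(axis=1)[..., None]
        better = err < min_err
        min_err = np.where(better, err, min_err)
        best_max = np.where(better, max_val, best_max)
    return best_max, min_err

# Usage with random data (shapes are arbitrary illustration values).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 16))      # 8 output channels, 4 groups of 16 weights
x = rng.normal(size=(32, 4, 16))     # 32 calibration tokens
best_max, min_err = search_clip(w, x)
w_clipped = np.clip(w, -best_max, best_max)
```

Because the first candidate equals org_max and therefore leaves the weights unclipped, the selected threshold never yields a higher measured error than quantizing without clipping.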

Related Pages

Domains

  • Quantization
  • Optimization
