
Principle:Mit han lab Llm awq Calibration Data Preparation

From Leeroopedia

Overview

Calibration data preparation is the process of collecting representative input data to guide quantization decisions during post-training quantization (PTQ).

Description

In post-training quantization, the model is quantized without any fine-tuning or retraining. Because there is no gradient-based optimization to recover from quantization errors, the quality of the quantized model depends heavily on understanding which weights matter most to preserve.

Calibration data provides this understanding. By passing representative inputs through the original (unquantized) model, the quantization algorithm observes the activation magnitudes at each layer. These activation statistics reveal which weight channels are "salient" -- that is, which channels carry the most information during inference.

The key insight is that not all weights contribute equally to model output. A small fraction of the weights, namely those connected to high-activation channels, has an outsized impact on model quality. If these salient weights are quantized at the same precision as all other weights, the resulting quantization error degrades model performance significantly. Calibration data makes it possible to identify and protect these critical weights.

The calibration dataset does not need to be large. Typical calibration sets contain only a few hundred samples (e.g., 512 samples in the AWQ default configuration). The dataset should be representative of the distribution the model will encounter at inference time, but it does not need to be drawn from the exact downstream task.
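A minimal sketch of assembling such a calibration set. The corpus, tokenizer, and the practice of keeping only full-length sequences are illustrative assumptions, not the exact AWQ pipeline:

```python
import random

def build_calibration_set(texts, tokenizer, n_samples=512, seq_len=512, seed=0):
    """Sample up to n_samples fixed-length token sequences for calibration.

    texts:     a list of raw text strings from a representative corpus
    tokenizer: a callable mapping text -> list of token ids (assumption)
    """
    rng = random.Random(seed)
    sampled = rng.sample(texts, min(n_samples, len(texts)))
    batches = []
    for text in sampled:
        ids = tokenizer(text)[:seq_len]  # truncate to a fixed length
        if len(ids) == seq_len:          # keep only full-length sequences
            batches.append(ids)
    return batches
```

Because the set is small (hundreds of sequences), this sampling step is cheap relative to the forward passes that follow.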

Usage

Calibration data preparation is required when performing any activation-aware quantization method, including:

  • AWQ (Activation-Aware Weight Quantization) -- uses activation magnitudes to determine per-channel scaling factors
  • GPTQ -- uses calibration data to compute Hessian-based weight updates during quantization
  • SmoothQuant -- uses activation statistics to migrate quantization difficulty from activations to weights

In each case, the calibration data serves as the source of activation statistics that inform how quantization should be applied.
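Collecting those statistics amounts to observing each layer's inputs during calibration forward passes. A hedged sketch of the accumulation logic, using NumPy; real implementations attach framework forward hooks to each linear layer, which this class only simulates:

```python
import numpy as np

class ActivationRecorder:
    """Accumulate a running mean of |activation| per input channel
    for one layer across calibration batches."""

    def __init__(self, n_channels):
        self.sum_abs = np.zeros(n_channels)
        self.count = 0

    def observe(self, x):
        # x: (tokens, n_channels) activations from one calibration batch
        self.sum_abs += np.abs(x).sum(axis=0)
        self.count += x.shape[0]

    def mean_abs(self):
        # Per-channel mean absolute activation seen so far
        return self.sum_abs / max(self.count, 1)
```

One recorder per layer is enough; the memory cost is a single vector per layer, regardless of how many calibration tokens are processed.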

Theoretical Basis

The AWQ paper (arXiv:2306.00978) establishes that 0.1% to 1% of weights in a large language model are "salient" and significantly affect model quality when quantized. These salient weights are identified by their corresponding activation magnitudes: weights connected to channels with large activations are disproportionately important.

Formally, for a weight matrix W and input activations X, the importance of each weight channel i is proportional to the mean absolute activation:

importance(i) = mean(|X[:, i]|)

Calibration data provides the input X from which these activation statistics are computed. The resulting importance scores determine the per-channel scaling factors that protect salient weights during quantization.
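The importance computation and a simplified scaling rule can be sketched as follows. The fixed exponent `alpha` is an assumption for illustration; the AWQ paper instead grid-searches the exponent to minimize quantization error:

```python
import numpy as np

def importance_scores(X):
    """importance(i) = mean(|X[:, i]|) over calibration activations X."""
    return np.abs(X).mean(axis=0)

def awq_style_scales(X, alpha=0.5):
    """Per-channel scales s_i = importance(i) ** alpha, a simplified
    stand-in for AWQ's searched scaling. Weights are multiplied by s
    before quantization and the inputs divided by s, so salient
    channels see less relative quantization error."""
    imp = importance_scores(X)
    return np.power(np.maximum(imp, 1e-8), alpha)  # guard zero channels
```

Channels with larger mean absolute activations receive larger scales, which effectively allocates more of the quantization grid to the salient weights.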

The quality of the calibration dataset directly affects the accuracy of importance estimation. If the calibration data does not represent the true input distribution, the activation statistics may misidentify which channels are salient, leading to suboptimal quantization decisions.

Related Pages

Knowledge Sources

Domains

  • NLP
  • Quantization
