
Principle:Mit han lab Llm awq Calibration Data Preparation

From Leeroopedia

Overview

Calibration data preparation is the process of collecting representative input data to guide quantization decisions during post-training quantization (PTQ).

Description

In post-training quantization, the model is quantized without any fine-tuning or retraining. Because there is no gradient-based optimization to recover from quantization errors, the quality of the quantized model depends heavily on understanding which weights matter most to preserve.

Calibration data provides this understanding. By passing representative inputs through the original (unquantized) model, the quantization algorithm observes the activation magnitudes at each layer. These activation statistics reveal which weight channels are "salient" -- that is, which channels carry the most information during inference.

The key insight is that not all weights contribute equally to model output. A small fraction of the weights, namely those connected to high-activation channels, has an outsized impact on model quality. If these salient weights are quantized at the same precision as all other weights, the resulting quantization error degrades model performance significantly. Calibration data makes it possible to identify and protect these critical weights.

The calibration dataset does not need to be large. Typical calibration sets contain only a few hundred samples (e.g., 512 samples in the AWQ default configuration). The dataset should be representative of the distribution the model will encounter at inference time, but it does not need to be drawn from the exact downstream task.
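A minimal sketch of assembling such a calibration set. The corpus, tokenizer, and the practice of keeping only full-length sequences are illustrative assumptions, not the exact AWQ pipeline:

```python
import random

def build_calibration_set(texts, tokenizer, n_samples=512, seq_len=512, seed=0):
    """Sample up to n_samples fixed-length token sequences for calibration.

    texts:     a list of raw text strings from a representative corpus
    tokenizer: a callable mapping text -> list of token ids (assumption)
    """
    rng = random.Random(seed)
    sampled = rng.sample(texts, min(n_samples, len(texts)))
    batches = []
    for text in sampled:
        ids = tokenizer(text)[:seq_len]  # truncate to a fixed length
        if len(ids) == seq_len:          # keep only full-length sequences
            batches.append(ids)
    return batches
```

Because the set is small (hundreds of sequences), this sampling step is cheap relative to the forward passes that follow.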

Usage

Calibration data preparation is required when performing any activation-aware quantization method, including:

  • AWQ (Activation-Aware Weight Quantization) -- uses activation magnitudes to determine per-channel scaling factors
  • GPTQ -- uses calibration data to compute Hessian-based weight updates during quantization
  • SmoothQuant -- uses activation statistics to migrate quantization difficulty from activations to weights

In each case, the calibration data serves as the source of activation statistics that inform how quantization should be applied.
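Collecting those statistics amounts to observing each layer's inputs during calibration forward passes. A hedged sketch of the accumulation logic, using NumPy; real implementations attach framework forward hooks to each linear layer, which this class only simulates:

```python
import numpy as np

class ActivationRecorder:
    """Accumulate a running mean of |activation| per input channel
    for one layer across calibration batches."""

    def __init__(self, n_channels):
        self.sum_abs = np.zeros(n_channels)
        self.count = 0

    def observe(self, x):
        # x: (tokens, n_channels) activations from one calibration batch
        self.sum_abs += np.abs(x).sum(axis=0)
        self.count += x.shape[0]

    def mean_abs(self):
        # Per-channel mean absolute activation seen so far
        return self.sum_abs / max(self.count, 1)
```

One recorder per layer is enough; the memory cost is a single vector per layer, regardless of how many calibration tokens are processed.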

Theoretical Basis

The AWQ paper (arXiv:2306.00978) establishes that 0.1% to 1% of weights in a large language model are "salient" and significantly affect model quality when quantized. These salient weights are identified by their corresponding activation magnitudes: weights connected to channels with large activations are disproportionately important.

Formally, for a weight matrix W and input activations X, the importance of each weight channel i is proportional to the mean absolute activation:

importance(i) = mean(|X[:, i]|)

Calibration data provides the input X from which these activation statistics are computed. The resulting importance scores determine the per-channel scaling factors that protect salient weights during quantization.
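The importance computation and a simplified scaling rule can be sketched as follows. The fixed exponent `alpha` is an assumption for illustration; the AWQ paper instead grid-searches the exponent to minimize quantization error:

```python
import numpy as np

def importance_scores(X):
    """importance(i) = mean(|X[:, i]|) over calibration activations X."""
    return np.abs(X).mean(axis=0)

def awq_style_scales(X, alpha=0.5):
    """Per-channel scales s_i = importance(i) ** alpha, a simplified
    stand-in for AWQ's searched scaling. Weights are multiplied by s
    before quantization and the inputs divided by s, so salient
    channels see less relative quantization error."""
    imp = importance_scores(X)
    return np.power(np.maximum(imp, 1e-8), alpha)  # guard zero channels
```

Channels with larger mean absolute activations receive larger scales, which effectively allocates more of the quantization grid to the salient weights.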

The quality of the calibration dataset directly affects the accuracy of importance estimation. If the calibration data does not represent the true input distribution, the activation statistics may misidentify which channels are salient, leading to suboptimal quantization decisions.

Related Pages

Knowledge Sources

Domains

  • NLP
  • Quantization
