Principle:Mit han lab Llm awq WikiText Perplexity Evaluation

From Leeroopedia

Overview

Standardized language modeling evaluation that measures the perplexity of a quantized model on the WikiText-2 test set, computing cross-entropy over fixed-length, non-overlapping windows.

Description

Perplexity (PPL) is the primary metric for evaluating language model quality after quantization. The evaluation procedure works as follows:

  • WikiText-2 raw text is loaded and concatenated into a single string
  • The text is tokenized into a flat sequence of token IDs
  • The sequence is split into non-overlapping windows of 2048 tokens
  • For each window, the model computes next-token logits via a forward pass
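  • Cross-entropy loss is computed between the shifted logits and the shifted labels, so the logits at position t are scored against the token at position t+1
  • Final PPL = exp(average_loss), where average_loss is the mean per-token loss over all windows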
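The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the llm-awq implementation: `split_windows`, `cross_entropy`, `windowed_perplexity`, and the `logits_fn` callable that stands in for the model's forward pass are all hypothetical names, and the toy uniform "model" replaces a real tokenizer and GPU model. The sketch also assumes that trailing tokens that do not fill a whole window are dropped.

```python
import math
from typing import Callable, List, Sequence


def split_windows(token_ids: Sequence[int], seqlen: int) -> List[List[int]]:
    """Cut a flat token sequence into non-overlapping windows (short tail dropped)."""
    n_windows = len(token_ids) // seqlen
    return [list(token_ids[i * seqlen:(i + 1) * seqlen]) for i in range(n_windows)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def windowed_perplexity(
    token_ids: Sequence[int],
    logits_fn: Callable[[List[int]], List[List[float]]],  # hypothetical model stand-in
    seqlen: int = 2048,
) -> float:
    """Perplexity over non-overlapping windows, averaged over all scored positions."""
    total_nll = 0.0
    total_predictions = 0
    for window in split_windows(token_ids, seqlen):
        logits = logits_fn(window)  # one row of vocabulary logits per position
        # Shift: the logits at position t are scored against the token at t + 1.
        for row, target in zip(logits[:-1], window[1:]):
            total_nll += cross_entropy(row, target)
            total_predictions += 1
    if total_predictions == 0:
        raise ValueError("sequence is shorter than one window")
    return math.exp(total_nll / total_predictions)


# Toy check: a model that is uniform over 8 tokens should score a perplexity of about 8.
vocab_size = 8


def uniform_logits(window: List[int]) -> List[List[float]]:
    return [[0.0] * vocab_size for _ in window]


tokens = [i % vocab_size for i in range(10)]
ppl = windowed_perplexity(tokens, uniform_logits, seqlen=4)
print(round(ppl, 6))
```

The sketch normalizes by the number of scored positions. Implementations may normalize slightly differently (for example by windows times window length), which changes the result only marginally at seqlen=2048.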

Lower PPL indicates better language modeling quality: the model assigns higher probability to the correct next tokens. WikiText-2 perplexity is widely reported in weight-quantization work, including AWQ and GPTQ and their round-to-nearest (RTN) baselines, which makes it a common basis for comparing quantization methods.

Theoretical Basis

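PPL = exp(-(1/N) * sum_{i=1}^{N} log P(w_i | w_{<i}))

Here N is the number of predicted tokens and P(w_i | w_{<i}) is the probability the model assigns to token w_i given the tokens before it.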

The evaluation uses fixed windows of seqlen=2048 tokens that do not overlap. Each window is processed independently, with no KV cache carryover between windows, so tokens early in a window are predicted with little context. The loss is accumulated across all windows and averaged over the total number of tokens before exponentiation.
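To make the aggregation concrete, the following sketch combines per-window mean losses into one corpus-level perplexity. The function name `aggregate_perplexity` and the two example losses are assumptions chosen for illustration, not values from a real run. The point is that the losses are pooled by token count before exponentiation, which is not the same as averaging per-window perplexities.

```python
import math
from typing import Sequence


def aggregate_perplexity(
    window_mean_losses: Sequence[float],
    tokens_per_window: Sequence[int],
) -> float:
    """Pool per-window mean losses by token count, then exponentiate once."""
    total_nll = sum(loss * n for loss, n in zip(window_mean_losses, tokens_per_window))
    total_tokens = sum(tokens_per_window)
    return math.exp(total_nll / total_tokens)


# Illustrative values only: two windows whose individual perplexities are 4 and 16.
losses = [math.log(4), math.log(16)]
counts = [2048, 2048]

pooled_ppl = aggregate_perplexity(losses, counts)           # geometric-mean style pooling
naive_ppl = sum(math.exp(loss) for loss in losses) / 2      # arithmetic mean of PPLs
print(round(pooled_ppl, 6), round(naive_ppl, 6))
```

With these inputs the pooled perplexity is about 8, while the naive average of per-window perplexities is about 10, so the two conventions should not be mixed when comparing reported numbers.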

Usage

Use this evaluation as the primary quality metric for quantized models. It is triggered by --tasks wikitext, and the workflow is:

  • Load the quantized model onto GPU
  • Run the WikiText-2 evaluation loop
  • Compare the resulting PPL against baseline (FP16) and other quantization methods
  • Typical results: 4-bit AWQ achieves PPL within roughly 0.1-0.5 of the FP16 baseline, though the gap varies with the model and quantization settings
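The comparison step can be sketched as a small helper. `ppl_gap` is a hypothetical name, and the perplexity values below are placeholders chosen for arithmetic convenience, not measurements of any model; the 0.5 threshold simply mirrors the upper end of the range quoted above.

```python
from typing import Dict


def ppl_gap(baseline_ppl: float, quantized_ppl: float) -> Dict[str, float]:
    """Absolute and relative perplexity increase of a quantized model over its baseline."""
    if baseline_ppl <= 0:
        raise ValueError("perplexity must be positive")
    absolute = quantized_ppl - baseline_ppl
    return {"absolute": absolute, "relative": absolute / baseline_ppl}


# Placeholder numbers for illustration only (not real evaluation results).
fp16_ppl = 5.0
awq_4bit_ppl = 5.25

gap = ppl_gap(fp16_ppl, awq_4bit_ppl)
within_expected_range = gap["absolute"] <= 0.5
print(gap, within_expected_range)
```

Reporting both the absolute and the relative gap helps when baselines differ widely, since the same absolute increase matters more for a model with a low baseline perplexity.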

Domains

  • NLP
  • Evaluation
