
Implementation:Mit han lab Llm awq Wikitext eval loop

From Leeroopedia

Overview

Inline evaluation pattern for computing WikiText-2 perplexity on quantized models in the llm-awq library.

Source

awq/entry.py, Lines 301-333

Doc Type

This is a Pattern Doc: the evaluation is an inline code block in awq/entry.py, not a standalone function.

Pattern

# Load WikiText-2 test data
testenc = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
testenc = enc("\n\n".join(testenc["text"]), return_tensors="pt")
model.seqlen = 2048
testenc = testenc.input_ids.to(model.device)
nsamples = testenc.numel() // model.seqlen

# Evaluate sliding windows
nlls = []
for i in range(nsamples):
    batch = testenc[:, (i * model.seqlen):((i + 1) * model.seqlen)]
    with torch.no_grad():
        lm_logits = model(batch).logits
    shift_logits = lm_logits[:, :-1, :].contiguous().float()
    shift_labels = testenc[:, (i * model.seqlen):((i + 1) * model.seqlen)][:, 1:]
    loss = nn.CrossEntropyLoss()(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
    nlls.append(loss.float() * model.seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * model.seqlen))

Import

import torch
import torch.nn as nn
from datasets import load_dataset

I/O

Inputs:

  • Quantized model (on GPU) with a .device attribute and forward pass returning .logits
  • Tokenizer (enc) capable of encoding text to PyTorch tensors

Output:

  • PPL scalar (float): the perplexity value
  • Saved as {"ppl": float} to a JSON results file
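The JSON write itself falls outside the excerpted lines; a minimal sketch of the described output format, assuming a hypothetical results.json path:

```python
import json

ppl = 5.68  # hypothetical PPL value produced by the evaluation loop

# Persist the result in the {"ppl": float} shape described above
with open("results.json", "w") as f:
    json.dump({"ppl": float(ppl)}, f)
```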

How It Works

  1. WikiText-2 test split is loaded and all text is joined with double newlines
  2. The full text is tokenized into a single flat tensor of token IDs
  3. The tensor is split into nsamples non-overlapping windows of 2048 tokens each
  4. For each window, the model's logits are computed via forward pass
  5. Logits are shifted by one position (lm_logits[:, :-1, :]) to align predictions with targets
  6. Cross-entropy loss is computed and scaled by model.seqlen
  7. Final PPL is computed as exp(sum(nlls) / total_tokens)
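The seven steps above can be wrapped into a reusable helper; a minimal sketch, assuming only that the model's forward pass returns an object with a .logits attribute (the name eval_wikitext_ppl is illustrative, not part of llm-awq):

```python
import torch
import torch.nn as nn

def eval_wikitext_ppl(model, testenc_ids, seqlen=2048):
    """Perplexity over non-overlapping windows of a flat token-ID tensor.

    testenc_ids: LongTensor of shape (1, total_tokens).
    """
    nsamples = testenc_ids.numel() // seqlen
    nlls = []
    for i in range(nsamples):
        batch = testenc_ids[:, i * seqlen:(i + 1) * seqlen]
        with torch.no_grad():
            lm_logits = model(batch).logits
        # Align logits at position t with labels at position t+1
        shift_logits = lm_logits[:, :-1, :].contiguous().float()
        shift_labels = batch[:, 1:]
        loss = nn.CrossEntropyLoss()(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )
        # Scale the per-token mean loss back up by the window length
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen)).item()
```

Sanity check on the design: a model that assigns uniform logits over a vocabulary of size V yields a per-token loss of ln(V), so the helper should return a perplexity of exactly V.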

Related Pages

Knowledge Sources

Domains

  • NLP
  • Evaluation
