Principle: LaurentMazare tch-rs Autoregressive Sampling
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Generation |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
A token-generation technique in which the model samples the next token from a probability distribution over the vocabulary, appends it to the sequence, and repeats.
Description
Autoregressive sampling generates text one token at a time. At each step, the model processes the current token sequence and outputs a probability distribution over the vocabulary for the next position. A token is sampled from this distribution (typically after temperature scaling and a softmax), appended to the sequence, and the process repeats. Temperature controls randomness: lower values make the output more deterministic, higher values increase diversity. The generation loop runs under no_grad (tch::no_grad in tch-rs) so no gradient state is tracked, reducing memory use.
Usage
Use for open-ended text generation with language models. Control output quality via temperature and optionally top-k/top-p filtering.
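Top-k filtering can be sketched as masking all but the k largest logits before the softmax, so low-probability tokens can never be sampled. A minimal sketch in plain Rust (assumption: this operates on a `Vec<f64>` of logits for illustration, not on the tch-rs Tensor API):

```rust
/// Keep only the k largest logits; mask the rest to -inf so that
/// softmax assigns them zero probability. (Illustrative helper,
/// not part of tch-rs.)
fn top_k_filter(logits: &[f64], k: usize) -> Vec<f64> {
    let mut sorted: Vec<f64> = logits.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let threshold = sorted[k - 1]; // k-th largest logit
    logits
        .iter()
        .map(|&l| if l >= threshold { l } else { f64::NEG_INFINITY })
        .collect()
}

fn main() {
    let logits = vec![2.0, 0.5, 1.0, -1.0];
    let filtered = top_k_filter(&logits, 2);
    // Only the two largest logits (2.0 and 1.0) survive.
    println!("{:?}", filtered); // [2.0, -inf, 1.0, -inf]
}
```

Top-p (nucleus) filtering works analogously, but keeps the smallest set of tokens whose cumulative probability exceeds p.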
Theoretical Basis
Autoregressive Generation:
1. Start with prompt tokens [t_1, ..., t_n]
2. For each new position:
a. Forward pass: logits = model([t_1, ..., t_n]) → [vocab_size]
b. Temperature: logits = logits / temperature
c. Softmax: probs = softmax(logits)
d. Sample: t_{n+1} = multinomial(probs, 1)
e. Append: [t_1, ..., t_n, t_{n+1}]
3. Decode tokens to text via tokenizer
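The loop above can be sketched in plain Rust. Assumptions: a toy bigram-style "model" with fixed logits over a 3-token vocabulary stands in for a real network, a small LCG stands in for a proper RNG, and `Vec<f64>` stands in for tch-rs Tensors; none of these names come from tch-rs itself.

```rust
// Numerically stable softmax over a slice of logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Toy "model" (assumption): fixed logits conditioned on the last token.
fn model(seq: &[usize]) -> Vec<f64> {
    match seq.last().unwrap() % 3 {
        0 => vec![0.1, 2.0, 0.3],
        1 => vec![1.5, 0.2, 0.9],
        _ => vec![0.4, 0.4, 2.5],
    }
}

// Multinomial sampling via the inverse CDF: walk the cumulative
// probabilities until they exceed the uniform draw u.
fn sample(probs: &[f64], u: f64) -> usize {
    let mut cum = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cum += p;
        if u < cum {
            return i;
        }
    }
    probs.len() - 1
}

fn generate(prompt: &[usize], steps: usize, temperature: f64, seed: u64) -> Vec<usize> {
    let mut seq = prompt.to_vec();
    let mut state = seed;
    for _ in 0..steps {
        // a. Forward pass  b. Temperature scaling  c. Softmax
        let logits: Vec<f64> = model(&seq).iter().map(|l| l / temperature).collect();
        let probs = softmax(&logits);
        // d. Sample (LCG in place of a real RNG)  e. Append
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (state >> 11) as f64 / (1u64 << 53) as f64;
        seq.push(sample(&probs, u));
    }
    seq
}

fn main() {
    // Prompt [0], 5 new tokens, T = 1.0, fixed seed → deterministic run.
    let out = generate(&[0], 5, 1.0, 42);
    println!("{:?}", out);
}
```

In tch-rs the same steps would use real Tensors (softmax and multinomial on the GPU) inside tch::no_grad, with a tokenizer decoding the final sequence to text.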
Temperature effect:
T → 0: argmax (greedy, deterministic)
T = 1: sample from model distribution
T > 1: flatter distribution (more random)
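The temperature effect above can be checked numerically. A minimal sketch in plain Rust (assumption: `probs_at` is an illustrative helper, not a tch-rs function): dividing logits by T < 1 sharpens the softmax toward the argmax token, while T > 1 flattens it toward uniform.

```rust
// Numerically stable softmax over a slice of logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Probability distribution after temperature scaling (illustrative helper).
fn probs_at(logits: &[f64], t: f64) -> Vec<f64> {
    let scaled: Vec<f64> = logits.iter().map(|l| l / t).collect();
    softmax(&scaled)
}

fn main() {
    let logits = [2.0, 1.0, 0.5];
    let cold = probs_at(&logits, 0.5); // sharper: top token dominates
    let base = probs_at(&logits, 1.0); // model distribution
    let warm = probs_at(&logits, 2.0); // flatter: closer to uniform
    // Lower temperature concentrates probability mass on the argmax token.
    assert!(cold[0] > base[0] && base[0] > warm[0]);
    println!("T=0.5 {:?}", cold);
    println!("T=1.0 {:?}", base);
    println!("T=2.0 {:?}", warm);
}
```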