Principle: LaurentMazare tch-rs Autoregressive Sampling
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Generation |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
A token-generation technique in which the model samples the next token from a probability distribution over the vocabulary, appends it to the sequence, and repeats.
Description
Autoregressive sampling generates text one token at a time. At each step, the model processes the current token sequence and outputs a probability distribution over the vocabulary for the next position. A token is sampled from this distribution (typically after temperature scaling and a softmax), appended to the sequence, and the process repeats. Temperature controls randomness: lower values make the output more deterministic, higher values increase diversity. The generation loop runs under no_grad (tch::no_grad in tch-rs) so no gradient state is tracked, reducing memory use.
Usage
Use for open-ended text generation with language models. Control output quality via temperature and optionally top-k/top-p filtering.
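Top-k filtering can be sketched as masking all but the k largest logits before the softmax, so low-probability tokens can never be sampled. A minimal sketch in plain Rust (assumption: this operates on a `Vec<f64>` of logits for illustration, not on the tch-rs Tensor API):

```rust
/// Keep only the k largest logits; mask the rest to -inf so that
/// softmax assigns them zero probability. (Illustrative helper,
/// not part of tch-rs.)
fn top_k_filter(logits: &[f64], k: usize) -> Vec<f64> {
    let mut sorted: Vec<f64> = logits.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let threshold = sorted[k - 1]; // k-th largest logit
    logits
        .iter()
        .map(|&l| if l >= threshold { l } else { f64::NEG_INFINITY })
        .collect()
}

fn main() {
    let logits = vec![2.0, 0.5, 1.0, -1.0];
    let filtered = top_k_filter(&logits, 2);
    // Only the two largest logits (2.0 and 1.0) survive.
    println!("{:?}", filtered); // [2.0, -inf, 1.0, -inf]
}
```

Top-p (nucleus) filtering works analogously, but keeps the smallest set of tokens whose cumulative probability exceeds p.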
Theoretical Basis
Autoregressive Generation:
1. Start with prompt tokens [t_1, ..., t_n]
2. For each new position:
a. Forward pass: logits = model([t_1, ..., t_n]) → [vocab_size]
b. Temperature: logits = logits / temperature
c. Softmax: probs = softmax(logits)
d. Sample: t_{n+1} = multinomial(probs, 1)
e. Append: [t_1, ..., t_n, t_{n+1}]
3. Decode tokens to text via tokenizer
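The loop above can be sketched in plain Rust. Assumptions: a toy bigram-style "model" with fixed logits over a 3-token vocabulary stands in for a real network, a small LCG stands in for a proper RNG, and `Vec<f64>` stands in for tch-rs Tensors; none of these names come from tch-rs itself.

```rust
// Numerically stable softmax over a slice of logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Toy "model" (assumption): fixed logits conditioned on the last token.
fn model(seq: &[usize]) -> Vec<f64> {
    match seq.last().unwrap() % 3 {
        0 => vec![0.1, 2.0, 0.3],
        1 => vec![1.5, 0.2, 0.9],
        _ => vec![0.4, 0.4, 2.5],
    }
}

// Multinomial sampling via the inverse CDF: walk the cumulative
// probabilities until they exceed the uniform draw u.
fn sample(probs: &[f64], u: f64) -> usize {
    let mut cum = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cum += p;
        if u < cum {
            return i;
        }
    }
    probs.len() - 1
}

fn generate(prompt: &[usize], steps: usize, temperature: f64, seed: u64) -> Vec<usize> {
    let mut seq = prompt.to_vec();
    let mut state = seed;
    for _ in 0..steps {
        // a. Forward pass  b. Temperature scaling  c. Softmax
        let logits: Vec<f64> = model(&seq).iter().map(|l| l / temperature).collect();
        let probs = softmax(&logits);
        // d. Sample (LCG in place of a real RNG)  e. Append
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (state >> 11) as f64 / (1u64 << 53) as f64;
        seq.push(sample(&probs, u));
    }
    seq
}

fn main() {
    // Prompt [0], 5 new tokens, T = 1.0, fixed seed → deterministic run.
    let out = generate(&[0], 5, 1.0, 42);
    println!("{:?}", out);
}
```

In tch-rs the same steps would use real Tensors (softmax and multinomial on the GPU) inside tch::no_grad, with a tokenizer decoding the final sequence to text.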
Temperature effect:
T → 0: argmax (greedy, deterministic)
T = 1: sample from model distribution
T > 1: flatter distribution (more random)
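The temperature effect above can be checked numerically. A minimal sketch in plain Rust (assumption: `probs_at` is an illustrative helper, not a tch-rs function): dividing logits by T < 1 sharpens the softmax toward the argmax token, while T > 1 flattens it toward uniform.

```rust
// Numerically stable softmax over a slice of logits.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Probability distribution after temperature scaling (illustrative helper).
fn probs_at(logits: &[f64], t: f64) -> Vec<f64> {
    let scaled: Vec<f64> = logits.iter().map(|l| l / t).collect();
    softmax(&scaled)
}

fn main() {
    let logits = [2.0, 1.0, 0.5];
    let cold = probs_at(&logits, 0.5); // sharper: top token dominates
    let base = probs_at(&logits, 1.0); // model distribution
    let warm = probs_at(&logits, 2.0); // flatter: closer to uniform
    // Lower temperature concentrates probability mass on the argmax token.
    assert!(cold[0] > base[0] && base[0] > warm[0]);
    println!("T=0.5 {:?}", cold);
    println!("T=1.0 {:?}", base);
    println!("T=2.0 {:?}", warm);
}
```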