Principle:Romsto Speculative Decoding Logits Processing

From Leeroopedia
Knowledge Sources
Domains NLP, Sampling, Probability_Theory
Last Updated 2026-02-14 04:30 GMT

Overview

A family of token sampling strategies that transform raw model logits into probability distributions and select tokens, including greedy, multinomial, top-k, nucleus (top-p), and combined top-k/nucleus methods.

Description

Logits Processing encompasses the techniques used to convert a language model's raw output logits into a probability distribution and then sample a token from that distribution. The choice of sampling strategy profoundly affects the quality, diversity, and coherence of generated text.

The key strategies are:

  • Greedy decoding: Always selects the highest-probability token. Deterministic but can lead to repetitive, degenerate text.
  • Multinomial sampling: Samples from the full distribution in proportion to each token's probability, after the logits are scaled by a temperature parameter. Higher temperature flattens the distribution and increases diversity.
  • Top-k sampling: Restricts the candidate set to the k highest-probability tokens before sampling. Prevents sampling from the long tail of unlikely tokens.
  • Nucleus (top-p) sampling: Dynamically selects the smallest set of tokens whose cumulative probability reaches the threshold p. The candidate set grows when the distribution is flat (high entropy) and shrinks when it is peaked.
  • Top-k + Nucleus: Applies top-k filtering first, then nucleus filtering, combining both truncation methods.

All strategies share a common interface: they accept logits, optionally filter low-probability tokens, apply temperature-scaled softmax, and then select a token from the resulting distribution.
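
The selection step of that interface can be sketched in plain Python. This is a minimal illustration on lists of floats, not the page's implementation; the function names `softmax`, `greedy`, and `multinomial` are hypothetical and chosen here for readability.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]  # exp(-inf) is 0.0, so masked tokens vanish
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Return the index of the largest logit (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def multinomial(logits, temperature=1.0, rng=random):
    """Draw one index with probability softmax(logits / T)."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A token whose logit has been set to -inf by a filter receives probability 0 and is never drawn, which is why filtering can be expressed purely as a transformation of the logits.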

Usage

Use this principle when you generate text from a language model and need to control the trade-off between output quality and diversity. Greedy decoding is appropriate for tasks requiring deterministic output (e.g., factual Q&A). Nucleus sampling is preferred for creative text generation where diversity is valued. The choice of strategy also affects speculative decoding: both the drafter and target models must use the same sampling strategy for correct rejection sampling.
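
To see why the drafter and target must agree, it helps to look at the acceptance rule commonly used in speculative sampling: a drafted token x is accepted with probability min(1, p(x)/q(x)), where p and q are the target and drafter distributions after logits processing. The sketch below assumes that standard rule; the function name `accept_or_resample` and its arguments are hypothetical, and it is not taken from the implementation this page describes.

```python
import random

def accept_or_resample(token, p_target, q_draft, rng=random):
    """Return the token to emit for one drafted position.

    p_target and q_draft are probability lists produced by the SAME
    logits processor applied to the target and drafter logits.
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token
    # On rejection, resample from the normalized positive part of (p - q).
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    return rng.choices(range(len(residual)), weights=residual, k=1)[0]
```

If the two distributions are identical, every drafted token is accepted. If the drafter used a different filter than the target, q no longer describes how the drafted token was actually chosen, and the ratio p/q stops correcting for the mismatch.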

Theoretical Basis

All logits processors follow a two-stage pipeline:

  1. Process: Transform raw logits (optionally filtering low-probability tokens)
  2. Sample: Convert processed logits to probabilities via temperature-scaled softmax, then select a token

$$\text{probs} = \operatorname{softmax}\!\left(\frac{\operatorname{process}(\text{logits})}{T}\right)$$

Where T is the temperature parameter.
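
Written out per token, the formula is the usual softmax expansion. The symbols z_i (processed logit of token i) and p_i (its probability) are introduced here for illustration only.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)},
\qquad z = \operatorname{process}(\text{logits})

% Worked example with z = (2, 1):
%   T = 1:  p = (e^{2},\, e^{1}) / (e^{2} + e^{1})     \approx (0.731,\ 0.269)
%   T = 2:  p = (e^{1},\, e^{0.5}) / (e^{1} + e^{0.5}) \approx (0.622,\ 0.378)
```

A filtered token has z_i = -∞ and therefore p_i = 0. As T approaches 0 the distribution concentrates on the largest logit, which matches greedy decoding; as T grows it approaches a uniform distribution over the tokens that survived filtering.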

Top-k filtering sets all logits below the k-th highest value to −∞:

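# Top-k filtering (PyTorch sketch; assumes `logits` is a float tensor with the vocabulary on the last dim)
import torch
threshold = torch.topk(logits, k).values[..., -1, None]  # k-th highest logit
logits = logits.masked_fill(logits < threshold, float("-inf"))  # ties with the threshold are kept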
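
The same rule can be sketched in plain Python on a list of floats. This is an illustrative version under the assumption that k counts from 1, so the k-th highest value sits at 0-based index k - 1 of the descending sort; the name `top_k_filter` is hypothetical.

```python
import math

def top_k_filter(logits, k):
    """Return a copy of `logits` with every value below the k-th largest set to -inf."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k >= len(logits):
        return list(logits)  # nothing to filter
    threshold = sorted(logits, reverse=True)[k - 1]  # k-th largest value
    return [x if x >= threshold else -math.inf for x in logits]

filtered = top_k_filter([1.0, 3.0, 0.5, 2.0], k=2)
print(filtered)  # [-inf, 3.0, -inf, 2.0]
```

Values that tie with the threshold are kept, so more than k tokens can survive when logits are equal.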

Nucleus filtering finds the smallest set of tokens with cumulative probability >= p:

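# Nucleus (top-p) filtering (PyTorch sketch; assumes `logits` is a float tensor with the vocabulary on the last dim)
import torch
sorted_logits, sorted_idx = torch.sort(logits, descending=True)
sorted_probs = torch.softmax(sorted_logits, dim=-1)
cumulative = torch.cumsum(sorted_probs, dim=-1)
# Mask a token only if the mass before it already reaches top_p, so the token that crosses the threshold is kept
sorted_mask = (cumulative - sorted_probs) >= top_p
mask = torch.zeros_like(sorted_mask).scatter(-1, sorted_idx, sorted_mask)  # restore original order
logits = logits.masked_fill(mask, float("-inf"))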
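
A plain-Python sketch of nucleus filtering, together with the combined top-k + nucleus variant, is given below. It keeps the token that crosses the threshold, so the surviving set always has cumulative probability of at least p. All function names are hypothetical, and renormalizing over the top-k survivors before applying the nucleus step is an assumption of this sketch rather than something the page specifies.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(logits, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p; mask the rest."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    filtered = [-math.inf] * len(logits)
    cumulative = 0.0
    for i in order:
        filtered[i] = logits[i]  # keep this token, including the one that crosses top_p
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return filtered

def top_k_filter(logits, k):
    """Mask everything below the k-th largest logit (k counts from 1)."""
    threshold = sorted(logits, reverse=True)[min(k, len(logits)) - 1]
    return [x if x >= threshold else -math.inf for x in logits]

def top_k_nucleus_filter(logits, k, top_p):
    """Apply top-k first, then nucleus filtering on the survivors."""
    return nucleus_filter(top_k_filter(logits, k), top_p)

# Logits chosen so that softmax gives roughly (0.5, 0.3, 0.15, 0.05).
logits = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
kept = [i for i, x in enumerate(nucleus_filter(logits, 0.7)) if x != -math.inf]
print(kept)  # [0, 1]
```

With top_p = 0.4 only token 0 survives, even though its probability alone (about 0.5) already exceeds the threshold. A mask built directly from `cumulative > top_p` would remove that token as well and leave nothing to sample from.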

Related Pages

Implemented By
