Principle:Ollama Token Sampling
| Knowledge Sources | |
|---|---|
| Domains | NLP, Probability, Inference |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A configurable token selection mechanism that transforms raw model logits into a probability distribution and samples the next token using temperature scaling, top-k filtering, top-p (nucleus) sampling, and min-p thresholding.
Description
Token Sampling is the core decoding step in autoregressive language model inference. After the model produces a logit vector over the entire vocabulary, the sampler applies a pipeline of transforms to select the next token. This pipeline controls the tradeoff between coherence (low temperature, greedy) and creativity (high temperature, diverse sampling).
The sampling pipeline supports:
- Temperature scaling: Divides logits by temperature before softmax, controlling distribution sharpness.
- Top-k filtering: Retains only the k highest-probability tokens.
- Top-p (nucleus) sampling: Retains the smallest set of tokens whose cumulative probability exceeds p.
- Min-p thresholding: Removes tokens with probability below min_p times the maximum probability.
- Grammar-constrained sampling: Optionally applies a BNF grammar to mask tokens that would produce invalid output (e.g., for JSON generation).
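The pipeline above can be sketched end to end in plain Python. This is an illustrative implementation, not Ollama's actual code; the function name, defaults, and ordering of transforms (temperature, then top-k, softmax, top-p, min-p) are assumptions for the sketch.

```python
import math
import random

def sample_token(logits, temperature=0.8, top_k=40, top_p=0.9, min_p=0.05, rng=None):
    """Sketch of the pipeline: temperature -> top-k -> softmax -> top-p -> min-p -> sample."""
    rng = rng or random.Random()

    # Greedy path: temperature 0 means argmax, no randomness.
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling: divide logits before softmax.
    scaled = [l / temperature for l in logits]

    # Top-k: keep the indices of the k highest logits.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = order[:top_k] if 0 < top_k < len(order) else order

    # Softmax over the survivors (subtract max for numerical stability).
    m = max(scaled[i] for i in keep)
    z = sum(math.exp(scaled[i] - m) for i in keep)
    probs = {i: math.exp(scaled[i] - m) / z for i in keep}

    # Top-p: smallest high-probability prefix whose cumulative mass reaches p.
    nucleus, cum = [], 0.0
    for i in sorted(probs, key=probs.get, reverse=True):
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Min-p: drop tokens below min_p times the maximum surviving probability.
    p_max = max(probs[i] for i in nucleus)
    nucleus = [i for i in nucleus if probs[i] >= min_p * p_max]

    # Renormalize over the surviving set and draw one token.
    total = sum(probs[i] for i in nucleus)
    r = rng.random() * total
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a large logit gap and the defaults above, the top token survives every filter and is selected almost surely; with temperature 0 the stochastic transforms are skipped entirely.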
Usage
Use this principle in any autoregressive text generation system where controllable diversity is needed. The sampling parameters (temperature, top_k, top_p, min_p, seed) are typically exposed as user-facing API options.
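A request exposing these options might look like the following. The model name, prompt, and exact field layout are illustrative assumptions, not a definitive API reference; only the parameter names mirror those listed above.

```python
import json

# Illustrative request body for a generation API exposing sampler options.
request = {
    "model": "llama3",                # assumed model name
    "prompt": "Why is the sky blue?",
    "options": {
        "temperature": 0.8,  # distribution sharpness
        "top_k": 40,         # keep the 40 highest-probability tokens
        "top_p": 0.9,        # nucleus cumulative-mass cutoff
        "min_p": 0.05,       # relative probability floor
        "seed": 42,          # fixed seed for reproducible sampling
    },
}
body = json.dumps(request)
```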
Theoretical Basis
The sampling pipeline processes logits through sequential transforms:
Top-k: Sort tokens by logit, keep only the top k.
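As a minimal sketch (a hypothetical helper, not Ollama's actual code), top-k can be expressed as a logit mask:

```python
def top_k_filter(logits, k):
    """Mask every logit outside the k highest to -inf (ties at the boundary survive)."""
    kth = sorted(logits, reverse=True)[k - 1]
    return [l if l >= kth else float("-inf") for l in logits]
```

Masked entries receive zero probability after softmax, since e^(-inf) = 0.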
Top-p: After softmax, accumulate probabilities from highest to lowest; keep tokens until cumulative probability exceeds p.
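The accumulation step can be sketched as follows (an illustrative helper operating on already-softmaxed probabilities):

```python
def top_p_filter(probs, p):
    """Return indices of the smallest descending-probability prefix with mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return keep
```

For example, with probabilities [0.5, 0.3, 0.1, 0.1] and p = 0.7, the first two tokens already carry 0.8 of the mass, so only they survive.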
Min-p: Remove any token with probability below min_p × P_max, where P_max is the maximum token probability after softmax.
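A minimal sketch of this threshold (hypothetical helper, operating on probabilities):

```python
def min_p_filter(probs, min_p):
    """Zero out tokens whose probability is below min_p times the maximum probability."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]
```

Because the threshold scales with the maximum probability, min-p adapts to the distribution: a confident peak prunes aggressively, while a flat distribution keeps more candidates.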
Greedy (T=0): Return argmax(logits) directly, skipping all stochastic transforms.
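The greedy path reduces to a single argmax (sketch):

```python
def greedy(logits):
    """Temperature 0: deterministically pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])
```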
Grammar Masking: Before sampling, set logits to -∞ for tokens that would violate the grammar state.
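Grammar masking can be sketched as a pre-sampling logit edit. The set of allowed token ids is assumed to come from a grammar engine tracking the current parse state, which is not shown here:

```python
def apply_grammar_mask(logits, allowed_ids):
    """Set logits of tokens invalid in the current grammar state to -inf.

    `allowed_ids` is the set of token ids the (assumed) grammar engine
    permits next; everything else gets zero probability after softmax.
    """
    return [l if i in allowed_ids else float("-inf") for i, l in enumerate(logits)]
```

Because masking happens before softmax, the remaining transforms (temperature, top-k, top-p, min-p) operate only over grammar-valid tokens.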