Principle:Ollama Ollama Sampling Strategy

Knowledge Sources	Ollama, Nucleus Sampling, Temperature Sampling
Domains	Sampling, Stochastic Methods
Last Updated	2025-02-15 00:00 GMT

Overview

Sampling Strategies define the individual algorithms used to transform raw model logits into a filtered probability distribution, including temperature scaling, top-k filtering, top-p (nucleus) sampling, and min-p thresholding, each controlling a different aspect of output diversity and quality.

Core Concepts

Temperature Scaling

Temperature divides each logit by a scalar T before softmax normalization. When T < 1, the distribution becomes sharper (more peaked), favoring high-probability tokens. When T > 1, the distribution becomes flatter, shifting probability mass toward lower-ranked tokens. In the limit, as T approaches 0, sampling converges to greedy (argmax) selection, while a very large T approaches uniform random selection across the vocabulary.

Formally, where $z_i$ is the logit of token $x_i$: $P(x_i) = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$

Top-K Filtering

Top-k filtering retains only the k tokens with the highest logits and discards the rest. This provides a hard upper bound on the number of candidate tokens considered during sampling. It is applied before temperature scaling and softmax to reduce the computation required for subsequent stages. When k equals the vocabulary size or is set to zero, this filter has no effect.
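The following Go sketch shows top-k as described: sort by logit, keep the first k, and treat k <= 0 or k >= vocabulary size as "no filter". The token type and the topK signature are assumptions for illustration rather than Ollama's exact implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// token is a hypothetical (id, logit) pair used only for this sketch.
type token struct {
	id    int
	value float64
}

// topK sorts tokens by logit in descending order and keeps the first k.
// k <= 0 or k >= len(ts) disables the cut, leaving every token (sorted).
func topK(ts []token, k int) []token {
	sort.SliceStable(ts, func(i, j int) bool { return ts[i].value > ts[j].value })
	if k <= 0 || k >= len(ts) {
		return ts
	}
	return ts[:k]
}

func main() {
	logits := []token{{0, 0.5}, {1, 3.2}, {2, -1.0}, {3, 2.7}, {4, 1.1}}
	for _, tk := range topK(logits, 3) {
		fmt.Printf("id=%d logit=%.1f\n", tk.id, tk.value)
	}
}
```

With k = 3 this keeps tokens 1, 3 and 4. Sorting the whole vocabulary costs O(n log n); a heap or partial selection could find the k largest logits more cheaply, which is a design choice this sketch does not make.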

Top-P (Nucleus) Sampling

Top-p sampling dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p (typically 0.9 or 0.95). Unlike top-k, which uses a fixed count, top-p adapts to the shape of the distribution: when the model is confident, fewer tokens are retained; when the model is uncertain, more tokens are included. This provides a more natural truncation of the probability tail.
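A small Go sketch of nucleus truncation, assuming the input values are already softmax-normalized probabilities. The token type and topP function are hypothetical; the two example distributions show how the same p keeps more tokens when the model is uncertain and fewer when it is confident.

```go
package main

import (
	"fmt"
	"sort"
)

// token is a hypothetical (id, probability) pair; values are assumed to be
// softmax-normalized already.
type token struct {
	id    int
	value float64
}

// topP sorts tokens by probability (descending) and keeps the shortest prefix
// whose cumulative probability reaches p. p >= 1 keeps every token.
func topP(ts []token, p float64) []token {
	sort.SliceStable(ts, func(i, j int) bool { return ts[i].value > ts[j].value })
	if p >= 1 {
		return ts
	}
	cum := 0.0
	for i, tk := range ts {
		cum += tk.value
		if cum >= p {
			return ts[:i+1]
		}
	}
	return ts
}

func main() {
	uncertain := []token{{0, 0.50}, {1, 0.30}, {2, 0.15}, {3, 0.05}}
	confident := []token{{0, 0.92}, {1, 0.05}, {2, 0.03}}
	fmt.Println(len(topP(uncertain, 0.9))) // 0.50+0.30+0.15 passes 0.9 at the third token
	fmt.Println(len(topP(confident, 0.9))) // 0.92 alone already passes 0.9
}
```

The surviving tokens are not renormalized here; a final weighted draw that samples in proportion to the kept probabilities achieves the same effect.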

Min-P Thresholding

Min-p removes any token whose probability is below a fraction of the most likely token's probability. Specifically, a token is kept only if $P(x_i) \geq p_{\min} \times P_{\max}$, where $p_{\min}$ is the min-p parameter and $P_{\max}$ is the highest token probability. This acts as a relative threshold that adapts to the model's overall confidence, pruning only tokens that are negligibly likely relative to the best candidate.
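A Go sketch of the min-p rule, assuming probabilities are already normalized. The token type and the minP function are hypothetical names for this illustration; the threshold is computed exactly as min-p times the largest probability.

```go
package main

import "fmt"

// token is a hypothetical (id, probability) pair; values are assumed to be
// softmax-normalized already.
type token struct {
	id    int
	value float64
}

// minP keeps every token whose probability is at least p times the largest
// probability in ts. p <= 0 disables the filter.
func minP(ts []token, p float64) []token {
	if p <= 0 || len(ts) == 0 {
		return ts
	}
	pMax := ts[0].value
	for _, tk := range ts[1:] {
		if tk.value > pMax {
			pMax = tk.value
		}
	}
	threshold := p * pMax
	kept := make([]token, 0, len(ts))
	for _, tk := range ts {
		if tk.value >= threshold {
			kept = append(kept, tk)
		}
	}
	return kept
}

func main() {
	flat := []token{{0, 0.60}, {1, 0.25}, {2, 0.10}, {3, 0.05}}
	peaked := []token{{0, 0.90}, {1, 0.06}, {2, 0.04}}
	fmt.Println(len(minP(flat, 0.1)))   // threshold 0.06: keeps 3 tokens
	fmt.Println(len(minP(peaked, 0.1))) // threshold 0.09: keeps 1 token
}
```

With the same min-p of 0.1, the cutoff rises from 0.06 to 0.09 as the top probability grows, which is the confidence-adaptive behavior described above.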

Interaction and Ordering

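The strategies are applied in a specific order: top-k first (operating on raw logits), then temperature and softmax (converting to probabilities), then top-p and min-p (operating on probabilities). This ordering matters because softmax preserves the ranking of logits, so top-k can run on raw logits and shrink the candidate set before the more expensive exponentiation and normalization, while top-p and min-p require normalized probabilities to function correctly. Because temperature is applied before top-p and min-p, it also changes which tokens those two filters retain.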

Implementation Notes

Each sampling strategy is implemented as a standalone function in sample/transforms.go: topK sorts tokens by logit descending and truncates to k, temperature divides logit values by T, softmax converts logits to probabilities, topP accumulates probability mass and truncates, and minP filters by relative probability threshold. These functions are composed by the Sampler.sample method in sample/samplers.go.

Related Pages

Implementation:Ollama_Ollama_Sampler_Sample
