Principle:Ollama Ollama Sampling Strategy
| Knowledge Sources | |
|---|---|
| Domains | Sampling, Stochastic Methods |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Sampling strategies are the individual algorithms that transform raw model logits into a filtered probability distribution. They include temperature scaling, top-k filtering, top-p (nucleus) sampling, and min-p thresholding, each of which controls a different aspect of output diversity and quality.
Core Concepts
Temperature Scaling
Temperature divides each logit by a scalar T before softmax normalization. When T < 1, the distribution becomes sharper (more peaked), favoring high-probability tokens. When T > 1, the distribution becomes flatter, increasing the probability mass on lower-ranked tokens. At the extreme, T approaching 0 converges to greedy (argmax) selection, while very high T approaches uniform random selection across the vocabulary.
Formally, for a token $x_i$ with logit $z_i$:

$$P(x_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
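As a rough illustration of temperature scaling followed by softmax, the sketch below shows how the same logits become sharper or flatter as T changes. The function name `softmaxWithTemperature` and the use of `float64` are assumptions made for this example, not the project's actual code, and the function assumes T > 0.

```go
package main

import (
	"fmt"
	"math"
)

// softmaxWithTemperature divides each logit by t and normalizes with softmax.
// Hypothetical helper for illustration; it assumes t > 0 and a non-empty input.
// The maximum logit is subtracted first so that math.Exp cannot overflow.
func softmaxWithTemperature(logits []float64, t float64) []float64 {
	maxLogit := math.Inf(-1)
	for _, z := range logits {
		maxLogit = math.Max(maxLogit, z)
	}
	probs := make([]float64, len(logits))
	var sum float64
	for i, z := range logits {
		probs[i] = math.Exp((z - maxLogit) / t)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	logits := []float64{2.0, 1.0, 0.0}
	// Lower T concentrates mass on the first token; higher T spreads it out.
	for _, t := range []float64{0.5, 1.0, 2.0} {
		fmt.Printf("T=%.1f -> %.3f\n", t, softmaxWithTemperature(logits, t))
	}
}
```

Subtracting the maximum logit does not change the result, because the common factor cancels in the ratio; it only keeps the exponentials in a safe numeric range.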
Top-K Filtering
Top-k filtering retains only the k tokens with the highest logits and discards the rest. This provides a hard upper bound on the number of candidate tokens considered during sampling. It is applied before temperature scaling and softmax to reduce the computation required for subsequent stages. When k is greater than or equal to the vocabulary size, or is set to zero, this filter has no effect.
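A minimal sketch of top-k filtering on raw logits might look as follows. The `token` type and this `topK` signature are hypothetical and written only for illustration; the sketch uses a full sort for clarity, which is not necessarily how a production implementation would select the k largest values.

```go
package main

import (
	"fmt"
	"sort"
)

// token pairs a vocabulary id with its logit (hypothetical type for this sketch).
type token struct {
	id    int
	logit float64
}

// topK returns the k highest-logit tokens in descending order.
// When k <= 0 or k >= len(ts), every token is kept (still sorted).
func topK(ts []token, k int) []token {
	sorted := make([]token, len(ts))
	copy(sorted, ts) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].logit > sorted[j].logit })
	if k <= 0 || k >= len(sorted) {
		return sorted
	}
	return sorted[:k]
}

func main() {
	ts := []token{{0, 1.2}, {1, 3.4}, {2, -0.5}, {3, 2.1}}
	fmt.Println(topK(ts, 2)) // [{1 3.4} {3 2.1}]
}
```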
Top-P (Nucleus) Sampling
Top-p sampling dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p (typically 0.9 or 0.95). Unlike top-k, which uses a fixed count, top-p adapts to the shape of the distribution: when the model is confident, fewer tokens are retained; when the model is uncertain, more tokens are included. This provides a more natural truncation of the probability tail.
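The adaptive behaviour can be illustrated with a short sketch. The `topP` function below is a hypothetical helper, and it assumes its input is already a normalized probability list sorted in descending order.

```go
package main

import "fmt"

// topP keeps the shortest prefix of probs whose cumulative mass exceeds p.
// Hypothetical helper: probs must be sorted descending and sum to 1.
func topP(probs []float64, p float64) []float64 {
	if p >= 1.0 {
		return probs // nothing to truncate
	}
	var cum float64
	for i, q := range probs {
		cum += q
		if cum > p {
			return probs[:i+1]
		}
	}
	return probs
}

func main() {
	confident := []float64{0.92, 0.05, 0.02, 0.01}
	uncertain := []float64{0.35, 0.25, 0.20, 0.12, 0.08}
	fmt.Println(len(topP(confident, 0.9))) // 1
	fmt.Println(len(topP(uncertain, 0.9))) // 4
}
```

With the same threshold, the peaked distribution keeps a single token while the flatter one keeps four, which is the adaptation described above.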
Min-P Thresholding
Min-p removes any token whose probability is below a fraction of the maximum token's probability. Specifically, a token $x_i$ is kept only if $P(x_i) \geq p_{\min} \times P_{\max}$, where $p_{\min}$ is the min-p setting and $P_{\max}$ is the probability of the most likely token. This acts as a relative threshold that adapts to the overall confidence level, pruning only tokens that are negligibly likely relative to the best candidate.
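The relative threshold can be sketched in a few lines. The `minP` function here is a hypothetical helper for illustration; it takes already-normalized probabilities and does not renormalize the survivors.

```go
package main

import "fmt"

// minP keeps probabilities that are at least p times the largest probability.
// Hypothetical helper: the input need not be sorted, and the kept values
// are returned as-is, without renormalization.
func minP(probs []float64, p float64) []float64 {
	var maxProb float64
	for _, q := range probs {
		if q > maxProb {
			maxProb = q
		}
	}
	threshold := p * maxProb
	kept := make([]float64, 0, len(probs))
	for _, q := range probs {
		if q >= threshold {
			kept = append(kept, q)
		}
	}
	return kept
}

func main() {
	probs := []float64{0.60, 0.25, 0.10, 0.04, 0.01}
	// The threshold is 0.1 * 0.60 = 0.06, so 0.04 and 0.01 are pruned.
	fmt.Println(minP(probs, 0.1)) // [0.6 0.25 0.1]
}
```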
Interaction and Ordering
The strategies are applied in a specific order: top-k first (operating on raw logits), then temperature and softmax (converting to probabilities), then top-p and min-p (operating on probabilities). This ordering matters because top-k on logits is more numerically stable and computationally efficient: top-k depends only on the ranking of tokens, which temperature scaling and softmax preserve, so pruning first means the later stages process at most k values. Top-p and min-p, by contrast, require normalized probabilities to function correctly.
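The ordering described above can be sketched as a single pipeline. Everything in this example (the `token` type, the `filter` and `pick` functions, and the parameter values) is hypothetical and chosen for illustration; it is not the project's Sampler code, and it assumes a positive temperature and a non-empty input.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// token carries a vocabulary id and a value that starts as a logit and
// becomes a probability after softmax (hypothetical type for this sketch).
type token struct {
	id    int
	value float64
}

// filter applies top-k, temperature, softmax, top-p and min-p in that order.
// It assumes temp > 0 and a non-empty input; k <= 0 disables top-k.
func filter(ts []token, k int, temp, topP, minP float64) []token {
	// 1. Top-k on raw logits: sort descending, then truncate.
	out := append([]token(nil), ts...)
	sort.Slice(out, func(i, j int) bool { return out[i].value > out[j].value })
	if k > 0 && k < len(out) {
		out = out[:k]
	}
	// 2. Temperature and softmax; out[0] holds the max logit after sorting.
	maxLogit := out[0].value
	var sum float64
	for i := range out {
		out[i].value = math.Exp((out[i].value - maxLogit) / temp)
		sum += out[i].value
	}
	for i := range out {
		out[i].value /= sum
	}
	// 3. Top-p on probabilities: shortest prefix whose mass exceeds topP.
	if topP < 1 {
		var cum float64
		for i := range out {
			cum += out[i].value
			if cum > topP {
				out = out[:i+1]
				break
			}
		}
	}
	// 4. Min-p on probabilities: drop tokens below minP * P_max.
	threshold := minP * out[0].value
	n := 0
	for _, t := range out {
		if t.value >= threshold {
			out[n] = t
			n++
		}
	}
	return out[:n]
}

// pick draws one token id with probability proportional to its value,
// so the surviving tokens do not need to be renormalized first.
func pick(ts []token, r *rand.Rand) int {
	var total float64
	for _, t := range ts {
		total += t.value
	}
	x := r.Float64() * total
	for _, t := range ts {
		x -= t.value
		if x <= 0 {
			return t.id
		}
	}
	return ts[len(ts)-1].id
}

func main() {
	logits := []token{{0, 4.0}, {1, 3.5}, {2, 1.0}, {3, -2.0}, {4, -5.0}}
	kept := filter(logits, 3, 0.8, 0.95, 0.05)
	fmt.Println(kept) // two candidates survive: ids 0 and 1
	fmt.Println(pick(kept, rand.New(rand.NewSource(1))))
}
```

In this run, top-k reduces five candidates to three before any exponentials are computed, and top-p then drops the third because the first two already carry more than 95% of the probability mass.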
Implementation Notes
Each sampling strategy is implemented as a standalone function in sample/transforms.go: topK sorts tokens by logit descending and truncates to k, temperature divides logit values by T, softmax converts logits to probabilities, topP accumulates probability mass and truncates, and minP filters by relative probability threshold. These functions are composed by the Sampler.sample method in sample/samplers.go.