Principle:Ggml org Ggml Token Sampling

From Leeroopedia


Summary

Token sampling selects the next token from the probability distribution that a language model produces over its vocabulary. Temperature scaling, top-k sampling, and top-p (nucleus) sampling each control the trade-off between creativity and coherence in generated text.

Theory

Top-k Sampling

Restrict the candidate set to the k highest-probability tokens, then renormalize the distribution and sample. This prevents extremely low-probability tokens from being selected.

  • Keep only the k tokens with the highest logits
  • Set all other logits to negative infinity, which gives those tokens zero probability after softmax
  • Renormalize the remaining probabilities and sample
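
The three top-k steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the ggml implementation; the function names `top_k_filter` and `softmax` are hypothetical, and breaking ties by position is an assumption.

```python
import math

def top_k_filter(logits, k):
    """Return a copy of `logits` with everything outside the top k set to -inf."""
    if k <= 0 or k >= len(logits):
        return list(logits)  # nothing to filter
    # Indices of the k largest logits; ties keep the lower index (stable sort).
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Softmax that tolerates -inf entries (they receive probability 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(top_k_filter(logits, k=2))
# Only the two highest-logit tokens keep non-zero probability,
# and softmax renormalizes over the survivors.
```

Masking logits before softmax performs the renormalization implicitly, because the masked entries contribute nothing to the denominator.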

Top-p / Nucleus Sampling

Restrict the candidate set to the smallest set of tokens whose cumulative probability is greater than or equal to p, then renormalize and sample.

  • Sort tokens by probability in descending order
  • Keep tokens until the cumulative probability >= p
  • Renormalize the remaining probabilities and sample
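
As a rough sketch of the nucleus procedure, the function below walks the sorted tokens and stops at the first prefix whose cumulative probability reaches p. The name `top_p_filter` is hypothetical, and the input is assumed to be a probability list that already sums to 1.

```python
def top_p_filter(probs, p):
    """Return renormalized probabilities restricted to the nucleus.

    Tokens outside the nucleus get probability 0.0.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # smallest prefix whose mass reaches p
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / cumulative  # renormalize over the nucleus
    return out

probs = [0.5, 0.25, 0.125, 0.125]
nucleus = top_p_filter(probs, p=0.75)
# The first two tokens already carry 0.75 of the mass, so only they survive.
```

Because the token that crosses the threshold is included, the nucleus always contains at least one token, even for very small p.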

Temperature Scaling

Sharpen or flatten the probability distribution by dividing logits by a temperature parameter T before applying softmax.

  • T < 1.0 sharpens the distribution (more deterministic); as T approaches 0, sampling approaches greedy selection of the highest-logit token
  • T = 1.0 leaves the distribution unchanged
  • T > 1.0 flattens the distribution (more random / creative)
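
To make the effect of T concrete, here is a small sketch of a temperature-scaled softmax. The function name `softmax_with_temperature` is hypothetical; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # top token gets about 0.87
base = softmax_with_temperature(logits, 1.0)   # top token gets about 0.67
flat = softmax_with_temperature(logits, 2.0)   # top token gets about 0.51
```

The ranking of tokens never changes with T; only the gap between their probabilities does.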

Math

Softmax with Temperature

Given logits z_i and temperature T > 0, the probability of token x_i is:

P(x_i) = exp(z_i / T) / sum_j exp(z_j / T)

Top-k Filter

Given the set of all tokens V and parameter k:

V_k = { x_i in V : rank(x_i) <= k }  (sorted by descending logit)

Renormalize: P'(x_i) = P(x_i) / sum_{x_j in V_k} P(x_j)  for x_i in V_k

Top-p Filter

Given sorted probabilities p_1 >= p_2 >= ... >= p_|V| and threshold p:

V_p = { x_1, x_2, ..., x_m }  where m is the smallest index such that sum(p_1..p_m) >= p

Renormalize over V_p.
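
A short worked example may help tie the two filters together. The four-token distribution below is made up for illustration.

```
Sorted probabilities: 0.50, 0.30, 0.15, 0.05

Top-k with k = 2:
  kept mass    = 0.50 + 0.30 = 0.80
  renormalized = 0.50/0.80, 0.30/0.80 = 0.625, 0.375

Top-p with p = 0.9:
  cumulative   = 0.50, 0.80, 0.95   (0.95 >= 0.9, so m = 3)
  renormalized = 0.50/0.95, 0.30/0.95, 0.15/0.95
               ~ 0.526, 0.316, 0.158
```

In both cases the kept tokens keep their relative proportions; only the discarded mass is redistributed.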

Trade-offs

  • Temperature controls creativity vs. coherence: low temperature yields safe, repetitive text; high temperature yields diverse but potentially incoherent text
  • Top-k prevents degenerate outputs by excluding the long tail, but uses a fixed cutoff regardless of distribution shape
  • Top-p adapts the cutoff to the distribution shape, keeping more tokens when the distribution is flat and fewer when it is peaked
  • Combining top-k and top-p gives a hard cap of k candidates while still letting p shrink the set further when the distribution is peaked
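
The sketch below chains the three techniques into one sampling step. It is an illustration only: the function name `sample_token`, its parameters, and the order of operations (temperature, then top-k, then top-p) are assumptions, and real samplers may order or implement these stages differently.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index using temperature, then top-k, then top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1. Temperature-scaled, numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2. Rank tokens by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # 3. Top-k: keep at most k candidates (top_k <= 0 disables the filter).
    if 0 < top_k < len(order):
        order = order[:top_k]
    # 4. Top-p over the surviving, renormalized candidates.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    # 5. Draw from the kept set; choices() normalizes the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token = sample_token([5.0, 1.0, 0.0, -2.0], temperature=0.8, top_k=3, top_p=0.95)
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is useful when comparing parameter settings.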
