Principle:Sgl project Sglang Sampling Parameters Preparation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Generation, Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A configuration pattern for specifying text generation sampling strategies including temperature, top-p, top-k, and other decoding parameters.
Description
Sampling parameters control the stochastic behavior of text generation. Temperature divides the logits before the softmax (values above 1 flatten the distribution and make output more random; values below 1 sharpen it), top-p (nucleus sampling) truncates the probability distribution to the smallest set of tokens whose cumulative probability reaches a threshold, and top-k limits candidates to the k highest-probability tokens. Additional parameters control repetition penalties, stop conditions, and maximum output length. In SGLang, these are passed as a plain Python dictionary to the Engine.generate method.
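To make the three filters concrete, the following is a minimal pure-Python sketch of temperature scaling followed by top-k and top-p truncation. It is an illustration only, not SGLang's implementation: the function names are hypothetical, and the order of operations (top-k first, then top-p on the unrenormalized probabilities) is an assumption that varies between frameworks.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; handle 0 as greedy argmax")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the shortest prefix of them
    whose cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_k=50, top_p=1.0, rng=random):
    """Hypothetical helper: pick one token id from raw logits."""
    if temperature == 0:
        # Greedy decoding: always take the highest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    ids = list(candidates)
    weights = [candidates[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With probabilities `[0.5, 0.3, 0.2]` and `top_p=0.7`, the filter keeps the first two tokens because 0.5 alone falls short of the threshold while 0.5 + 0.3 reaches it.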
Usage
Define sampling parameters whenever calling Engine.generate or the OpenAI-compatible API to control generation quality, diversity, and length. Use a low temperature (0.0-0.3) for factual or near-deterministic outputs (0 selects greedy decoding) and higher values (0.7-1.0) for creative generation.
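As a rough illustration of this guidance, the sketch below defines two parameter dictionaries and a small sanity check. The preset values are assumptions chosen from the ranges above, and `check_sampling_params` is a hypothetical helper, not part of SGLang. The commented-out lines show how such a dictionary might be handed to the engine; the model path is a placeholder.

```python
# Illustrative presets; the specific values are assumptions, not recommendations
# taken from the SGLang documentation.
FACTUAL_PARAMS = {"temperature": 0.1, "top_p": 0.9, "max_new_tokens": 256}
CREATIVE_PARAMS = {
    "temperature": 0.9,
    "top_p": 0.95,
    "top_k": 50,
    "max_new_tokens": 512,
}


def check_sampling_params(params):
    """Hypothetical helper: reject obviously invalid values before a request."""
    temperature = params.get("temperature", 1.0)
    top_p = params.get("top_p", 1.0)
    top_k = params.get("top_k", -1)  # -1 is assumed here to mean "disabled"
    max_new_tokens = params.get("max_new_tokens", 1)
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if top_k != -1 and top_k < 1:
        raise ValueError("top_k must be >= 1, or -1 to disable")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be >= 1")
    return params


# Possible usage with the offline engine (needs an installed sglang and a model):
# import sglang as sgl
# llm = sgl.Engine(model_path="<your-model-path>")
# outputs = llm.generate(
#     ["The capital of France is"],
#     sampling_params=check_sampling_params(FACTUAL_PARAMS),
# )
```

Validating on the client side is optional, since the server performs its own checks, but it surfaces mistakes before any request is sent.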
Theoretical Basis
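Text generation sampling selects the next token from a probability distribution obtained by applying a temperature-scaled softmax to the logits $z$:
$$P(x_t = v \mid x_{<t}) = \frac{\exp(z_v / T)}{\sum_{v'} \exp(z_{v'} / T)}$$
where $T$ is the temperature parameter.
Top-p (nucleus) sampling selects the smallest set $V_p$ such that:
$$\sum_{v \in V_p} P(x_t = v \mid x_{<t}) \geq p$$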
Key parameters:
- temperature — Scaling factor for logits (0 = greedy, 1 = standard sampling)
- top_p — Nucleus sampling threshold (0.0-1.0)
- top_k — Maximum number of candidate tokens
- max_new_tokens — Hard limit on generated token count
- frequency_penalty / presence_penalty — Discourage repetition
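The two penalties differ in how they scale. A common convention, documented for the OpenAI API and assumed here for illustration, subtracts `frequency_penalty` once per prior occurrence of a token and `presence_penalty` once if the token has appeared at all. The sketch below applies that convention to a list of logits; `apply_repetition_penalties` is a hypothetical name, and SGLang's internal implementation may differ in detail.

```python
from collections import Counter


def apply_repetition_penalties(
    logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0
):
    """Return a copy of logits with penalties applied to already-generated tokens.

    frequency_penalty grows with how often a token was generated;
    presence_penalty is a flat deduction for any token seen at least once.
    """
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= frequency_penalty * count + presence_penalty
    return adjusted
```

For example, with three tokens at logit 1.0, a history of `[0, 0, 2]`, `frequency_penalty=0.5`, and `presence_penalty=0.2`, token 0 is penalized more heavily than token 2 because it appeared twice, and token 1 is left unchanged.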