Implementation:Sgl project Sglang Sampling Parameters Dict
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Generation, Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete pattern for constructing sampling parameter dictionaries for SGLang Engine generation calls.
Description
SGLang's Engine.generate accepts sampling parameters as a plain Python dict, or as a list of dicts when each request in a batch needs its own parameters. These are validated internally against the GenerateReqInput schema. No special class instantiation is needed: a dictionary with string keys suffices.
Usage
Construct a sampling parameters dictionary whenever calling Engine.generate or Engine.async_generate. For batch inference with shared parameters, pass a single dict. For per-request parameters, pass a list of dicts matching the batch size.
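The shared-versus-per-request rule above can be checked on the caller's side before anything reaches the engine. The sketch below is a minimal illustration in plain Python; `expand_sampling_params` is a hypothetical helper, not part of SGLang, and it assumes only what this page states: a single dict applies to every prompt, and a list must have one dict per prompt.

```python
from typing import Any, Dict, List, Union

SamplingParams = Dict[str, Any]


def expand_sampling_params(
    prompts: List[str],
    sampling_params: Union[SamplingParams, List[SamplingParams]],
) -> List[SamplingParams]:
    """Return one sampling-parameter dict per prompt (hypothetical helper)."""
    if isinstance(sampling_params, dict):
        # Shared parameters: give every prompt its own copy of the same dict.
        return [dict(sampling_params) for _ in prompts]
    if len(sampling_params) != len(prompts):
        raise ValueError(
            f"got {len(sampling_params)} parameter dicts for {len(prompts)} prompts"
        )
    # Per-request parameters: the list already lines up with the prompts.
    return list(sampling_params)


prompts = ["Write a poem.", "Explain gravity."]
shared = expand_sampling_params(prompts, {"temperature": 0, "max_new_tokens": 64})
```

Running the check locally turns a length mismatch into an immediate ValueError with both counts in the message, which is usually easier to debug than an error raised deep inside a batch call.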
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/managers/io_struct.py (validation schema)
- Usage example: examples/runtime/engine/offline_batch_inference.py
Interface Specification
from typing import Any, Dict, List, Union

# Sampling parameters are plain Python dictionaries.
# The values shown here are the expected types, not actual settings.
sampling_params: Dict[str, Any] = {
    "temperature": float,           # Sampling temperature (default: 1.0)
    "top_p": float,                 # Nucleus sampling threshold (default: 1.0)
    "top_k": int,                   # Top-k sampling (default: -1, disabled)
    "min_p": float,                 # Min-p sampling (default: 0.0)
    "max_new_tokens": int,          # Maximum tokens to generate
    "frequency_penalty": float,     # Frequency penalty (default: 0.0)
    "presence_penalty": float,      # Presence penalty (default: 0.0)
    "regex": str,                   # Constrained decoding regex pattern
    "stop": Union[str, List[str]],  # Stop sequences
}
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
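| temperature | float | No | Sampling temperature (0.0 = greedy) |
| top_p | float | No | Nucleus sampling threshold |
| top_k | int | No | Top-k candidates (-1 = disabled) |
| min_p | float | No | Min-p sampling threshold (0.0 = disabled) |
| max_new_tokens | int | No | Maximum number of generated tokens |
| frequency_penalty | float | No | Penalizes tokens in proportion to how often they have already appeared |
| presence_penalty | float | No | Penalizes tokens that have already appeared at least once |
| regex | str | No | Regex constraint for structured generation |
| stop | Union[str, List[str]] | No | Stop sequence or list of stop sequences |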
Outputs
| Name | Type | Description |
|---|---|---|
| sampling_params | Dict[str, Any] | Dictionary ready for Engine.generate |
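Because the parameters are an untyped dict, a misspelled key or a string where a number is expected is easy to miss. The following sketch shows one way to catch that early. `check_sampling_params` and `DOCUMENTED_KEYS` are hypothetical names, and the key set is only the subset listed in the Interface Specification on this page; SGLang may accept further keys, so this check is deliberately stricter than the engine and is an assumption rather than its actual validation logic.

```python
from typing import Any, Dict

# Keys and accepted Python types taken from the Interface Specification on this page.
# Illustrative subset only; not SGLang's full schema.
DOCUMENTED_KEYS = {
    "temperature": (int, float),
    "top_p": (int, float),
    "top_k": (int,),
    "min_p": (int, float),
    "max_new_tokens": (int,),
    "frequency_penalty": (int, float),
    "presence_penalty": (int, float),
    "regex": (str,),
    "stop": (str, list),
}


def check_sampling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Raise early on unknown keys or wrong value types (hypothetical helper)."""
    for key, value in params.items():
        if key not in DOCUMENTED_KEYS:
            raise KeyError(f"undocumented sampling parameter: {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, DOCUMENTED_KEYS[key]):
            raise TypeError(f"{key!r} has unexpected type {type(value).__name__}")
    return params


params = check_sampling_params(
    {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 256}
)
```

The helper returns the dict unchanged, so it can wrap a literal at the call site without altering what is passed on.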
Usage Examples
Basic Sampling Parameters
# Deterministic generation (greedy decoding)
sampling_params = {"temperature": 0, "max_new_tokens": 128}

# Creative generation with nucleus sampling
sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 256}

# Constrained decoding with regex
sampling_params = {
    "temperature": 0,
    "max_new_tokens": 64,
    "regex": r"\d{4}-\d{2}-\d{2}",  # Force date format
}
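Before sending a regex constraint to the engine, it can help to check locally which strings the pattern accepts. The sketch below uses Python's standard `re` module on the date pattern from the example above. It assumes the pattern uses only syntax that Python's `re` and SGLang's constrained-decoding backend interpret the same way, which is not guaranteed for every regex feature; `matches_constraint` is a hypothetical helper.

```python
import re

DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"  # same pattern as the regex example above


def matches_constraint(pattern: str, candidate: str) -> bool:
    """Return True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None


print(matches_constraint(DATE_PATTERN, "2026-02-10"))    # True
print(matches_constraint(DATE_PATTERN, "Feb 10, 2026"))  # False
print(matches_constraint(DATE_PATTERN, "9999-99-99"))    # True
```

The last line shows that the pattern constrains the shape of the output only: any four, two, and two digits pass, so calendar validity still has to be checked after generation.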
Per-Request Batch Parameters
prompts = ["Write a poem.", "Explain gravity.", "Tell a joke."]

# Different parameters per request: one dict per prompt, in the same order
sampling_params_list = [
    {"temperature": 0.9, "max_new_tokens": 200},
    {"temperature": 0.1, "max_new_tokens": 100},
    {"temperature": 0.7, "max_new_tokens": 50},
]

# `engine` is assumed to be an already-initialized SGLang Engine
output = engine.generate(prompts, sampling_params_list)
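When most requests share the same settings and only a few values differ, the per-request list can be derived from a shared base instead of being written out in full. This is a sketch in plain Python; `with_overrides` is a hypothetical helper and not an SGLang function.

```python
from typing import Any, Dict, List


def with_overrides(
    base: Dict[str, Any], overrides: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Merge each override dict onto a copy of the shared base (hypothetical helper)."""
    # Keys in the override win; the base dict itself is never modified.
    return [{**base, **override} for override in overrides]


base = {"top_p": 0.95, "max_new_tokens": 100}
sampling_params_list = with_overrides(
    base,
    [
        {"temperature": 0.9, "max_new_tokens": 200},
        {"temperature": 0.1},
        {"temperature": 0.7, "max_new_tokens": 50},
    ],
)
```

The resulting list has one dict per request and can be passed in place of a hand-written list, provided its length matches the number of prompts.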