Implementation:Sgl project Sglang Sampling Parameters Dict

Knowledge Sources	SGLang
Domains	NLP, Text_Generation, Sampling
Last Updated	2026-02-10 00:00 GMT

Overview

Concrete pattern for constructing sampling parameter dictionaries for SGLang Engine generation calls.

Description

SGLang's Engine.generate accepts sampling parameters as a plain Python dict (or a list of dicts for per-request parameters in batch mode). These are validated internally against the GenerateReqInput schema. No special class instantiation is needed; a dictionary with string keys suffices, and any key that is omitted falls back to its default.

Usage

Construct a sampling parameters dictionary whenever calling Engine.generate or Engine.async_generate. For batch inference with shared parameters, pass a single dict. For per-request parameters, pass a list of dicts matching the batch size.
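The shared-versus-per-request rule can be sketched as a small helper. The function name `expand_sampling_params` and the alias `ParamsDict` are hypothetical and not part of SGLang; the sketch only illustrates the shape check a caller might run before handing arguments to Engine.generate.

```python
from typing import Any, Dict, List, Union

ParamsDict = Dict[str, Any]


def expand_sampling_params(
    prompts: List[str],
    sampling_params: Union[ParamsDict, List[ParamsDict]],
) -> List[ParamsDict]:
    """Return one sampling dict per prompt (hypothetical helper, not an SGLang API)."""
    if isinstance(sampling_params, dict):
        # Shared parameters: copy the dict so each request gets its own instance.
        return [dict(sampling_params) for _ in prompts]
    if len(sampling_params) != len(prompts):
        raise ValueError(
            f"got {len(sampling_params)} sampling dicts for {len(prompts)} prompts"
        )
    return list(sampling_params)
```

Calling the helper with a single dict yields one copy per prompt; calling it with a list of the wrong length fails before any generation work starts.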

Code Reference

Source Location

Repository: sglang
File: python/sglang/srt/managers/io_struct.py (validation schema)
Usage example: examples/runtime/engine/offline_batch_inference.py

Interface Specification

# Sampling parameters are plain Python dictionaries.
# The values shown here are the expected types, not literal values.
from typing import Any, Dict, List, Union

sampling_params: Dict[str, Any] = {
    "temperature": float,       # Sampling temperature (default: 1.0)
    "top_p": float,             # Nucleus sampling threshold (default: 1.0)
    "top_k": int,               # Top-k sampling (default: -1, disabled)
    "min_p": float,             # Min-p sampling (default: 0.0)
    "max_new_tokens": int,      # Maximum tokens to generate
    "frequency_penalty": float, # Frequency penalty (default: 0.0)
    "presence_penalty": float,  # Presence penalty (default: 0.0)
    "regex": str,               # Constrained decoding regex pattern
    "stop": Union[str, List[str]], # Stop sequences
}
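Because the parameters are untyped dicts, a misspelled key or a wrong value type is easy to miss. The following is a minimal sketch of a client-side check, limited to the keys listed on this page. The names `EXPECTED_TYPES` and `check_sampling_params` are hypothetical; this is not SGLang's own validation, and SGLang may accept keys that are not listed here.

```python
from typing import Any, Dict

# Expected Python types for the keys listed in the interface above.
# Illustrative only; this is not SGLang's validation logic.
EXPECTED_TYPES = {
    "temperature": (int, float),
    "top_p": (int, float),
    "top_k": (int,),
    "min_p": (int, float),
    "max_new_tokens": (int,),
    "frequency_penalty": (int, float),
    "presence_penalty": (int, float),
    "regex": (str,),
    "stop": (str, list),
}


def check_sampling_params(params: Dict[str, Any]) -> None:
    """Raise early on unlisted keys or wrong value types (hypothetical helper)."""
    for key, value in params.items():
        if key not in EXPECTED_TYPES:
            raise KeyError(f"unknown sampling parameter: {key!r}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, EXPECTED_TYPES[key]):
            raise TypeError(f"{key!r} has unexpected type {type(value).__name__}")
```

Integer values are accepted for float-typed keys so that common shorthand such as "temperature": 0 passes the check.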

I/O Contract

Inputs

Name	Type	Required	Description
temperature	float	No	Sampling temperature (0.0 = greedy)
top_p	float	No	Nucleus sampling threshold
top_k	int	No	Top-k candidates (-1 = disabled)
min_p	float	No	Min-p sampling threshold
max_new_tokens	int	No	Maximum generated tokens
frequency_penalty	float	No	Penalize frequent tokens
presence_penalty	float	No	Penalize already-seen tokens
regex	str	No	Regex constraint for structured generation
stop	Union[str, List[str]]	No	Stop sequences

Outputs

Name	Type	Description
sampling_params	Dict[str, Any]	Dictionary ready for Engine.generate

Usage Examples

Basic Sampling Parameters

# Deterministic generation (greedy decoding)
sampling_params = {"temperature": 0, "max_new_tokens": 128}

# Creative generation with nucleus sampling
sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 256}

# Constrained decoding with regex
sampling_params = {
    "temperature": 0,
    "max_new_tokens": 64,
    "regex": r"\d{4}-\d{2}-\d{2}",  # Force date format
}
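The interface also lists min_p, the two penalties, and stop, none of which appear in the examples above. A short sketch of how they might be combined follows; all values are illustrative choices, not recommended settings.

```python
# Sampling with min-p filtering, repetition penalties, and stop sequences.
# All values are illustrative, not recommended settings.
sampling_params = {
    "temperature": 0.7,
    "min_p": 0.05,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.1,
    "max_new_tokens": 256,
    "stop": ["\n\n", "###"],  # list form: several stop sequences
}

# String form: the interface types "stop" as either a str or a list of str.
single_stop_params = {"temperature": 0, "max_new_tokens": 64, "stop": "\n"}
```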

Per-Request Batch Parameters

prompts = ["Write a poem.", "Explain gravity.", "Tell a joke."]
# Different parameters per request
sampling_params_list = [
    {"temperature": 0.9, "max_new_tokens": 200},
    {"temperature": 0.1, "max_new_tokens": 100},
    {"temperature": 0.7, "max_new_tokens": 50},
]
# `engine` is an already-constructed Engine instance;
# the list must contain one dict per prompt.
output = engine.generate(prompts, sampling_params_list)

Related Pages

Implements Principle

Principle:Sgl_project_Sglang_Sampling_Parameters_Preparation
