Implementation:Sgl project Sglang Sampling Parameters Dict

From Leeroopedia
Revision as of 16:40, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Sgl_project_Sglang_Sampling_Parameters_Dict.md)


Knowledge Sources
Domains NLP, Text_Generation, Sampling
Last Updated 2026-02-10 00:00 GMT

Overview

A concrete pattern for constructing the sampling-parameter dictionaries passed to SGLang Engine generation calls.

Description

SGLang's Engine.generate accepts sampling parameters as a plain Python dict, or as a list of dicts when each request in a batch needs its own parameters. The engine validates them internally against the GenerateReqInput schema, so no special class needs to be instantiated; a dictionary with string keys suffices.

Usage

Construct a sampling parameters dictionary whenever calling Engine.generate or Engine.async_generate. For batch inference with shared parameters, pass a single dict. For per-request parameters, pass a list of dicts matching the batch size.
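The shared-versus-per-request rule above can be sketched in plain Python. The helper name `expand_sampling_params` and the `ParamsDict` alias are hypothetical and not part of SGLang; the sketch only illustrates the length check a caller might run before handing a list of dicts to the engine.

```python
from typing import Any, Dict, List, Union

ParamsDict = Dict[str, Any]


def expand_sampling_params(
    prompts: List[str],
    sampling_params: Union[ParamsDict, List[ParamsDict]],
) -> List[ParamsDict]:
    """Return one sampling dict per prompt (hypothetical helper, not an SGLang API)."""
    if isinstance(sampling_params, dict):
        # Shared parameters: copy the dict so requests do not alias one object.
        return [dict(sampling_params) for _ in prompts]
    if len(sampling_params) != len(prompts):
        raise ValueError(
            f"got {len(sampling_params)} sampling dicts for {len(prompts)} prompts"
        )
    return list(sampling_params)


prompts = ["Write a poem.", "Explain gravity."]
shared = expand_sampling_params(prompts, {"temperature": 0, "max_new_tokens": 128})
per_request = expand_sampling_params(
    prompts,
    [{"temperature": 0.9}, {"temperature": 0.1}],
)
```

Engine.generate already accepts a single dict for the shared case, so a helper like this is mainly useful for catching a list whose length does not match the batch before the call is made.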

Code Reference

Source Location

  • Repository: sglang
  • File: python/sglang/srt/managers/io_struct.py (validation schema)
  • Usage example: examples/runtime/engine/offline_batch_inference.py

Interface Specification

from typing import Any, Dict, List, Union

# Sampling parameters are plain Python dictionaries.
# Each value below shows the expected type, not a concrete setting.
sampling_params: Dict[str, Any] = {
    "temperature": float,          # Sampling temperature (default: 1.0)
    "top_p": float,                # Nucleus sampling threshold (default: 1.0)
    "top_k": int,                  # Top-k sampling (default: -1, disabled)
    "min_p": float,                # Min-p sampling (default: 0.0)
    "max_new_tokens": int,         # Maximum tokens to generate
    "frequency_penalty": float,    # Frequency penalty (default: 0.0)
    "presence_penalty": float,     # Presence penalty (default: 0.0)
    "regex": str,                  # Constrained decoding regex pattern
    "stop": Union[str, List[str]], # Stop sequences
}
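Because the parameters are an untyped dict, a misspelled key or a wrong value type is only noticed once the request reaches the engine. The following is a minimal sketch of a client-side pre-check built from the listing above. `EXPECTED_TYPES` and `check_sampling_params` are hypothetical names, the listing may not be exhaustive, and this is not SGLang's own validation logic.

```python
from typing import Any, Dict, List

# Keys and accepted Python types, copied from the interface listing above.
# Illustrative only: SGLang may accept keys that are not listed here.
EXPECTED_TYPES = {
    "temperature": (int, float),
    "top_p": (int, float),
    "top_k": (int,),
    "min_p": (int, float),
    "max_new_tokens": (int,),
    "frequency_penalty": (int, float),
    "presence_penalty": (int, float),
    "regex": (str,),
    "stop": (str, list),
}


def check_sampling_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, value in params.items():
        if key not in EXPECTED_TYPES:
            problems.append(f"not in listing: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, EXPECTED_TYPES[key]):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
    return problems


ok = check_sampling_params({"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 256})
bad = check_sampling_params({"temperature": "hot", "max_tokens": 10})
```

Returning a list of problems rather than raising lets the caller decide whether a key outside the listing is an error or simply a parameter this sketch does not know about.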

I/O Contract

Inputs

Name Type Required Description
temperature float No Sampling temperature (0.0 = greedy)
top_p float No Nucleus sampling threshold
top_k int No Top-k candidates (-1 = disabled)
min_p float No Min-p sampling threshold
max_new_tokens int No Maximum generated tokens
frequency_penalty float No Penalize frequent tokens
presence_penalty float No Penalize already-seen tokens
regex str No Regex constraint for structured generation
stop str or List[str] No Stop sequences

Outputs

Name Type Description
sampling_params Dict[str, Any] Dictionary ready for Engine.generate

Usage Examples

Basic Sampling Parameters

# Deterministic generation (greedy decoding)
sampling_params = {"temperature": 0, "max_new_tokens": 128}

# Creative generation with nucleus sampling
sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 256}

# Constrained decoding with regex
sampling_params = {
    "temperature": 0,
    "max_new_tokens": 64,
    "regex": r"\d{4}-\d{2}-\d{2}",  # Force date format
}
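A regex constraint is easier to debug when the pattern is checked on its own before it is sent to the engine. The sketch below uses Python's standard `re` module as a stand-in. It assumes the pattern uses only syntax shared between `re` and whichever constrained-decoding backend SGLang is configured with, which is true of a simple digit pattern like this one but is not guaranteed in general.

```python
import re

DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

sampling_params = {
    "temperature": 0,
    "max_new_tokens": 64,
    "regex": DATE_PATTERN,
}

# Compiling catches syntax errors in the pattern early.
compiled = re.compile(sampling_params["regex"])

# fullmatch requires the whole string to fit the pattern, which mirrors
# the intent of constraining the complete generated output.
matches_date = compiled.fullmatch("2026-02-10") is not None
matches_prose = compiled.fullmatch("February 10, 2026") is not None
```

Note that the pattern constrains shape only: a string such as 2026-99-99 also matches, so any semantic validation of the date still has to happen after generation.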

Per-Request Batch Parameters

# `engine` is assumed to be an already-constructed SGLang Engine instance.
prompts = ["Write a poem.", "Explain gravity.", "Tell a joke."]
# One dict per prompt, in the same order as `prompts`
sampling_params_list = [
    {"temperature": 0.9, "max_new_tokens": 200},
    {"temperature": 0.1, "max_new_tokens": 100},
    {"temperature": 0.7, "max_new_tokens": 50},
]
outputs = engine.generate(prompts, sampling_params_list)
