
Implementation:Vllm project Vllm SamplingParams Init

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Natural Language Processing, Text Generation
Last Updated 2026-02-08 13:00 GMT

Overview

A concrete configuration object provided by vLLM for setting the sampling hyperparameters that control text generation.

Description

SamplingParams is a msgspec.Struct (mixed with PydanticMsgspecMixin) that encapsulates all parameters controlling the token sampling process during text generation. It follows the OpenAI text completion API parameter conventions and extends them with additional controls such as min-p filtering, repetition penalty, and structured output constraints.

Because it is a msgspec.Struct rather than a regular Python class, it benefits from fast serialization, zero-copy deserialization, and strict type validation. The omit_defaults=True setting means only non-default fields are serialized, reducing overhead when transmitting parameters between engine components.
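The effect of `omit_defaults=True` can be illustrated with a stdlib analogue. This is not vLLM's actual msgspec machinery, just a small sketch of the idea using `dataclasses`: only fields that differ from their declared defaults make it into the serialized payload.

```python
from dataclasses import dataclass, fields


@dataclass
class Params:
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 16


def serialize_non_defaults(p: Params) -> dict:
    # Keep only fields whose value differs from the declared default,
    # mirroring the effect of msgspec's omit_defaults=True.
    return {
        f.name: getattr(p, f.name)
        for f in fields(p)
        if getattr(p, f.name) != f.default
    }


print(serialize_non_defaults(Params(temperature=0.7)))  # {'temperature': 0.7}
print(serialize_non_defaults(Params()))                 # {}
```

With many fields left at their defaults (as is typical for SamplingParams), this keeps serialized messages between engine components small.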

Usage

Import and instantiate SamplingParams before calling LLM.generate() or LLM.chat(). Pass a single instance to apply the same configuration to all prompts, or a list of instances to configure each prompt individually.

Code Reference

Source Location

  • Repository: vllm
  • File: vllm/sampling_params.py
  • Lines: 117-265

Signature

class SamplingParams(
    PydanticMsgspecMixin,
    msgspec.Struct,
    omit_defaults=True,
    dict=True,
):
    n: int = 1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 0
    min_p: float = 0.0
    seed: int | None = None
    stop: str | list[str] | None = None
    stop_token_ids: list[int] | None = None
    ignore_eos: bool = False
    max_tokens: int | None = 16
    min_tokens: int = 0
    logprobs: int | None = None
    prompt_logprobs: int | None = None
    detokenize: bool = True
    skip_special_tokens: bool = True
    spaces_between_special_tokens: bool = True
    logits_processors: Any | None = None
    include_stop_str_in_output: bool = False
    truncate_prompt_tokens: int | None = None
    output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE
    structured_outputs: StructuredOutputsParams | None = None
    logit_bias: dict[int, float] | None = None
    allowed_token_ids: list[int] | None = None
    bad_words: list[str] | None = None

Import

from vllm import SamplingParams

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| n | int | No (default: 1) | Number of output sequences to generate per prompt |
| temperature | float | No (default: 1.0) | Controls randomness. 0 means greedy decoding |
| top_p | float | No (default: 1.0) | Nucleus sampling cumulative probability threshold, in (0, 1] |
| top_k | int | No (default: 0) | Number of top tokens to consider. 0 or -1 means all tokens |
| min_p | float | No (default: 0.0) | Minimum probability relative to the most likely token, in [0, 1] |
| max_tokens | int or None | No (default: 16) | Maximum number of tokens to generate per output sequence |
| stop | str, list[str], or None | No (default: None) | Stop string(s) that terminate generation |
| seed | int or None | No (default: None) | Random seed for reproducible generation |
| frequency_penalty | float | No (default: 0.0) | Penalizes tokens by their frequency in the generated text |
| presence_penalty | float | No (default: 0.0) | Penalizes tokens by whether they have appeared at all |
| repetition_penalty | float | No (default: 1.0) | Multiplicative penalty for repeated tokens. Values > 1 discourage repetition |
| logprobs | int or None | No (default: None) | Number of log probabilities to return per output token |
| stop_token_ids | list[int] or None | No (default: None) | Token IDs that terminate generation |
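How `temperature`, `top_k`, `top_p`, and `min_p` interact can be sketched in plain Python. This is a simplified illustration of the filtering semantics described above, not vLLM's tensor-based implementation; the function returns the indices of tokens that survive filtering on a toy logit vector.

```python
import math


def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Sketch of sampling-filter semantics: temperature scaling,
    then top-k, top-p (nucleus), and min-p pruning."""
    if temperature == 0:
        # Greedy decoding: only the single most likely token survives.
        return [max(range(len(logits)), key=lambda i: logits[i])]
    # Temperature scaling followed by softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(order)
    # top-k: keep only the k most likely tokens (0 or -1 means all).
    if top_k > 0:
        keep &= set(order[:top_k])
    # top-p: keep the smallest prefix whose cumulative probability >= top_p.
    if top_p < 1.0:
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus
    # min-p: drop tokens below min_p times the top token's probability.
    if min_p > 0.0:
        cap = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= cap}
    return sorted(keep)


logits = [3.0, 2.0, 1.0, 0.0]
print(filter_logits(logits, top_k=2))    # [0, 1]
print(filter_logits(logits, top_p=0.5))  # [0]
print(filter_logits(logits, min_p=0.2))  # [0, 1]
```

Note how `min_p` adapts to the shape of the distribution: it prunes relative to the most likely token rather than using a fixed count (top-k) or a fixed cumulative mass (top-p).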

Outputs

| Name | Type | Description |
|------|------|-------------|
| SamplingParams instance | SamplingParams | A configured sampling parameters object to pass to LLM.generate() or LLM.chat() |
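The three penalty parameters in the inputs table follow the OpenAI-style conventions: `presence_penalty` is a flat subtraction for any token that has already appeared, `frequency_penalty` scales with how often it appeared, and `repetition_penalty` is multiplicative. A simplified per-token sketch (vLLM's actual kernel operates on batched tensors):

```python
def apply_penalties(logits, output_token_counts,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Sketch of OpenAI-style penalty semantics on a toy logit vector.
    output_token_counts maps token id -> occurrences in the output so far."""
    out = list(logits)
    for tok, count in output_token_counts.items():
        if count > 0:
            # Multiplicative repetition penalty: divide positive logits,
            # multiply negative ones, so > 1 always discourages repeats.
            if out[tok] > 0:
                out[tok] /= repetition_penalty
            else:
                out[tok] *= repetition_penalty
            # Frequency penalty scales with count; presence is flat.
            out[tok] -= frequency_penalty * count + presence_penalty
    return out


# Token 0 seen twice; token 1 never generated, so it is untouched.
print(apply_penalties([2.0, -1.0], {0: 2},
                      presence_penalty=0.5,
                      frequency_penalty=0.25,
                      repetition_penalty=2.0))  # [0.0, -1.0]
```

The sign-dependent handling of `repetition_penalty` is why a plain division would be wrong for negative logits: dividing a negative logit by a value greater than 1 would make the token *more* likely, not less.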

Usage Examples

Greedy Decoding

from vllm import SamplingParams

# Greedy decoding: deterministic output
params = SamplingParams(temperature=0, max_tokens=256)

Creative Generation with Nucleus Sampling

from vllm import SamplingParams

# Higher temperature + top-p for diverse outputs
params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=512,
    stop=["\n\n"],
)

Reproducible Generation with Seed

from vllm import SamplingParams

# Fixed seed for reproducibility
params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
    seed=42,
)

Multiple Outputs per Prompt

from vllm import SamplingParams

# Generate 3 candidate completions per prompt
params = SamplingParams(
    n=3,
    temperature=0.9,
    top_k=50,
    max_tokens=200,
)

Related Pages

Implements Principle
