Implementation: vLLM SamplingParams
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Natural Language Processing, Text Generation |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
A concrete configuration object, provided by vLLM, for setting the sampling hyperparameters used during text generation.
Description
SamplingParams is a msgspec.Struct (mixed with PydanticMsgspecMixin) that encapsulates all parameters controlling the token sampling process during text generation. It follows the OpenAI text completion API parameter conventions and extends them with additional controls such as min-p filtering, repetition penalty, and structured output constraints.
Because it is a msgspec.Struct rather than a regular Python class, it benefits from fast serialization, zero-copy deserialization, and strict type validation. The omit_defaults=True setting means only non-default fields are serialized, reducing overhead when transmitting parameters between engine components.
Usage
Import and instantiate SamplingParams before calling LLM.generate() or LLM.chat(). Pass a single instance to apply the same configuration to all prompts, or a list of instances to configure each prompt individually.
Code Reference
Source Location
- Repository: vllm
- File: vllm/sampling_params.py
- Lines: 117-265
Signature
```python
class SamplingParams(
    PydanticMsgspecMixin,
    msgspec.Struct,
    omit_defaults=True,
    dict=True,
):
    n: int = 1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 0
    min_p: float = 0.0
    seed: int | None = None
    stop: str | list[str] | None = None
    stop_token_ids: list[int] | None = None
    ignore_eos: bool = False
    max_tokens: int | None = 16
    min_tokens: int = 0
    logprobs: int | None = None
    prompt_logprobs: int | None = None
    detokenize: bool = True
    skip_special_tokens: bool = True
    spaces_between_special_tokens: bool = True
    logits_processors: Any | None = None
    include_stop_str_in_output: bool = False
    truncate_prompt_tokens: int | None = None
    output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE
    structured_outputs: StructuredOutputsParams | None = None
    logit_bias: dict[int, float] | None = None
    allowed_token_ids: list[int] | None = None
    bad_words: list[str] | None = None
```
Import
```python
from vllm import SamplingParams
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | int | No (default: 1) | Number of output sequences to generate per prompt |
| temperature | float | No (default: 1.0) | Controls randomness. 0 means greedy decoding |
| top_p | float | No (default: 1.0) | Nucleus sampling cumulative probability threshold, in (0, 1] |
| top_k | int | No (default: 0) | Number of top tokens to consider. 0 or -1 means all tokens |
| min_p | float | No (default: 0.0) | Minimum probability relative to the most likely token, in [0, 1] |
| max_tokens | int or None | No (default: 16) | Maximum number of tokens to generate per output sequence |
| stop | str, list[str], or None | No (default: None) | Stop string(s) that terminate generation |
| seed | int or None | No (default: None) | Random seed for reproducible generation |
| frequency_penalty | float | No (default: 0.0) | Penalizes tokens by their frequency in the generated text |
| presence_penalty | float | No (default: 0.0) | Penalizes tokens by whether they have appeared at all |
| repetition_penalty | float | No (default: 1.0) | Multiplicative penalty for repeated tokens. Values > 1 discourage repetition |
| logprobs | int or None | No (default: None) | Number of log probabilities to return per output token |
| stop_token_ids | list[int] or None | No (default: None) | Token IDs that terminate generation |
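How the filtering parameters above interact can be sketched in pure Python. This is an illustrative reimplementation, not vLLM's actual sampling kernel: temperature rescales the logits, then top-k, top-p, and min-p each mask out low-probability tokens before the remaining mass is renormalized.

```python
import math


def apply_sampling_filters(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return per-token probabilities after temperature scaling and
    top-k / top-p (nucleus) / min-p filtering. Illustrative sketch only."""
    if temperature == 0:  # greedy decoding: all mass on the argmax token
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=logits.__getitem__)] = 1.0
        return probs

    # softmax with temperature (max-subtraction for numerical stability)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    keep = set(range(len(probs)))

    # top-k: keep only the k most probable tokens (0 or -1 disables)
    if top_k > 0:
        order = sorted(keep, key=lambda i: probs[i], reverse=True)
        keep = set(order[:top_k])

    # top-p: keep the smallest set whose cumulative probability >= top_p
    if top_p < 1.0:
        order = sorted(keep, key=lambda i: probs[i], reverse=True)
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep = nucleus

    # min-p: drop tokens below min_p * probability of the most likely token
    if min_p > 0.0:
        threshold = min_p * max(probs[i] for i in keep)
        keep = {i for i in keep if probs[i] >= threshold}

    # renormalize over the surviving tokens
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
```

For example, with `top_k=2` the least likely of three tokens ends up with probability 0, and with `temperature=0` the result collapses onto the argmax regardless of the other filters.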
Outputs
| Name | Type | Description |
|---|---|---|
| SamplingParams instance | SamplingParams | A configured sampling parameters object to pass to LLM.generate() or LLM.chat() |
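The three penalty parameters in the inputs table can likewise be sketched in pure Python, following the OpenAI-style conventions the description mentions (this is an illustrative sketch, not vLLM's actual implementation): presence and frequency penalties are subtracted from the logits of previously generated tokens, while repetition_penalty is applied multiplicatively so that values greater than 1 always push repeated tokens down.

```python
from collections import Counter


def apply_penalties(logits, output_token_ids,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Adjust logits for tokens already generated. Illustrative sketch only."""
    counts = Counter(output_token_ids)
    adjusted = list(logits)
    for tok, count in counts.items():
        # frequency penalty: scales with how often the token has appeared
        adjusted[tok] -= frequency_penalty * count
        # presence penalty: flat penalty for having appeared at all
        adjusted[tok] -= presence_penalty
        # repetition penalty: divide positive logits, multiply negative ones,
        # so repetition_penalty > 1 discourages the token in both cases
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    return adjusted
```

Tokens that never appeared in the output (here, any ID missing from `output_token_ids`) keep their original logits untouched.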
Usage Examples
Greedy Decoding
```python
from vllm import SamplingParams

# Greedy decoding: deterministic output
params = SamplingParams(temperature=0, max_tokens=256)
```
Creative Generation with Nucleus Sampling
```python
from vllm import SamplingParams

# Higher temperature + top-p for diverse outputs
params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=512,
    stop=["\n\n"],
)
```
Reproducible Generation with Seed
```python
from vllm import SamplingParams

# Fixed seed for reproducibility
params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
    seed=42,
)
```
Multiple Outputs per Prompt
```python
from vllm import SamplingParams

# Generate 3 candidate completions per prompt
params = SamplingParams(
    n=3,
    temperature=0.9,
    top_k=50,
    max_tokens=200,
)
```