Implementation: vLLM SamplingParams
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Natural Language Processing, Text Generation |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
A concrete configuration object, provided by vLLM, for setting the sampling hyperparameters used during text generation.
Description
SamplingParams is a msgspec.Struct (mixed with PydanticMsgspecMixin) that encapsulates all parameters controlling the token sampling process during text generation. It follows the OpenAI text completion API parameter conventions and extends them with additional controls such as min-p filtering, repetition penalty, and structured output constraints.
Because it is a msgspec.Struct rather than a regular Python class, it benefits from fast serialization, zero-copy deserialization, and strict type validation. The omit_defaults=True setting means only non-default fields are serialized, reducing overhead when transmitting parameters between engine components.
Usage
Import and instantiate SamplingParams before calling LLM.generate() or LLM.chat(). Pass a single instance to apply the same configuration to all prompts, or a list of instances to configure each prompt individually.
Code Reference
Source Location
- Repository: vllm
- File: vllm/sampling_params.py
- Lines: 117-265
Signature
```python
class SamplingParams(
    PydanticMsgspecMixin,
    msgspec.Struct,
    omit_defaults=True,
    dict=True,
):
    n: int = 1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 0
    min_p: float = 0.0
    seed: int | None = None
    stop: str | list[str] | None = None
    stop_token_ids: list[int] | None = None
    ignore_eos: bool = False
    max_tokens: int | None = 16
    min_tokens: int = 0
    logprobs: int | None = None
    prompt_logprobs: int | None = None
    detokenize: bool = True
    skip_special_tokens: bool = True
    spaces_between_special_tokens: bool = True
    logits_processors: Any | None = None
    include_stop_str_in_output: bool = False
    truncate_prompt_tokens: int | None = None
    output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE
    structured_outputs: StructuredOutputsParams | None = None
    logit_bias: dict[int, float] | None = None
    allowed_token_ids: list[int] | None = None
    bad_words: list[str] | None = None
```
Import
```python
from vllm import SamplingParams
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | int | No (default: 1) | Number of output sequences to generate per prompt |
| temperature | float | No (default: 1.0) | Controls randomness. 0 means greedy decoding |
| top_p | float | No (default: 1.0) | Nucleus sampling cumulative probability threshold, in (0, 1] |
| top_k | int | No (default: 0) | Number of top tokens to consider. 0 or -1 means all tokens |
| min_p | float | No (default: 0.0) | Minimum probability relative to the most likely token, in [0, 1] |
| max_tokens | int or None | No (default: 16) | Maximum number of tokens to generate per output sequence |
| stop | str, list[str], or None | No (default: None) | Stop string(s) that terminate generation |
| seed | int or None | No (default: None) | Random seed for reproducible generation |
| frequency_penalty | float | No (default: 0.0) | Penalizes tokens by their frequency in the generated text |
| presence_penalty | float | No (default: 0.0) | Penalizes tokens by whether they have appeared at all |
| repetition_penalty | float | No (default: 1.0) | Multiplicative penalty for repeated tokens. Values > 1 discourage repetition |
| logprobs | int or None | No (default: None) | Number of log probabilities to return per output token |
| stop_token_ids | list[int] or None | No (default: None) | Token IDs that terminate generation |
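How the filtering parameters above interact can be sketched in pure Python. This is an illustrative reimplementation, not vLLM's actual sampling kernel: temperature rescales the logits, then top-k, top-p, and min-p each mask out low-probability tokens before the remaining mass is renormalized.

```python
import math


def apply_sampling_filters(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return per-token probabilities after temperature scaling and
    top-k / top-p (nucleus) / min-p filtering. Illustrative sketch only."""
    if temperature == 0:  # greedy decoding: all mass on the argmax token
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=logits.__getitem__)] = 1.0
        return probs

    # softmax with temperature (max-subtraction for numerical stability)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    keep = set(range(len(probs)))

    # top-k: keep only the k most probable tokens (0 or -1 disables)
    if top_k > 0:
        order = sorted(keep, key=lambda i: probs[i], reverse=True)
        keep = set(order[:top_k])

    # top-p: keep the smallest set whose cumulative probability >= top_p
    if top_p < 1.0:
        order = sorted(keep, key=lambda i: probs[i], reverse=True)
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep = nucleus

    # min-p: drop tokens below min_p * probability of the most likely token
    if min_p > 0.0:
        threshold = min_p * max(probs[i] for i in keep)
        keep = {i for i in keep if probs[i] >= threshold}

    # renormalize over the surviving tokens
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
```

For example, with `top_k=2` the least likely of three tokens ends up with probability 0, and with `temperature=0` the result collapses onto the argmax regardless of the other filters.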
Outputs
| Name | Type | Description |
|---|---|---|
| SamplingParams instance | SamplingParams | A configured sampling parameters object to pass to LLM.generate() or LLM.chat() |
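The three penalty parameters in the inputs table can likewise be sketched in pure Python, following the OpenAI-style conventions the description mentions (this is an illustrative sketch, not vLLM's actual implementation): presence and frequency penalties are subtracted from the logits of previously generated tokens, while repetition_penalty is applied multiplicatively so that values greater than 1 always push repeated tokens down.

```python
from collections import Counter


def apply_penalties(logits, output_token_ids,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Adjust logits for tokens already generated. Illustrative sketch only."""
    counts = Counter(output_token_ids)
    adjusted = list(logits)
    for tok, count in counts.items():
        # frequency penalty: scales with how often the token has appeared
        adjusted[tok] -= frequency_penalty * count
        # presence penalty: flat penalty for having appeared at all
        adjusted[tok] -= presence_penalty
        # repetition penalty: divide positive logits, multiply negative ones,
        # so repetition_penalty > 1 discourages the token in both cases
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    return adjusted
```

Tokens that never appeared in the output (here, any ID missing from `output_token_ids`) keep their original logits untouched.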
Usage Examples
Greedy Decoding
```python
from vllm import SamplingParams

# Greedy decoding: deterministic output
params = SamplingParams(temperature=0, max_tokens=256)
```
Creative Generation with Nucleus Sampling
```python
from vllm import SamplingParams

# Higher temperature + top-p for diverse outputs
params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=512,
    stop=["\n\n"],
)
```
Reproducible Generation with Seed
```python
from vllm import SamplingParams

# Fixed seed for reproducibility
params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
    seed=42,
)
```
Multiple Outputs per Prompt
```python
from vllm import SamplingParams

# Generate 3 candidate completions per prompt
params = SamplingParams(
    n=3,
    temperature=0.9,
    top_k=50,
    max_tokens=200,
)
```