Implementation:Vllm project Vllm SamplingParams Structured
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Structured Output, Sampling |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
A concrete vLLM tool that combines structured-output constraints with standard sampling hyperparameters in a single parameter object.
Description
The SamplingParams class is vLLM's primary configuration object for text generation. It controls all aspects of the sampling process: temperature, top-p, top-k, penalties, stop conditions, and, via the structured_outputs field, structural constraints.
The structured_outputs field accepts a StructuredOutputsParams instance (or None for unconstrained generation). When set, the engine constructs a logits processor that masks invalid tokens at each decoding step according to the specified constraint.
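The masking step can be illustrated with a toy example. This is a minimal sketch in plain Python, not vLLM's actual implementation: the helper names mask_logits and greedy_pick, the five-token vocabulary, and the set of allowed token ids are all hypothetical.

```python
import math


def mask_logits(logits, allowed_token_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_token_ids)
    return [
        score if token_id in allowed else -math.inf
        for token_id, score in enumerate(logits)
    ]


def greedy_pick(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary of five tokens; token 3 has the highest raw score.
raw_logits = [0.5, 2.0, -1.0, 3.5, 1.2]

# Hypothetical constraint state: only tokens 1 and 4 keep the output valid.
masked = mask_logits(raw_logits, allowed_token_ids=[1, 4])

unconstrained_choice = greedy_pick(raw_logits)  # picks token 3
constrained_choice = greedy_pick(masked)        # picks token 1
```

Because disallowed tokens receive a score of negative infinity, they end up with zero probability after softmax, so any sampling strategy applied afterwards can only select tokens that keep the output valid.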
SamplingParams is implemented as a msgspec.Struct for high-performance serialization, with PydanticMsgspecMixin for Pydantic compatibility.
Usage
Use this class to create the complete generation configuration. Construct a StructuredOutputsParams first, then pass it as the structured_outputs keyword argument when constructing SamplingParams.
Code Reference
Source Location
- Repository: vllm
- File: vllm/sampling_params.py (lines 117-265; structured_outputs field at line 233)
Signature
class SamplingParams(PydanticMsgspecMixin, msgspec.Struct, omit_defaults=True, dict=True):
    n: int = 1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 0
    min_p: float = 0.0
    seed: int | None = None
    stop: str | list[str] | None = None
    stop_token_ids: list[int] | None = None
    ignore_eos: bool = False
    max_tokens: int | None = 16
    min_tokens: int = 0
    logprobs: int | None = None
    prompt_logprobs: int | None = None
    detokenize: bool = True
    skip_special_tokens: bool = True
    structured_outputs: StructuredOutputsParams | None = None
    logit_bias: dict[int, float] | None = None
    allowed_token_ids: list[int] | None = None
    # ... additional fields
Import
from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| structured_outputs | StructuredOutputsParams \| None | No (default: None) | Structural constraint configuration; when set, the engine applies logit masking during generation |
| temperature | float | No (default: 1.0) | Controls randomness; lower values recommended for structured output (e.g., 0.0 for greedy) |
| max_tokens | int \| None | No (default: 16) | Maximum tokens to generate; set high enough to accommodate the full structured output |
| top_p | float | No (default: 1.0) | Nucleus sampling threshold; must be in (0, 1] |
| top_k | int | No (default: 0) | Top-k filtering; 0 or -1 to consider all tokens |
| stop | str \| list[str] \| None | No (default: None) | Stop string(s) that terminate generation |
| seed | int \| None | No (default: None) | Random seed for reproducible generation |
| n | int | No (default: 1) | Number of output sequences to generate per prompt |
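To make the effect of temperature, top_k, and top_p concrete, the following sketch applies them to a toy logit vector using only the standard library. It is an illustration under stated assumptions, not vLLM's sampler: the function name filtered_distribution is hypothetical, and the order of operations (temperature, then top-k, then top-p) is a common convention assumed here.

```python
import math


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        # Zero temperature is treated as greedy: all mass on the best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-k: keep the k most likely tokens; 0 or -1 keeps everything.
    if top_k > 0:
        ranked = ranked[:top_k]
        mass = sum(p for p, _ in ranked)
        ranked = [(p / mass, i) for p, i in ranked]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    mass = sum(p for p, _ in kept)
    return {i: p / mass for p, i in kept}


example_logits = [2.0, 1.0, 0.0, -1.0]
all_tokens = filtered_distribution(example_logits)            # four candidates
top_two = filtered_distribution(example_logits, top_k=2)      # tokens 0 and 1
nucleus = filtered_distribution(example_logits, top_p=0.5)    # token 0 only
greedy = filtered_distribution(example_logits, temperature=0.0)
```

When a structural constraint is active, filters like these operate on whatever tokens the constraint leaves available, which is one reason a low temperature is usually sufficient for structured output.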
Outputs
| Name | Type | Description |
|---|---|---|
| SamplingParams instance | SamplingParams | A fully configured sampling parameter object ready to be passed to LLM.generate() |
Usage Examples
JSON-Constrained Sampling with Low Temperature
from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams
from pydantic import BaseModel
class Answer(BaseModel):
    reasoning: str
    result: int
structured = StructuredOutputsParams(json=Answer.model_json_schema())
sampling_params = SamplingParams(
    structured_outputs=structured,
    temperature=0.1,
    max_tokens=256,
)
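Once generation finishes, the returned text is expected to parse as JSON with the fields of Answer. The sketch below checks this with the standard library only; sample_text is a hypothetical output string, and parse_answer and expected_fields are illustrative names, not part of vLLM.

```python
import json

# Hypothetical model output; real text would come from the generation result.
sample_text = '{"reasoning": "2 + 2 equals 4", "result": 4}'

# Field names and types mirror the Answer model.
expected_fields = {"reasoning": str, "result": int}


def parse_answer(text):
    """Parse text as JSON and check that each expected field has the right type."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    missing = set(expected_fields) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected_type in expected_fields.items():
        if not isinstance(data[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return data


answer = parse_answer(sample_text)
```

A check like this is still worth keeping when the constraint is active: if max_tokens is too small, generation can stop before the closing brace and json.loads will raise.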
Choice-Constrained Sampling
from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams
structured = StructuredOutputsParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(
    structured_outputs=structured,
)
Regex-Constrained Sampling with Stop Sequence
from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams
structured = StructuredOutputsParams(regex=r"\w+@\w+\.com\n")
sampling_params = SamplingParams(
    structured_outputs=structured,
    stop=["\n"],
    max_tokens=50,
)
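The interaction between the regex and the stop string can be checked offline with Python's re module. This is a sketch under two assumptions: that the constraint backend interprets the pattern the same way re does, and that the stop string is excluded from the returned text. The helper apply_stop and the candidate strings are hypothetical.

```python
import re

# Same pattern as in the example: the trailing newline is part of the match.
pattern = re.compile(r"\w+@\w+\.com\n")

candidates = [
    "alice@example.com\n",  # full match
    "alice@example.org\n",  # wrong top-level domain
    "alice@example.com",    # missing the trailing newline
]
matches = {text: bool(pattern.fullmatch(text)) for text in candidates}


def apply_stop(text, stop):
    """Cut text at the first occurrence of stop, dropping the stop string."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


# The newline required by the regex doubles as the stop string.
visible = apply_stop("alice@example.com\n", "\n")
```

Ending the pattern with the newline gives the constraint a definite final character, and using that same character as the stop string ends generation as soon as the address is complete.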