Implementation:Vllm project Vllm SamplingParams Structured

From Leeroopedia
Revision as of 17:06, 16 February 2026 by Admin (auto-imported from implementations/Vllm_project_Vllm_SamplingParams_Structured.md)


Knowledge Sources
Domains: LLM Inference, Structured Output, Sampling
Last Updated: 2026-02-08 13:00 GMT

Overview

A concrete vLLM tool that combines structural output constraints and standard sampling hyperparameters in a single parameter object.

Description

The SamplingParams class is vLLM's primary configuration object for text generation. It controls the sampling process (temperature, top-p, top-k, penalties, stop conditions) and, through the structured_outputs field, structural constraints on the output.
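The penalty fields can be pictured as adjustments to the raw logits before sampling. The sketch below assumes the common conventions: a CTRL-style repetition_penalty over tokens seen in the prompt or output (positive logits divided, negative logits multiplied), followed by OpenAI-style frequency_penalty and presence_penalty over generated tokens only. The function apply_penalties and its list-based inputs are hypothetical; vLLM's batched tensor implementation may differ in detail.

```python
from collections import Counter


def apply_penalties(
    logits: list[float],
    prompt_ids: list[int],
    output_ids: list[int],
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
    repetition_penalty: float = 1.0,
) -> list[float]:
    """Hypothetical helper: return a penalized copy of one row of logits."""
    out = list(logits)
    # Repetition penalty: any token already seen in prompt or output.
    for tok in set(prompt_ids) | set(output_ids):
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
    # Frequency (per occurrence) and presence (once) penalties: output only.
    for tok, count in Counter(output_ids).items():
        out[tok] -= frequency_penalty * count + presence_penalty
    return out


print(apply_penalties([2.0, -1.0, 0.5, 1.0], prompt_ids=[0], output_ids=[1, 1, 2],
                      presence_penalty=0.5, frequency_penalty=0.25,
                      repetition_penalty=2.0))
# [1.0, -3.0, -0.5, 1.0]
```

With the defaults (0.0, 0.0, 1.0) the logits pass through unchanged, which is why those are the neutral values in the signature.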

The structured_outputs field accepts a StructuredOutputsParams instance (or None for unconstrained generation). When set, the engine constructs a logits processor that masks invalid tokens at each decoding step according to the specified constraint.
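The masking idea can be shown on a toy scale. Real structured-output backends compile the constraint into a grammar or automaton over the tokenizer's vocabulary; this sketch instead uses a hypothetical six-token vocabulary and simple prefix checks against a list of allowed strings. At each step, tokens that could not lead to a valid completion get a logit of -inf, so even a model that strongly prefers a forbidden token can only emit allowed text. VOCAB, allowed_tokens, and constrained_greedy are illustrative names, not vLLM APIs.

```python
import math

# Hypothetical toy vocabulary (token id -> text piece); real tokenizers differ.
VOCAB = {0: "Pos", 1: "Neg", 2: "itive", 3: "ative", 4: "Maybe", 5: "<eos>"}
EOS = 5


def allowed_tokens(prefix: str, choices: list[str]) -> set[int]:
    """Ids that keep the text a prefix of some choice (EOS only once complete)."""
    allowed = set()
    for tok, piece in VOCAB.items():
        if tok == EOS:
            if prefix in choices:
                allowed.add(tok)
        elif any(c.startswith(prefix + piece) for c in choices):
            allowed.add(tok)
    return allowed


def constrained_greedy(logits_fn, choices: list[str], max_steps: int = 8) -> str:
    """Greedy decoding where disallowed tokens are masked to -inf each step."""
    text = ""
    for _ in range(max_steps):
        logits = logits_fn(text)
        ok = allowed_tokens(text, choices)
        if not ok:
            break  # the constraint cannot be extended with this vocabulary
        masked = [x if i in ok else -math.inf for i, x in enumerate(logits)]
        tok = max(range(len(masked)), key=masked.__getitem__)
        if tok == EOS:
            break
        text += VOCAB[tok]
    return text


# The "model" always prefers "Maybe" (id 4), which the mask forbids.
prefers_maybe = lambda text: [1.0, 2.0, 0.5, 1.5, 9.0, 0.0]
print(constrained_greedy(prefers_maybe, ["Positive", "Negative"]))  # Negative
```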

SamplingParams is implemented as a msgspec.Struct for high-performance serialization, with PydanticMsgspecMixin for Pydantic compatibility.
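As an illustration of the omit_defaults=True option: when msgspec encodes such a struct, fields that still hold their default value are left out, which keeps the encoded payload small. The sketch below mimics that behavior with a standard-library dataclass instead of msgspec; ToySamplingParams and encode_omit_defaults are hypothetical names, not part of vLLM or msgspec.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class ToySamplingParams:
    """Hypothetical stand-in with a few SamplingParams-like fields."""
    n: int = 1
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: Optional[int] = 16


def encode_omit_defaults(obj) -> dict:
    """Serialize only the fields whose value differs from the declared default."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.default is MISSING or value != f.default:
            out[f.name] = value
    return out


print(encode_omit_defaults(ToySamplingParams(temperature=0.1, max_tokens=256)))
# {'temperature': 0.1, 'max_tokens': 256}
```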

Usage

Use this class to create the complete generation configuration. Construct a StructuredOutputsParams first, then pass it as the structured_outputs keyword argument when constructing SamplingParams.

Code Reference

Source Location

  • Repository: vllm
  • File: vllm/sampling_params.py (lines 117-265; structured_outputs field at line 233)

Signature

class SamplingParams(PydanticMsgspecMixin, msgspec.Struct, omit_defaults=True, dict=True):
    n: int = 1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 0
    min_p: float = 0.0
    seed: int | None = None
    stop: str | list[str] | None = None
    stop_token_ids: list[int] | None = None
    ignore_eos: bool = False
    max_tokens: int | None = 16
    min_tokens: int = 0
    logprobs: int | None = None
    prompt_logprobs: int | None = None
    detokenize: bool = True
    skip_special_tokens: bool = True
    structured_outputs: StructuredOutputsParams | None = None
    logit_bias: dict[int, float] | None = None
    allowed_token_ids: list[int] | None = None
    # ... additional fields

Import

from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams

I/O Contract

Inputs

  • structured_outputs (StructuredOutputsParams | None; optional, default None): Structural constraint configuration. When set, the engine applies logit masking during generation.
  • temperature (float; optional, default 1.0): Controls randomness. Lower values are recommended for structured output (e.g., 0.0 for greedy decoding).
  • max_tokens (int | None; optional, default 16): Maximum number of tokens to generate. Set it high enough to fit the complete structured output; generation that hits this limit stops mid-structure.
  • top_p (float; optional, default 1.0): Nucleus sampling threshold; must be in (0, 1].
  • top_k (int; optional, default 0): Top-k filtering; 0 or -1 disables it so that all tokens are considered.
  • stop (str | list[str] | None; optional, default None): Stop string(s) that terminate generation.
  • seed (int | None; optional, default None): Random seed for reproducible generation.
  • n (int; optional, default 1): Number of output sequences to generate per prompt.
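To make the temperature, top_k, and top_p semantics above concrete, here is a minimal single-row sketch. It assumes top-k is applied before top-p and that top-p keeps the smallest highest-probability set whose cumulative mass reaches top_p; vLLM's batched implementation may handle ties and boundaries differently. filter_probs is a hypothetical helper, not a vLLM function.

```python
import math


def filter_probs(logits: list[float], temperature: float = 1.0,
                 top_k: int = 0, top_p: float = 1.0) -> dict[int, float]:
    """Hypothetical helper: {token_id: probability} after the three filters."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")
    # Token ids sorted from highest to lowest logit.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if temperature == 0.0:
        return {order[0]: 1.0}  # greedy: all mass on the arg-max token
    if top_k > 0:  # 0 or -1 disables top-k
        order = order[:top_k]
    # Softmax over the surviving tokens (shifted by the max for stability).
    scaled = [logits[i] / temperature for i in order]
    exps = [math.exp(s - scaled[0]) for s in scaled]
    total = sum(exps)
    # Top-p: keep the smallest high-probability prefix reaching top_p mass.
    kept, cum = {}, 0.0
    for i, e in zip(order, exps):
        kept[i] = e / total
        cum += kept[i]
        if cum >= top_p:
            break
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}


logits = [2.0, 1.0, 0.0, -1.0]
print(filter_probs(logits, temperature=0.0))  # {0: 1.0}
print(sorted(filter_probs(logits, top_k=2)))  # [0, 1]
```

This also shows why temperature=0.0 pairs naturally with structured output: once masking has removed invalid tokens, greedy selection deterministically picks the most likely valid one.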

Outputs

  • SamplingParams instance (SamplingParams): A fully configured sampling parameter object, ready to be passed to LLM.generate().

Usage Examples

JSON-Constrained Sampling with Low Temperature

from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams
from pydantic import BaseModel

class Answer(BaseModel):
    reasoning: str
    result: int

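# Constrain the output to JSON that matches the Answer model's schema.
structured = StructuredOutputsParams(json=Answer.model_json_schema())
sampling_params = SamplingParams(
    structured_outputs=structured,
    temperature=0.1,  # low temperature keeps field contents focused
    max_tokens=256,   # leave room for the complete JSON object
)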
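The constraint guarantees that each generated token keeps the text on a path allowed by the schema, but if generation reaches max_tokens before the closing brace, the result is only a valid prefix. A defensive parse step is therefore still useful. The sketch below uses only the standard library and hypothetical output strings; with Pydantic v2, Answer.model_validate_json(text) performs a comparable check and raises a ValidationError instead of returning None.

```python
import json
from typing import Optional


def parse_answer(text: str) -> Optional[dict]:
    """Hypothetical helper: parse text into an Answer-shaped dict, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None  # e.g. output cut off by max_tokens mid-object
    if not isinstance(obj, dict):
        return None
    if not isinstance(obj.get("reasoning"), str):
        return None
    result = obj.get("result")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(result, int) or isinstance(result, bool):
        return None
    return obj


print(parse_answer('{"reasoning": "2+2", "result": 4}'))  # {'reasoning': '2+2', 'result': 4}
print(parse_answer('{"reasoning": "2+2", "res'))          # None (truncated)
```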

Choice-Constrained Sampling

from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams

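# Restrict the output to exactly one of the listed labels.
structured = StructuredOutputsParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(
    structured_outputs=structured,  # default max_tokens=16 suffices for short labels
)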
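A choice constraint admits exactly one of the listed strings, verbatim and case-sensitive. One way to picture the accepted language is as a regex alternation of the escaped choices; this is an illustration of what the constraint accepts, not vLLM's internal code, and choice_to_regex is a hypothetical helper.

```python
import re


def choice_to_regex(choices: list[str]) -> str:
    """Hypothetical helper: regex matching exactly one literal choice."""
    return "(" + "|".join(re.escape(c) for c in choices) + ")"


pattern = choice_to_regex(["Positive", "Negative"])
for candidate in ["Positive", "negative", "Positive."]:
    print(candidate, bool(re.fullmatch(pattern, candidate)))
# Positive True
# negative False
# Positive. False
```

Escaping matters when a label contains regex metacharacters such as "." or "+", which would otherwise widen the set of accepted strings.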

Regex-Constrained Sampling with Stop Sequence

from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams

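# Match a simple email address followed by a newline.
structured = StructuredOutputsParams(regex=r"\w+@\w+\.com\n")
sampling_params = SamplingParams(
    structured_outputs=structured,
    stop=["\n"],   # end generation at the newline the regex requires
    max_tokens=50,
)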
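In this example the regex ends with a newline and stop=["\n"] ends generation at that newline. By default vLLM leaves the stop string out of the returned text (SamplingParams also has an include_stop_str_in_output flag, which defaults to False), so the final text is the bare address. The sketch below simulates that post-processing on a hypothetical raw output; finalize is an illustrative helper, not a vLLM function.

```python
import re

EMAIL_RE = r"\w+@\w+\.com\n"


def finalize(raw: str, stop: list[str], include_stop: bool = False) -> str:
    """Hypothetical helper: truncate raw text at the earliest stop string."""
    cut, hit = len(raw), None
    for s in stop:
        idx = raw.find(s)
        if idx != -1 and idx < cut:
            cut, hit = idx, s
    if hit is not None and include_stop:
        cut += len(hit)  # keep the stop string itself
    return raw[:cut]


raw = "alice@example.com\n"          # hypothetical constrained output
print(bool(re.fullmatch(EMAIL_RE, raw)))  # True
print(repr(finalize(raw, ["\n"])))        # 'alice@example.com'
```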

Related Pages

Implements Principle
