Principle:Vllm project Vllm Output Schema Definition
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Structured Output, Schema Definition |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
Output schema definition is the process of specifying a formal constraint on the structure of text that a language model is allowed to generate, ensuring outputs conform to a predetermined format.
Description
When generating text with a large language model, the raw output is unconstrained free-form text. For many applications -- such as extracting structured data, building API responses, or feeding results into downstream systems -- the output must conform to a specific format. Output schema definition addresses this by allowing the user to declare the expected structure before generation begins.
There are several common schema types used in constrained generation:
- JSON Schema: Defines the structure, types, and constraints of a JSON object. Pydantic models provide an ergonomic way to produce JSON Schema via model_json_schema(). The schema can also be provided as a raw dictionary or JSON string.
- Regular Expression (Regex): Defines a pattern that the output text must match character by character. Useful for structured strings like email addresses, dates, or identifiers.
- Context-Free Grammar (GBNF): Defines a formal grammar in GBNF (GGML BNF) notation that constrains the output to valid strings of the grammar. Useful for generating syntactically valid code, SQL, or domain-specific languages.
- Choice List: Defines an explicit enumeration of allowed output strings. The model must produce exactly one of the listed options. Useful for classification tasks.
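As a minimal sketch of what these four schema types can look like in user code, the snippet below uses only the Python standard library. All names (invoice_schema, date_pattern, sql_grammar, sentiment_labels) and the example contents are illustrative assumptions, not part of any library API. The JSON Schema is written as a raw dictionary; a Pydantic model's model_json_schema() would be expected to yield a dictionary of the same general shape.

```python
import json
import re

# JSON Schema as a raw dictionary (hypothetical invoice example).
invoice_schema = {
    "type": "object",
    "properties": {
        "customer": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
        "due_date": {"type": "string"},
    },
    "required": ["customer", "amount"],
}
# The same schema serialized as a JSON string.
invoice_schema_json = json.dumps(invoice_schema)

# Regular expression: an ISO-style date such as 2026-02-08.
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Context-free grammar in GBNF notation: a tiny, illustrative SELECT statement.
sql_grammar = r'''
root   ::= "SELECT " column " FROM " table
column ::= "id" | "name" | "price"
table  ::= "products" | "orders"
'''

# Choice list: the output must be exactly one of these strings.
sentiment_labels = ["positive", "negative", "neutral"]

# Quick sanity checks that the definitions are well-formed.
assert json.loads(invoice_schema_json) == invoice_schema
assert re.fullmatch(date_pattern, "2026-02-08") is not None
```

Each object is plain data (a dictionary, a string, or a list), so it can be built, inspected, and tested without loading a model.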
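The schema is defined entirely in user code before being passed into the generation engine. A JSON Schema, regular expression, or choice list is largely independent of the model and inference framework, which makes it a portable specification of the desired output format. Grammar notation and the supported subset of each schema language can still vary between backends.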
Usage
Use output schema definition whenever the consumer of model output requires a specific format. Typical scenarios include:
- Extracting structured entities (names, dates, amounts) into JSON objects for database insertion
- Constraining classification outputs to a fixed set of labels
- Generating syntactically valid code or query languages
- Producing outputs that match a known pattern such as email addresses, phone numbers, or URLs
Theoretical Basis
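Output schema definition rests on the theory of formal languages. Each schema type corresponds to a class of formal languages, most of which map onto levels of the Chomsky hierarchy:
- Choice lists correspond to finite languages (a finite set of strings), which are a proper subset of the regular languages.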
- Regular expressions describe regular languages, recognizable by finite automata.
- Context-free grammars (GBNF) describe context-free languages, recognizable by pushdown automata.
- JSON Schema describes languages whose nesting structure is context-free, further restricted by semantic constraints (type checking, value ranges, required fields).
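To illustrate how these classes nest, the following sketch rewrites a choice list as a regular expression that accepts exactly the same strings, showing that a finite language is also a regular one. The helper names choices_to_regex and accepts, and the example labels, are hypothetical and chosen only for this illustration.

```python
import re

def choices_to_regex(choices):
    """Build a regex whose language is exactly the given finite set of strings."""
    if not choices:
        raise ValueError("a choice list needs at least one option")
    # Escape each option so characters such as '(' or '.' are matched literally.
    return "(?:" + "|".join(re.escape(c) for c in choices) + ")"

labels = ["positive", "negative", "n/a (unclear)"]
pattern = re.compile(choices_to_regex(labels))

def accepts(text):
    """Return True only if the whole text is one of the listed choices."""
    return pattern.fullmatch(text) is not None
```

With these definitions, accepts("positive") and accepts("n/a (unclear)") hold, while accepts("positive!") does not, because fullmatch requires the entire string to match one alternative.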
During constrained generation, the schema is compiled into an automaton or guide that, at each decoding step, determines which tokens are valid continuations. The schema definition step is therefore the specification of the language that the generation process will be constrained to produce.
The general workflow is:
1. Define the target language (schema, regex, grammar, or choice list).
2. Compile the language specification into a guide or mask generator.
3. At each decoding step, mask logits to exclude tokens that would lead outside the target language.
4. Sample from the remaining valid tokens.
Step 1 -- the schema definition -- is what this principle covers. The remaining steps are handled by the constrained generation engine.
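For context, the masking and selection steps that follow schema definition can be sketched with a toy example. This is a simplified illustration under stated assumptions, not how any particular engine implements it: the "vocabulary" is a short list of string tokens, the target language is a choice list, the guide is a plain prefix check, and greedy argmax stands in for sampling. The names allowed_tokens, constrained_greedy_decode, and toy_logits are hypothetical.

```python
import math

def allowed_tokens(prefix, vocab, choices):
    """Guide: non-empty tokens that keep prefix + token a prefix of some choice."""
    return {t for t in vocab if t and any(c.startswith(prefix + t) for c in choices)}

def constrained_greedy_decode(logits_fn, vocab, choices):
    """Greedy decoding over a toy vocabulary with invalid tokens masked out.

    Simplification: decoding stops as soon as the text equals a choice, so a
    choice that is a prefix of another choice always wins.
    """
    text = ""
    while text not in choices:
        valid = allowed_tokens(text, vocab, choices)
        if not valid:
            raise ValueError(f"no valid continuation after {text!r}")
        logits = logits_fn(text)
        # Mask: tokens outside the target language get a score of -infinity.
        masked = {t: (logits[t] if t in valid else -math.inf) for t in vocab}
        # Greedy pick (a real engine would typically sample instead).
        text += max(masked, key=masked.get)
    return text

vocab = ["pos", "neg", "itive", "ative", "maybe"]
choices = ["positive", "negative"]

def toy_logits(prefix):
    """Stand-in for a model: fixed scores that favor the invalid token 'maybe'."""
    return {"maybe": 5.0, "neg": 2.0, "pos": 1.0, "itive": 0.5, "ative": 0.1}

result = constrained_greedy_decode(toy_logits, vocab, choices)
```

Here the unconstrained argmax would be "maybe", but masking removes it at the first step, and the decode ends with result equal to "negative".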