Workflow:Predibase Lorax Structured JSON Output
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Inference, Structured_Generation |
| Last Updated | 2026-02-08 03:00 GMT |
Overview
End-to-end process for generating structured JSON output from a LoRAX server using schema-constrained decoding powered by the Outlines library.
Description
This workflow covers how to ensure that LLM responses consist only of valid JSON adhering to a user-provided JSON schema. Unlike post-hoc validation, which checks output after it has been generated, LoRAX uses structured generation (constrained decoding) to manipulate token-level logits during inference, so that only tokens that keep the output valid can be selected at each generation step. The Outlines library converts the JSON schema into a finite-state machine (FSM) that guides token selection in real time. The guarantee holds only if generation is not cut off by the token limit before the JSON is complete.
Usage
Use this workflow when you need guaranteed valid JSON output from the model, such as for data extraction, API response generation, or structured information retrieval. It is supported through both the native LoRAX client and the OpenAI-compatible Chat Completions API. You need a running LoRAX server and either a Pydantic model or a raw JSON schema defining the desired output structure.
Execution Steps
Step 1: Schema_Definition
Define the JSON schema that the model output must conform to. This can be done using Pydantic BaseModel classes (which auto-generate JSON schemas) or by providing a raw JSON schema dictionary directly. The schema specifies property names, types, constraints, required fields, and enum values.
Key considerations:
- Pydantic models provide a convenient way to define schemas with type validation
- String constraints (maxLength), enum types, and nested objects are supported
- The schema is converted to a regular expression and then to a finite-state machine
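As a minimal sketch of the raw-dictionary option, the schema below uses a string length constraint, an enum, and a nested object. The field names (`name`, `role`, `age`, `stats`) are hypothetical and chosen only for illustration. A Pydantic v2 model with equivalent fields would produce a comparable dictionary through its `model_json_schema()` method.

```python
import json

# Hypothetical schema for illustration; the field names are not from LoRAX.
CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        # String constraint: at most 10 characters.
        "name": {"type": "string", "maxLength": 10},
        # Enum: the value must be one of the listed strings.
        "role": {"type": "string", "enum": ["warrior", "mage", "rogue"]},
        "age": {"type": "integer"},
        # Nested object with its own required field.
        "stats": {
            "type": "object",
            "properties": {"strength": {"type": "integer"}},
            "required": ["strength"],
        },
    },
    "required": ["name", "role", "age", "stats"],
}

# The schema must be JSON-serializable so it can travel inline with a request.
schema_json = json.dumps(CHARACTER_SCHEMA, indent=2)
```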
Step 2: Request_Configuration
Configure the inference request with the response_format parameter. Three modes are available: "text" (default plain text), "json_object" (arbitrary valid JSON), and "json_schema" (JSON conforming to a specific schema). The schema can be passed inline with the request.
Available response_format types:
- text: Standard unstructured text generation (default)
- json_object: Guarantees valid JSON output without enforcing a specific schema
- json_schema: Guarantees valid JSON that conforms to the provided schema
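The three modes can be sketched as request bodies. The snippet below only builds dictionaries and does not contact a server. The helper names, the `schema` key, and the `inputs`/`parameters` layout are assumptions made for illustration, so check the LoRAX API reference for the exact field names before relying on them.

```python
VALID_MODES = ("text", "json_object", "json_schema")

def make_response_format(mode="text", schema=None):
    """Build a response_format dictionary (assumed layout, not verified)."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown response_format type: {mode!r}")
    if mode == "json_schema":
        if schema is None:
            raise ValueError("json_schema mode requires a schema")
        # Assumption: the schema is passed inline under a "schema" key.
        return {"type": mode, "schema": schema}
    return {"type": mode}

def build_request_body(prompt, response_format, max_new_tokens=128):
    """Assemble a request body; the key names here are assumptions."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "response_format": response_format,
        },
    }

body = build_request_body(
    "Describe a fantasy character as JSON.",
    make_response_format("json_schema", {"type": "object"}),
)
```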
Step 3: Constrained_Decoding
During inference, the Outlines-powered FSM is applied at each token generation step. Before sampling the next token, the FSM determines which tokens are valid continuations that maintain JSON validity and schema compliance. Invalid tokens have their logits set to negative infinity, preventing them from being selected.
What happens internally:
- JSON schema is compiled into a regular expression by Outlines
- The regex is converted to a deterministic finite-state machine
- At each decode step, the FSM filters the vocabulary to only valid next tokens
- Token logits for invalid tokens are masked to negative infinity
- Sampling proceeds normally over the filtered distribution
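The masking step can be illustrated with a toy example. This is not how Outlines or LoRAX implement it: the "FSM" here is just a prefix check against two accepted strings, the vocabulary and logits are made up, and greedy selection stands in for sampling so the result is deterministic. It only shows that masked tokens cannot be chosen, even when the model prefers them.

```python
import math

# Toy language standing in for a compiled schema: the only accepted outputs.
ACCEPTED = ['{"ok":true}', '{"ok":false}']
# Toy vocabulary of multi-character tokens; "<eos>" ends generation.
VOCAB = ['{"', 'ok', '":', 'true', 'false', '}', 'yes', ' ', '<eos>']

def allowed_token_ids(prefix):
    """Return ids of tokens that keep the output inside the toy language."""
    ids = []
    for i, tok in enumerate(VOCAB):
        if tok == "<eos>":
            # Stopping is only valid once a complete accepted string exists.
            if prefix in ACCEPTED:
                ids.append(i)
        elif any(s.startswith(prefix + tok) for s in ACCEPTED):
            ids.append(i)
    return ids

def mask_logits(logits, allowed):
    """Set the logit of every disallowed token to negative infinity."""
    allowed = set(allowed)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_decode(logits_fn, max_steps=16):
    """Pick the highest-scoring allowed token at each step."""
    prefix = ""
    for _ in range(max_steps):
        masked = mask_logits(logits_fn(prefix), allowed_token_ids(prefix))
        next_id = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[next_id] == "<eos>":
            break
        prefix += VOCAB[next_id]
    return prefix

def fake_logits(prefix):
    """Stand-in model that strongly prefers the invalid token 'yes'."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.0, 1.0, 0.0]

result = greedy_decode(fake_logits)
```

Although `fake_logits` scores `yes` far above every other token, it is never allowed, so the decoded string stays inside the accepted language.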
Step 4: Response_Parsing
Parse the generated JSON response. Since the output is guaranteed to be valid JSON (assuming sufficient max_new_tokens), it can be directly deserialized. With Pydantic schemas, the response can be validated and typed using the original model class.
Key considerations:
- Output may be truncated if max_new_tokens is too low (e.g., missing closing braces)
- Set max_new_tokens high enough to accommodate the full JSON structure
- Structured generation guarantees structural validity (well-formed JSON that matches the schema), not the semantic correctness of the values
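A minimal parsing sketch using only the standard library is shown below. The helper name `parse_structured_output` and the sample strings are hypothetical. It treats a JSON decoding failure as a likely sign of truncation, which is an assumption that fits the max_new_tokens caveat above.

```python
import json

def parse_structured_output(text, required_keys=()):
    """Deserialize generated JSON, surfacing truncation as a clear error."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            "Output is not complete JSON; it may have been cut off by "
            "max_new_tokens. Retry with a larger limit."
        ) from exc
    # Lightweight check that the expected top-level keys are present.
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return data

complete = '{"name": "Aria", "age": 31}'
truncated = '{"name": "Aria", "age": 3'  # closing brace was never generated

character = parse_structured_output(complete, required_keys=("name", "age"))
```

When the schema came from a Pydantic v2 model, calling `model_validate_json(text)` on that model class deserializes and type-checks the response in one step.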