
Workflow:Predibase Lorax Structured JSON Output

From Leeroopedia


Domains: LLM_Ops, Inference, Structured_Generation
Last Updated: 2026-02-08 03:00 GMT

Overview

End-to-end process for generating structured JSON output from a LoRAX server using schema-constrained decoding powered by the Outlines library.

Description

This workflow constrains LLM responses to valid JSON that conforms to a user-provided JSON schema. Rather than validating output after the fact, LoRAX uses structured generation (constrained decoding): it masks token-level logits during inference so that, at each generation step, only tokens that keep the output valid can be selected. The Outlines library converts the JSON schema into a finite-state machine (FSM) that guides token selection in real time.

Usage

Use this workflow when you need guaranteed valid JSON output from the model, for example for data extraction, API response generation, or structured information retrieval. Structured output is available through both the native LoRAX client and the OpenAI-compatible Chat Completions API. You need a running LoRAX server and either a Pydantic model or a raw JSON schema that defines the desired output structure.

Execution Steps

Step 1: Schema_Definition

Define the JSON schema that the model output must conform to. This can be done using Pydantic BaseModel classes (which auto-generate JSON schemas) or by providing a raw JSON schema dictionary directly. The schema specifies property names, types, constraints, required fields, and enum values.
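
As a minimal sketch of the raw-dictionary option, the schema below describes a hypothetical character record. The field names, enum values, and maxLength limit are illustrative assumptions and do not come from LoRAX. With Pydantic v2, an equivalent dictionary can be obtained from a BaseModel subclass by calling model_json_schema().

```python
import json

# Hypothetical schema for illustration only; every name and limit is an assumption.
CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},  # string constraint
        "role": {"type": "string", "enum": ["mage", "rogue", "fighter"]},  # enum values
        "strength": {"type": "integer"},
    },
    "required": ["name", "role", "strength"],  # all three fields must be present
}

# Serialize the schema so it can be sent inline with a request.
schema_json = json.dumps(CHARACTER_SCHEMA, indent=2)
print(schema_json)
```

Keeping the schema as a plain dictionary makes it easy to inspect and to serialize alongside the rest of the request.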

Key considerations:

  • Pydantic models provide a convenient way to define schemas with type validation
  • String constraints (maxLength), enum types, and nested objects are supported
  • The schema is converted to a regular expression and then to a finite-state machine
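
To make the schema-to-regex idea concrete, here is a hand-written regular expression for a tiny hypothetical schema with a name string (maxLength 10) and a role enum. This is an illustration of the concept only: the pattern is written by hand, allows no whitespace, and is an assumption, not the expression Outlines actually generates.

```python
import re

# Hand-written pattern for a hypothetical two-field schema (an assumption,
# not Outlines output): name is a string of at most 10 characters without
# quotes or backslashes, role is one of two enum values.
CHARACTER_PATTERN = re.compile(
    r'\{"name":"[^"\\]{0,10}","role":"(mage|rogue)"\}'
)

def conforms(text):
    """Return True if the whole string matches the schema-derived pattern."""
    return CHARACTER_PATTERN.fullmatch(text) is not None

print(conforms('{"name":"Gandalf","role":"mage"}'))
print(conforms('{"name":"Gandalf","role":"bard"}'))
```

A regular expression can be compiled into a finite-state machine, which is what allows the constraint to be checked one token at a time during decoding.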

Step 2: Request_Configuration

Configure the inference request with the response_format parameter. Three modes are available: "text" (default plain text), "json_object" (arbitrary valid JSON), and "json_schema" (JSON conforming to a specific schema). The schema can be passed inline with the request.

Available response_format types:

  • text: Standard unstructured text generation (default)
  • json_object: Guarantees valid JSON output without enforcing a specific schema
  • json_schema: Guarantees valid JSON that conforms to the provided schema
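
The three modes can be sketched as a small helper that builds the response_format value. The helper name build_response_format is hypothetical, and placing the schema under a "schema" key is an assumption about the payload layout; check the LoRAX documentation for the exact shape your server version expects.

```python
import json

# The three type names come from the list above.
VALID_TYPES = ("text", "json_object", "json_schema")

def build_response_format(kind="text", schema=None):
    """Build a response_format dictionary (hypothetical helper)."""
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown response_format type: {kind!r}")
    if kind == "json_schema" and schema is None:
        raise ValueError("json_schema requires a schema")
    if kind != "json_schema" and schema is not None:
        raise ValueError(f"{kind} does not take a schema")
    fmt = {"type": kind}
    if schema is not None:
        # ASSUMPTION: the key name "schema" is illustrative, not confirmed.
        fmt["schema"] = schema
    return fmt

# Hypothetical one-field schema passed inline.
example = build_response_format(
    "json_schema",
    {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]},
)
print(json.dumps(example, indent=2))
```

Validating the mode on the client side surfaces a misspelled type before the request reaches the server.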

Step 3: Constrained_Decoding

During inference, the Outlines-powered FSM is applied at each token generation step. Before sampling the next token, the FSM determines which tokens are valid continuations that maintain JSON validity and schema compliance. Invalid tokens have their logits set to negative infinity, preventing them from being selected.

What happens internally:

  • JSON schema is compiled into a regular expression by Outlines
  • The regex is converted to a deterministic finite-state machine
  • At each decode step, the FSM filters the vocabulary to only valid next tokens
  • Token logits for invalid tokens are masked to negative infinity
  • Sampling proceeds normally over the filtered distribution
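
The steps above can be sketched with a toy example. Everything here is an assumption made for illustration: the eight-token vocabulary, the hand-written FSM (standing in for one compiled by Outlines), and the fixed fake logits (standing in for a model). Greedy selection is used instead of sampling so that the result is deterministic.

```python
import json
import math

# Toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"Bob"', ",", "hello"]

# Hand-written FSM: state -> {allowed token: next state}.
FSM = {
    0: {"{": 1},
    1: {'"name"': 2},
    2: {":": 3},
    3: {'"Ada"': 4, '"Bob"': 4},
    4: {"}": 5},
}
FINAL_STATE = 5

def fake_logits():
    """Stand-in for a model; it strongly prefers the invalid token 'hello'."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.9, 0.0, 5.0]

def mask_logits(logits, state):
    """Set the logit of every token the FSM disallows to negative infinity."""
    allowed = FSM[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]

def constrained_greedy_decode():
    state, pieces = 0, []
    while state != FINAL_STATE:
        masked = mask_logits(fake_logits(), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        pieces.append(token)
        state = FSM[state][token]  # advance the FSM with the chosen token
    return "".join(pieces)

result = constrained_greedy_decode()
print(result)
```

Although the unconstrained model would always pick "hello", the mask removes it at every step, so the decoded string is valid JSON that satisfies the toy schema.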

Step 4: Response_Parsing

Parse the generated JSON response. Because the output is guaranteed to be valid JSON (provided max_new_tokens is large enough for generation to finish), it can be deserialized directly. With a Pydantic schema, the response can also be validated and converted into a typed object using the original model class.

Key considerations:

  • Output may be truncated if max_new_tokens is too low (e.g., missing closing braces)
  • Set max_new_tokens high enough to accommodate the full JSON structure
  • Structured generation guarantees that the output is well formed and matches the schema, not that the values are semantically correct
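
A minimal parsing sketch that accounts for the truncation case is shown below. The helper name parse_structured_output and the sample strings are hypothetical; only the standard-library json module is used. With Pydantic v2, the model class's model_validate_json() method could replace the manual required-field check.

```python
import json

def parse_structured_output(text, required=()):
    """Deserialize generated text and report likely truncation clearly."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Incomplete JSON usually means generation stopped early.
        raise ValueError(
            "output is not complete JSON; max_new_tokens may be too low"
        ) from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data

complete = '{"name": "Ada", "age": 36}'
truncated = '{"name": "Ada", "age": 3'  # closing brace was never generated

print(parse_structured_output(complete, required=("name", "age")))
try:
    parse_structured_output(truncated)
except ValueError as err:
    print("parse failed:", err)
```

Raising a dedicated error for incomplete output makes it easier to tell a token-budget problem apart from a schema problem.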
