Principle: Predibase LoRAX Constrained Decoding
| Knowledge Sources | |
|---|---|
| Domains | Structured_Output, Text_Generation |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
A token-level generation constraint mechanism that uses finite state machines (FSMs) compiled from JSON schemas to mask invalid tokens at each decoding step, guaranteeing structurally valid output.
Description
Constrained Decoding solves the fundamental problem of getting language models to produce reliably structured output. Instead of prompting the model and hoping it follows the JSON format, the FSM guarantees conformance by construction: at every step, the model can only emit tokens that keep the output valid.
The process:
- Schema → Regex: Convert JSON Schema to a regular expression that matches all valid JSON strings conforming to the schema
- Regex → FSM: Compile the regex into a finite state machine using the Outlines library
- FSM → Token Mask: At each generation step, query the FSM for allowed tokens given the current state
- Mask → Constrained Scores: Set logits of disallowed tokens to negative infinity, forcing the model to only produce valid tokens
This approach leaves the relative probabilities of already-valid tokens unchanged, so output quality is unaffected, and its impact on generation speed is minimal (the compiled FSM is cached and reused across requests).
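The first step of the pipeline can be illustrated with a hand-written regex for a trivial schema. The regex below is written by hand purely for illustration; in LoRAX the conversion is done automatically by Outlines, and real schemas produce far more involved patterns:

```python
import re

# A tiny JSON Schema:
#   {"type": "object",
#    "properties": {"ok": {"type": "boolean"}},
#    "required": ["ok"], "additionalProperties": false}
# One regex that matches exactly the conforming JSON strings:
JSON_REGEX = r'\{"ok":\s*(true|false)\}'

print(bool(re.fullmatch(JSON_REGEX, '{"ok": true}')))    # conforming
print(bool(re.fullmatch(JSON_REGEX, '{"ok": "yes"}')))   # wrong type, rejected
```

Because the valid outputs form a regular language, the pattern can then be compiled into an FSM whose states are queried during decoding.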
Usage
Use when you need guaranteed valid JSON output. The constraint is applied transparently when `response_format` is specified. Works with both the LoRAX native API and the OpenAI-compatible chat API.
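A sketch of a native-API request follows. The prompt, schema, and server URL are illustrative assumptions (a local deployment on port 8080); the shape of the `response_format` parameter follows the LoRAX documentation for JSON-constrained generation:

```python
import json

# JSON Schema the output must conform to (illustrative)
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for the LoRAX native /generate endpoint
payload = {
    "inputs": "Extract the person mentioned: Alice is 30 years old.",
    "parameters": {
        "max_new_tokens": 64,
        "response_format": {"type": "json_object", "schema": schema},
    },
}

# Assuming a local LoRAX server:
# requests.post("http://localhost:8080/generate", json=payload)
print(json.dumps(payload, indent=2))
```

With the schema attached, the server's decoder applies the FSM mask automatically; the client-side code needs no other changes.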
Theoretical Basis
$$
P_{\text{constrained}}(t_i \mid t_{<i}) =
\begin{cases}
\dfrac{P(t_i \mid t_{<i})}{\sum_{t \in \text{allowed}(s_i)} P(t \mid t_{<i})} & \text{if } t_i \in \text{allowed}(s_i) \\
0 & \text{otherwise}
\end{cases}
$$

where $\text{allowed}(s_i)$ is the set of tokens the FSM permits in state $s_i$, and $s_i$ is the FSM state reached after processing tokens $t_1, \ldots, t_{i-1}$.
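The renormalization never has to be computed explicitly: masking disallowed logits to negative infinity before the softmax yields the same distribution. A small numeric check with made-up logits and a made-up allowed set:

```python
import math

def softmax(logits):
    """Softmax that treats -inf scores as exactly-zero probability."""
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) if x != -math.inf else 0.0 for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]   # made-up unconstrained scores, 4-token vocab
allowed = {0, 2}                 # tokens the FSM permits in the current state

# What the decoder does: mask disallowed logits to -inf, then softmax.
masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
p_masked = softmax(masked)

# What the formula says: renormalize the unconstrained distribution
# over the allowed tokens, and zero out the rest.
p = softmax(logits)
z = sum(p[i] for i in allowed)
p_renorm = [p[i] / z if i in allowed else 0.0 for i in range(len(p))]

print(p_masked)   # identical to p_renorm, up to floating point
```

This equivalence is why the masking step is cheap: it is a vectorized assignment into the logits, with no extra normalization pass.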
Pseudo-code:
```python
# Constrained decoding: compile once, then mask at each step
regex = json_schema_to_regex(schema)
fsm = compile_fsm(regex, tokenizer_vocabulary)
state = 0  # initial FSM state

for step in generation:
    allowed_tokens = fsm.get_next_instruction(state).tokens
    scores[~allowed_tokens] = -inf   # mask invalid tokens
    next_token = sample(scores)
    state = fsm.get_next_state(state, next_token)
```
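The loop can be made concrete with a toy character-level vocabulary and a two-string valid language standing in for the schema-derived FSM. Everything here is illustrative: the FSM state is just the generated prefix, and the transition query is hand-written rather than compiled from a regex over a real tokenizer vocabulary:

```python
import math
import random

VALID = ["true", "false"]                 # the entire "valid output" language
VOCAB = sorted(set("".join(VALID)))       # toy character-level vocabulary

def allowed_tokens(state: str) -> set:
    """FSM transition query: characters that extend `state` toward a valid string."""
    return {w[len(state)] for w in VALID if w.startswith(state) and len(w) > len(state)}

def sample(scores):
    # Greedy "sampling" for determinism: pick the highest-scoring token.
    return max(range(len(scores)), key=lambda i: scores[i])

random.seed(0)
state = ""
while state not in VALID:
    scores = [random.uniform(-1, 1) for _ in VOCAB]  # stand-in for model logits
    allowed = allowed_tokens(state)
    for i, tok in enumerate(VOCAB):
        if tok not in allowed:
            scores[i] = -math.inf                    # mask invalid tokens
    state += VOCAB[sample(scores)]

print(state)  # always "true" or "false", whatever the random scores were
```

However adversarial the stand-in logits are, the mask guarantees the output lands in the valid language; a production system guarantees schema-conforming JSON the same way, one token at a time.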