Implementation: Predibase LoRAX Outlines Logits Processor
| Knowledge Sources | |
|---|---|
| Domains | Structured_Output, Text_Generation |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
The OutlinesLogitsProcessor and HeterogeneousSchemaLogitsProcessor classes enforce JSON schemas during token generation by compiling each schema into a finite state machine (FSM) and masking out disallowed tokens at every decoding step.
Description
The OutlinesLogitsProcessor compiles a JSON schema into a finite state machine (FSM) using the Outlines library's build_regex_from_schema and RegexGuide.from_regex functions. At each generation step, its __call__ method queries the FSM for allowed tokens and masks all others to -inf. The FSM state advances with each generated token via next_state().
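The masking step can be illustrated with a minimal, dependency-free sketch (plain Python lists stand in for torch tensors; the token ids and scores below are made up for illustration):

```python
import math

def mask_scores(scores, allowed_tokens):
    """Set every logit except the allowed token ids to -inf, mimicking
    the FSM masking step (illustrative sketch, not the LoRAX code)."""
    allowed = set(allowed_tokens)
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]

raw = [1.5, 0.2, -0.7, 3.1]  # logits over a toy 4-token vocabulary
constrained = mask_scores(raw, allowed_tokens=[0, 3])
print(constrained)  # only tokens 0 and 3 remain selectable
```

Because the disallowed entries are exactly -inf, any subsequent softmax assigns them zero probability, so the sampler can only pick schema-valid tokens.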
The HeterogeneousSchemaLogitsProcessor wraps multiple OutlinesLogitsProcessor instances to handle batched requests where different items may have different schemas (or no schema).
Both FSM compilation and tokenizer adaptation are cached via @lru_cache for performance.
Usage
These processors are used internally during token generation whenever a schema constraint is active; they are not called directly by users. Function-calling tool grammars are handled separately by the Rust-side ToolGrammar::apply() function.
Code Reference
Source Location
- Repository: LoRAX
- File: server/lorax_server/utils/logits_process.py
- Lines: 532-601 (OutlinesLogitsProcessor), 464-528 (HeterogeneousSchemaLogitsProcessor)
Signature
```python
import math
from functools import lru_cache
from typing import List, Optional

import torch
from transformers import PreTrainedTokenizerBase

# LogitsProcessor (the base class) and the Outlines helpers
# build_regex_from_schema / RegexGuide are imported in the real module;
# their exact import paths vary by Outlines version.


class OutlinesLogitsProcessor(LogitsProcessor):
    def __init__(self, schema: str, tokenizer: PreTrainedTokenizerBase):
        """
        Compile the FSM from a JSON schema.

        Args:
            schema: JSON schema string
            tokenizer: HF tokenizer for vocabulary mapping
        """
        self.tokenizer = OutlinesLogitsProcessor.adapt_tokenizer(tokenizer)
        self.fsm = OutlinesLogitsProcessor.compile_fsm(schema, self.tokenizer)
        self.fsm_state = 0

    def __call__(self, scores: torch.Tensor) -> torch.Tensor:
        """Apply the FSM constraint to logit scores."""
        allowed_tokens = self.fsm.get_next_instruction(self.fsm_state).tokens
        mask = torch.full_like(scores, -math.inf)
        mask[:, allowed_tokens] = 0
        return scores + mask

    def next_state(self, next_token_id: int):
        """Advance the FSM state after token selection."""
        self.fsm_state = self.fsm.get_next_state(self.fsm_state, next_token_id)

    @staticmethod
    @lru_cache(maxsize=32, typed=True)
    def compile_fsm(schema, tokenizer):
        regex_string = build_regex_from_schema(schema)
        return RegexGuide.from_regex(regex_string, tokenizer)


class HeterogeneousSchemaLogitsProcessor(LogitsProcessor):
    def __init__(self, sequence_processors: List[Optional[OutlinesLogitsProcessor]]):
        """Handle batched requests with different schemas."""

    @classmethod
    def from_schemas(
        cls,
        schemas: List[Optional[str]],
        tokenizers: List[Optional[PreTrainedTokenizerBase]],
    ) -> "HeterogeneousSchemaLogitsProcessor":
        """Create from parallel lists of schemas and tokenizers."""
```
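To make the get_next_instruction / get_next_state contract concrete, here is a hand-rolled stand-in for the guide object (ToyGuide and Instruction are illustrative only, not the Outlines API) driving a tiny two-token transition table:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instruction:
    tokens: List[int] = field(default_factory=list)  # allowed token ids

class ToyGuide:
    """Hand-rolled stand-in for Outlines' RegexGuide: state 0 allows
    only token 7, state 1 allows only token 9, state 2 is terminal."""
    TABLE = {0: {7: 1}, 1: {9: 2}}

    def get_next_instruction(self, state: int) -> Instruction:
        return Instruction(tokens=sorted(self.TABLE.get(state, {})))

    def get_next_state(self, state: int, token_id: int) -> int:
        return self.TABLE[state][token_id]

guide, state, path = ToyGuide(), 0, []
while guide.get_next_instruction(state).tokens:
    token = guide.get_next_instruction(state).tokens[0]  # greedy pick
    path.append(token)
    state = guide.get_next_state(state, token)
print(path, state)  # -> [7, 9] 2
```

The real processor follows the same loop shape: query allowed tokens, let the sampler choose one, then advance the state with next_state().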
Import
```python
from lorax_server.utils.logits_process import (
    OutlinesLogitsProcessor,
    HeterogeneousSchemaLogitsProcessor,
)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| schema | str | Yes | JSON schema string for FSM compilation |
| tokenizer | PreTrainedTokenizerBase | Yes | HF tokenizer for vocabulary mapping |
| scores | torch.Tensor | Yes | Raw logit scores [1, vocab_size] |
Outputs
| Name | Type | Description |
|---|---|---|
| constrained_scores | torch.Tensor | Scores with invalid tokens set to -inf |
Usage Examples
Internal Usage During Generation
```python
import torch

from lorax_server.utils.logits_process import OutlinesLogitsProcessor

# Created during request setup
processor = OutlinesLogitsProcessor(
    schema='{"type":"object","properties":{"name":{"type":"string"}}}',
    tokenizer=model_tokenizer,
)

# Called at each generation step
constrained_scores = processor(raw_logits)  # invalid tokens masked to -inf
next_token = torch.argmax(constrained_scores)
processor.next_state(next_token.item())  # advance the FSM
```
Batched Heterogeneous Schemas
```python
from lorax_server.utils.logits_process import HeterogeneousSchemaLogitsProcessor

# Different schemas per request in the batch
processor = HeterogeneousSchemaLogitsProcessor.from_schemas(
    schemas=[
        '{"type":"object","properties":{"name":{"type":"string"}}}',
        None,  # no constraint for the second request
        '{"type":"object","properties":{"count":{"type":"integer"}}}',
    ],
    tokenizers=[tokenizer, None, tokenizer],
)

# Applied to batched scores of shape [batch_size, vocab_size]
constrained_scores = processor(input_ids, batch_scores)
```
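The per-sequence dispatch performed by HeterogeneousSchemaLogitsProcessor can be sketched without torch. In this simplified model, each row's constraint is just a list of allowed token ids (or None for an unconstrained row); the real class instead delegates each row to its own OutlinesLogitsProcessor:

```python
import math

def apply_heterogeneous(row_constraints, batch_scores):
    """Apply each row's constraint (allowed token ids, or None for an
    unconstrained row) to the matching row of a batch of scores."""
    out = []
    for allowed, row in zip(row_constraints, batch_scores):
        if allowed is None:
            out.append(list(row))  # no schema: pass through unchanged
        else:
            keep = set(allowed)
            out.append([s if i in keep else -math.inf
                        for i, s in enumerate(row)])
    return out

scores = [[0.5, 1.0, 2.0], [0.5, 1.0, 2.0]]
result = apply_heterogeneous([[1], None], scores)
# row 0 is constrained to token 1; row 1 passes through untouched
```

This row-wise structure is what lets one batch mix schema-constrained and free-form requests without separate generation loops.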