Implementation: Predibase LoRAX Outlines Logits Processor
| Knowledge Sources | |
|---|---|
| Domains | Structured_Output, Text_Generation |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
The OutlinesLogitsProcessor and HeterogeneousSchemaLogitsProcessor classes enforce JSON schemas during token generation by compiling each schema into a finite state machine (FSM) and masking out disallowed tokens at every decoding step.
Description
The OutlinesLogitsProcessor compiles a JSON schema into a finite state machine (FSM) using the Outlines library's build_regex_from_schema and RegexGuide.from_regex functions. At each generation step, its __call__ method queries the FSM for allowed tokens and masks all others to -inf. The FSM state advances with each generated token via next_state().
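The masking step can be illustrated with a minimal, dependency-free sketch (plain Python lists stand in for torch tensors; the token ids and scores below are made up for illustration):

```python
import math

def mask_scores(scores, allowed_tokens):
    """Set every logit except the allowed token ids to -inf, mimicking
    the FSM masking step (illustrative sketch, not the LoRAX code)."""
    allowed = set(allowed_tokens)
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]

raw = [1.5, 0.2, -0.7, 3.1]  # logits over a toy 4-token vocabulary
constrained = mask_scores(raw, allowed_tokens=[0, 3])
print(constrained)  # only tokens 0 and 3 remain selectable
```

Because the disallowed entries are exactly -inf, any subsequent softmax assigns them zero probability, so the sampler can only pick schema-valid tokens.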
The HeterogeneousSchemaLogitsProcessor wraps multiple OutlinesLogitsProcessor instances to handle batched requests where different items may have different schemas (or no schema).
Both FSM compilation and tokenizer adaptation are cached via @lru_cache for performance.
Usage
These processors are used internally during token generation whenever a schema constraint is active; they are not called directly by users. Function-calling tool grammars are handled separately by the Rust-side ToolGrammar::apply() function.
Code Reference
Source Location
- Repository: LoRAX
- File: server/lorax_server/utils/logits_process.py
- Lines: 532-601 (OutlinesLogitsProcessor), 464-528 (HeterogeneousSchemaLogitsProcessor)
Signature
```python
import math
from functools import lru_cache
from typing import List, Optional

import torch
from transformers import PreTrainedTokenizerBase

# LogitsProcessor (the base class) and the Outlines helpers
# build_regex_from_schema / RegexGuide are imported in the real module;
# their exact import paths vary by Outlines version.


class OutlinesLogitsProcessor(LogitsProcessor):
    def __init__(self, schema: str, tokenizer: PreTrainedTokenizerBase):
        """
        Compile the FSM from a JSON schema.

        Args:
            schema: JSON schema string
            tokenizer: HF tokenizer for vocabulary mapping
        """
        self.tokenizer = OutlinesLogitsProcessor.adapt_tokenizer(tokenizer)
        self.fsm = OutlinesLogitsProcessor.compile_fsm(schema, self.tokenizer)
        self.fsm_state = 0

    def __call__(self, scores: torch.Tensor) -> torch.Tensor:
        """Apply the FSM constraint to logit scores."""
        allowed_tokens = self.fsm.get_next_instruction(self.fsm_state).tokens
        mask = torch.full_like(scores, -math.inf)
        mask[:, allowed_tokens] = 0
        return scores + mask

    def next_state(self, next_token_id: int):
        """Advance the FSM state after token selection."""
        self.fsm_state = self.fsm.get_next_state(self.fsm_state, next_token_id)

    @staticmethod
    @lru_cache(maxsize=32, typed=True)
    def compile_fsm(schema, tokenizer):
        regex_string = build_regex_from_schema(schema)
        return RegexGuide.from_regex(regex_string, tokenizer)


class HeterogeneousSchemaLogitsProcessor(LogitsProcessor):
    def __init__(self, sequence_processors: List[Optional[OutlinesLogitsProcessor]]):
        """Handle batched requests with different schemas."""

    @classmethod
    def from_schemas(
        cls,
        schemas: List[Optional[str]],
        tokenizers: List[Optional[PreTrainedTokenizerBase]],
    ) -> "HeterogeneousSchemaLogitsProcessor":
        """Create from parallel lists of schemas and tokenizers."""
```
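To make the get_next_instruction / get_next_state contract concrete, here is a hand-rolled stand-in for the guide object (ToyGuide and Instruction are illustrative only, not the Outlines API) driving a tiny two-token transition table:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instruction:
    tokens: List[int] = field(default_factory=list)  # allowed token ids

class ToyGuide:
    """Hand-rolled stand-in for Outlines' RegexGuide: state 0 allows
    only token 7, state 1 allows only token 9, state 2 is terminal."""
    TABLE = {0: {7: 1}, 1: {9: 2}}

    def get_next_instruction(self, state: int) -> Instruction:
        return Instruction(tokens=sorted(self.TABLE.get(state, {})))

    def get_next_state(self, state: int, token_id: int) -> int:
        return self.TABLE[state][token_id]

guide, state, path = ToyGuide(), 0, []
while guide.get_next_instruction(state).tokens:
    token = guide.get_next_instruction(state).tokens[0]  # greedy pick
    path.append(token)
    state = guide.get_next_state(state, token)
print(path, state)  # -> [7, 9] 2
```

The real processor follows the same loop shape: query allowed tokens, let the sampler choose one, then advance the state with next_state().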
Import
```python
from lorax_server.utils.logits_process import (
    OutlinesLogitsProcessor,
    HeterogeneousSchemaLogitsProcessor,
)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| schema | str | Yes | JSON schema string for FSM compilation |
| tokenizer | PreTrainedTokenizerBase | Yes | HF tokenizer for vocabulary mapping |
| scores | torch.Tensor | Yes | Raw logit scores [1, vocab_size] |
Outputs
| Name | Type | Description |
|---|---|---|
| constrained_scores | torch.Tensor | Scores with invalid tokens set to -inf |
Usage Examples
Internal Usage During Generation
```python
import torch

from lorax_server.utils.logits_process import OutlinesLogitsProcessor

# Created during request setup
processor = OutlinesLogitsProcessor(
    schema='{"type":"object","properties":{"name":{"type":"string"}}}',
    tokenizer=model_tokenizer,
)

# Called at each generation step
constrained_scores = processor(raw_logits)  # invalid tokens masked to -inf
next_token = torch.argmax(constrained_scores)
processor.next_state(next_token.item())  # advance the FSM
```
Batched Heterogeneous Schemas
```python
from lorax_server.utils.logits_process import HeterogeneousSchemaLogitsProcessor

# Different schemas per request in the batch
processor = HeterogeneousSchemaLogitsProcessor.from_schemas(
    schemas=[
        '{"type":"object","properties":{"name":{"type":"string"}}}',
        None,  # no constraint for the second request
        '{"type":"object","properties":{"count":{"type":"integer"}}}',
    ],
    tokenizers=[tokenizer, None, tokenizer],
)

# Applied to batched scores of shape [batch_size, vocab_size]
constrained_scores = processor(input_ids, batch_scores)
```
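The per-sequence dispatch performed by HeterogeneousSchemaLogitsProcessor can be sketched without torch. In this simplified model, each row's constraint is just a list of allowed token ids (or None for an unconstrained row); the real class instead delegates each row to its own OutlinesLogitsProcessor:

```python
import math

def apply_heterogeneous(row_constraints, batch_scores):
    """Apply each row's constraint (allowed token ids, or None for an
    unconstrained row) to the matching row of a batch of scores."""
    out = []
    for allowed, row in zip(row_constraints, batch_scores):
        if allowed is None:
            out.append(list(row))  # no schema: pass through unchanged
        else:
            keep = set(allowed)
            out.append([s if i in keep else -math.inf
                        for i, s in enumerate(row)])
    return out

scores = [[0.5, 1.0, 2.0], [0.5, 1.0, 2.0]]
result = apply_heterogeneous([[1], None], scores)
# row 0 is constrained to token 1; row 1 passes through untouched
```

This row-wise structure is what lets one batch mix schema-constrained and free-form requests without separate generation loops.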