Principle:Ggml org Llama cpp Grammar Constrained Decoding
| Knowledge Sources | |
|---|---|
| Domains | Grammar, Constrained_Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Grammar Constrained Decoding is the principle of restricting token sampling so that the generated output always conforms to a formal grammar specification.
Description
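This principle covers the mechanism by which a GBNF (GGML BNF) grammar constrains token sampling during text generation. At each decoding step, the current grammar state is used to compute a mask over the vocabulary: the logits of tokens that the grammar cannot accept are set to negative infinity, so they receive zero probability and cannot be selected. Once a token has been sampled, the grammar state is advanced by that token's text. As a result, every emitted prefix is a valid continuation under the grammar, and the output is syntactically valid provided generation runs until the grammar is satisfied rather than being cut off early (for example, by a token limit).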
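The masking step described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not llama.cpp's implementation: the function names `mask_logits` and `softmax`, the four-entry logit vector, and the set of allowed token ids are all assumptions made up for the example.

```python
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of `logits` where every token id not in `allowed_ids`
    is set to negative infinity (hypothetical helper, for illustration)."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    finite = [x for x in logits if x != -math.inf]
    if not finite:
        raise ValueError("the grammar allows no token at this step")
    m = max(finite)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: token ids 1 and 3 are assumed to be rejected by the grammar.
logits = [2.0, 1.0, 0.5, 3.0]
masked = mask_logits(logits, allowed_ids=[0, 2])
probs = softmax(masked)
```

After masking, the rejected tokens have probability exactly zero, so any sampler operating on `probs` (greedy, top-k, temperature) can only pick a token the grammar accepts, even though token 3 had the highest raw logit.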
Usage
Apply this principle when model output must conform to a specific syntax, such as valid JSON, SQL, code in a particular language, or any other formally defined format.
Theoretical Basis
Grammar-constrained decoding works by maintaining a parser state that tracks the current position within the grammar's production rules. Before token sampling, each candidate token is tested against the grammar by checking whether the token's text representation, consumed character by character, can extend the current parse state to a valid next state. Tokens that would cause a parse failure are masked out by setting their logits to negative infinity. The grammar is specified in GBNF format, a variant of BNF (Backus-Naur Form) extended with character classes and repetition operators. The parser is stack-based: the stack holds the grammar elements that remain to be matched, which lets recursive rules be handled by pushing a rule's expansion onto the stack instead of re-parsing the output from the beginning at every step.
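To make the token test concrete, here is a minimal sketch of a stack-based acceptor for a toy grammar. It is not llama.cpp's parser and does not read real GBNF files: the grammar encoding, the use of a set of stacks to track alternatives, and all function names (`expand`, `advance`, `accepts_token`, `allowed_tokens`, `is_complete`) are assumptions chosen for illustration. The sketch also assumes the grammar has no left recursion and that the vocabulary has no empty tokens.

```python
# Toy grammar, written here in GBNF-like notation for reference:
#   root ::= "[" list "]"
#   list ::= num | num "," list
#   num  ::= [0-9] | [0-9] num
# Terminals are frozensets of characters; nonterminals are rule names (str).
DIGITS = frozenset("0123456789")
GRAMMAR = {
    "root": [[frozenset("["), "list", frozenset("]")]],
    "list": [["num"], ["num", frozenset(","), "list"]],
    "num": [[DIGITS], [DIGITS, "num"]],
}

def expand(stack):
    """Expand the leading nonterminal until every resulting stack is empty
    or starts with a terminal. Index 0 is the top of the stack."""
    if not stack or isinstance(stack[0], frozenset):
        return [stack]
    head, rest = stack[0], stack[1:]
    out = []
    for alternative in GRAMMAR[head]:
        out.extend(expand(tuple(alternative) + rest))
    return out

def initial_stacks():
    return set(expand(("root",)))

def advance(stacks, ch):
    """Consume one character; keep only the stacks whose top terminal matches."""
    nxt = set()
    for stack in stacks:
        if stack and ch in stack[0]:
            nxt.update(expand(stack[1:]))
    return nxt

def accepts_token(stacks, token):
    """Return the parse state after consuming `token`, or an empty set if rejected."""
    for ch in token:
        stacks = advance(stacks, ch)
        if not stacks:
            return set()
    return stacks

def allowed_tokens(stacks, vocab):
    return [t for t in vocab if accepts_token(stacks, t)]

def is_complete(stacks):
    """True if some alternative has matched the whole grammar."""
    return () in stacks

# Hypothetical vocabulary with multi-character tokens.
VOCAB = ["[", "]", ",", "1", "23", "[4", ",5", "]]", "a"]

state = initial_stacks()
first = allowed_tokens(state, VOCAB)    # ['[', '[4']
state = accepts_token(state, "[4")
second = allowed_tokens(state, VOCAB)   # [']', ',', '1', '23', ',5']
```

Note that the token `"]]"` is rejected after `"[4"` even though its first character is valid: the whole token text must extend the parse state, which is why each candidate is checked character by character. Tracking a set of stacks is one simple way to keep several grammar alternatives alive at once; testing every vocabulary entry at every step, as done here, is the naive version of the check.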