Principle:Ggml org Llama cpp Grammar Constrained Decoding

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Grammar, Constrained_Generation
Last Updated	2026-02-15 00:00 GMT

Overview

Grammar Constrained Decoding is the principle of restricting token sampling so that the model can only produce output that conforms to a formal grammar specification.

Description

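This principle covers the mechanism by which a GBNF (GGML BNF) grammar constrains token sampling during text generation. At each decoding step, the current grammar state is used to compute a mask over the vocabulary: the logits of tokens that the grammar cannot accept are set to negative infinity, so they receive zero probability and cannot be selected. After a token is chosen, the grammar state is advanced by that token's text. This guarantees that every generated prefix is syntactically valid according to the specified grammar. The output is a complete match only if generation continues until the grammar reaches an accepting state; a run cut off earlier (for example by a token limit) yields a valid but unfinished prefix.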
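The masking step can be illustrated with a minimal Python sketch. It is not llama.cpp code: the function names `mask_logits` and `softmax`, the toy logits, and the set of allowed token ids are all assumptions made for illustration. It shows only that a logit of negative infinity becomes a probability of exactly zero after softmax.

```python
import math

def mask_logits(logits, allowed_token_ids):
    """Return a copy of `logits` with every disallowed token set to -inf.

    `allowed_token_ids` stands in for whatever the grammar accepts at this
    step (hypothetical input; a real sampler derives it from the parse state).
    """
    allowed = set(allowed_token_ids)
    if not allowed:
        raise ValueError("grammar accepts no token at this step")
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of four tokens; the grammar allows only tokens 1 and 3.
logits = [2.0, 1.0, 0.5, -1.0]
masked = mask_logits(logits, allowed_token_ids=[1, 3])
probs = softmax(masked)
```

Token 0 has the highest raw logit, but after masking its probability is zero, so any sampler drawing from `probs` can only pick token 1 or token 3.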

Usage

Apply this principle when model output must conform to a specific syntax, such as valid JSON, SQL, code in a particular language, or any other formally defined format.

Theoretical Basis

Grammar-constrained decoding maintains a parser state that tracks the current position within the grammar's production rules. Before a token is sampled, each candidate token is tested against the grammar by checking whether its text representation, consumed character by character, can extend the current parse state to a valid next state. Tokens that would cause a parse failure are masked out by setting their logits to negative infinity. Once a token is chosen, its text is fed to the parser so that the state reflects everything generated so far. The grammar is specified in GBNF, a variant of BNF (Backus-Naur Form) extended with character classes and repetition operators. The parser is stack-based: each stack holds the grammar elements still expected, which lets it follow recursive rules incrementally instead of re-parsing the whole output at every step.
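The token test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the llama.cpp implementation: the grammar encoding (`GRAMMAR`, `lit`, `cls`, `ref`), the helper names, and the toy vocabulary `VOCAB` are all hypothetical, matching is done on Python characters, and the grammar must not be left-recursive or the expansion would not terminate. The parse state is a set of stacks, one per viable parse path, each listing the symbols still expected.

```python
def lit(ch):    return ("char", frozenset(ch))     # single literal character
def cls(chars): return ("char", frozenset(chars))  # character class
def ref(name):  return ("rule", name)              # reference to another rule

# Toy grammar for a bracketed list of integers such as "[4,12]".
GRAMMAR = {
    "root":  [[lit("["), ref("items"), lit("]")]],
    "items": [[ref("num")], [ref("num"), lit(","), ref("items")]],
    "num":   [[cls("0123456789")], [cls("0123456789"), ref("num")]],
}

def expand(stack, rules, out):
    """Expand leading rule references until the stack is empty or starts with a terminal."""
    if not stack or stack[0][0] == "char":
        out.add(stack)
        return
    for alt in rules[stack[0][1]]:
        expand(tuple(alt) + stack[1:], rules, out)

def initial_stacks(rules, start="root"):
    out = set()
    expand((ref(start),), rules, out)
    return out

def accept_char(stacks, ch, rules):
    """Advance every stack whose next terminal matches `ch`; drop the rest."""
    out = set()
    for stack in stacks:
        if stack and ch in stack[0][1]:
            expand(stack[1:], rules, out)
    return out

def accept_text(stacks, text, rules):
    """Feed a token's text through the grammar; an empty result means rejection."""
    for ch in text:
        stacks = accept_char(stacks, ch, rules)
        if not stacks:
            break
    return stacks

def allowed_tokens(stacks, vocab, rules):
    """Ids of tokens whose full text keeps at least one parse path alive."""
    return [tid for tid, text in enumerate(vocab)
            if text and accept_text(stacks, text, rules)]

def is_complete(stacks):
    """True when some parse path has nothing left to match."""
    return () in stacks

VOCAB = ["[", "]", ",", "1", "12", "1,", "],", "a", "[4"]

state = initial_stacks(GRAMMAR)
first_step = allowed_tokens(state, VOCAB, GRAMMAR)   # only "[" and "[4" fit
state = accept_text(state, "[4", GRAMMAR)            # commit the chosen token
second_step = allowed_tokens(state, VOCAB, GRAMMAR)
```

The ids returned by `allowed_tokens` are the tokens that stay unmasked; every other logit would be set to negative infinity. Multi-character tokens such as `"],"` are rejected as a whole when any of their characters fails, which is why the test runs over the full token text rather than only its first character.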
