Implementation:Ggml org Llama cpp Grammar Header
| Knowledge Sources | |
|---|---|
| Domains | Grammar, Constrained_Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the grammar types, parser, and state machine structures for GBNF grammar-constrained generation.
Description
This header defines the `llama_gretype` enum, which tags each rule element (characters, character ranges, alternates, rule references, tokens); the `llama_grammar_element` struct, which pairs such a tag with a value; `llama_partial_utf8`, which carries incremental UTF-8 decoding state across token boundaries; and `llama_grammar_candidate`, which is used to match candidate tokens against the grammar. The `llama_grammar_parser` struct converts GBNF text into rule vectors and manages symbol IDs. The `llama_grammar` struct holds the active grammar state: the rules, the pushdown stacks, the partial UTF-8 state, and the lazy-trigger configuration (trigger tokens, regex patterns, and a buffer).
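The role of the partial UTF-8 state can be illustrated with a small standalone sketch. It is not the library's decoder: `partial_utf8` is a local mirror of `llama_partial_utf8`, and `decode_chunk` is a hypothetical helper written for this page. It shows how a multi-byte character that is split across two token pieces can be resumed from the saved `value` and `n_remain`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local mirror of the header's llama_partial_utf8, for illustration only.
struct partial_utf8 {
    uint32_t value;    // code point bits accumulated so far
    int      n_remain; // continuation bytes still expected; -1 marks an invalid sequence
};

// Hypothetical helper: decode `bytes`, resuming from `state`.
// Complete code points are appended to `out`; the returned state carries any
// sequence that was cut off at the end of `bytes`.
partial_utf8 decode_chunk(const std::string & bytes, partial_utf8 state, std::vector<uint32_t> & out) {
    assert(state.n_remain <= 3); // a UTF-8 lead byte announces at most 3 continuation bytes
    for (unsigned char b : bytes) {
        if (state.n_remain < 0) {
            break; // an earlier chunk was invalid; stop decoding
        }
        if (state.n_remain > 0) {
            if ((b & 0xC0) != 0x80) {
                return { 0, -1 }; // expected a continuation byte
            }
            state.value = (state.value << 6) | (b & 0x3F);
            if (--state.n_remain == 0) {
                out.push_back(state.value);
                state.value = 0;
            }
        } else if (b < 0x80) {
            out.push_back(b);                  // 1-byte sequence (ASCII)
        } else if ((b & 0xE0) == 0xC0) {
            state = { uint32_t(b & 0x1F), 1 }; // lead byte of a 2-byte sequence
        } else if ((b & 0xF0) == 0xE0) {
            state = { uint32_t(b & 0x0F), 2 }; // lead byte of a 3-byte sequence
        } else if ((b & 0xF8) == 0xF0) {
            state = { uint32_t(b & 0x07), 3 }; // lead byte of a 4-byte sequence
        } else {
            return { 0, -1 };                  // stray continuation byte or invalid lead
        }
    }
    return state;
}
```

Feeding the two bytes of `é` (`0xC3`, `0xA9`) in separate calls yields no code point after the first call and U+00E9 after the second, which is the situation a grammar faces when a tokenizer splits a character between tokens.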
Usage
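Include this header when working on llama.cpp's grammar-constrained generation internals; it lives under `src/` and exposes the `_impl` functions rather than the public `llama.h` API. It defines the grammar infrastructure used by the grammar sampler and, through it, by the server's tool-calling and JSON-mode features.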
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-grammar.h
- Lines: 1-194
Signature
enum llama_gretype { /* LLAMA_GRETYPE_END, _ALT, _RULE_REF, _CHAR, ... */ };

typedef struct llama_grammar_element {
    enum llama_gretype type;
    uint32_t           value;
} llama_grammar_element;

struct llama_partial_utf8 { uint32_t value; int n_remain; };

struct llama_grammar_candidate { size_t index; const uint32_t * code_points; /* ... */ };

struct llama_grammar_parser {
    const llama_vocab * vocab;
    std::map<std::string, uint32_t> symbol_ids;
    llama_grammar_rules rules;

    bool parse(const char * src);
    void print(FILE * file);
};

struct llama_grammar_trigger_pattern {
    std::string pattern;
    std::regex  regex;

    size_t find(const std::string & input) const;
};

struct llama_grammar {
    const llama_vocab * vocab;
    const llama_grammar_rules rules;
    llama_grammar_stacks stacks;
    llama_partial_utf8 partial_utf8;
    bool lazy;
    bool awaiting_trigger;
    std::vector<llama_token> trigger_tokens;
    std::vector<llama_grammar_trigger_pattern> trigger_patterns;
};

// Internal API
struct llama_grammar * llama_grammar_init_impl(/* ... */);
void llama_grammar_free_impl(struct llama_grammar * grammar);
struct llama_grammar * llama_grammar_clone_impl(const struct llama_grammar & grammar);
void llama_grammar_apply_impl(const struct llama_grammar & grammar, llama_token_data_array * cur_p);
void llama_grammar_accept_impl(struct llama_grammar & grammar, llama_token token);
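To make the element encoding concrete, the sketch below flattens the rule `root ::= "hi" | [a-z0-9]` into a vector of type/value pairs and tests a character against it. Everything here is a standalone illustration under stated assumptions: `gretype`, `element`, and `rule` are local mirrors of the header's types with shortened enumerator names, and `match_char` is a hypothetical helper, not the library's own matcher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirrors of the header's types, for illustration only.
enum gretype { GRE_END, GRE_ALT, GRE_RULE_REF, GRE_CHAR, GRE_CHAR_NOT, GRE_CHAR_RNG_UPPER, GRE_CHAR_ALT };

struct element {
    gretype  type;
    uint32_t value; // code point, or rule index for GRE_RULE_REF
};

using rule = std::vector<element>;

// root ::= "hi" | [a-z0-9]
const rule example_root = {
    { GRE_CHAR, 'h' }, { GRE_CHAR, 'i' },            // literal "hi"
    { GRE_ALT,  0   },                               // start of the second alternate
    { GRE_CHAR, 'a' }, { GRE_CHAR_RNG_UPPER, 'z' },  // range a-z
    { GRE_CHAR_ALT, '0' }, { GRE_CHAR_RNG_UPPER, '9' }, // plus range 0-9 in the same class
    { GRE_END,  0   },                               // end of rule
};

// Hypothetical helper: test `chr` against the character element that starts at
// r[pos] and report, via `next`, the index just past that element.
bool match_char(const rule & r, size_t pos, uint32_t chr, size_t & next) {
    assert(r[pos].type == GRE_CHAR || r[pos].type == GRE_CHAR_NOT);
    const bool positive = r[pos].type == GRE_CHAR;
    bool found = false;
    do {
        if (pos + 1 < r.size() && r[pos + 1].type == GRE_CHAR_RNG_UPPER) {
            // inclusive range [r[pos].value, r[pos + 1].value]
            found = found || (r[pos].value <= chr && chr <= r[pos + 1].value);
            pos += 2;
        } else {
            // single code point
            found = found || r[pos].value == chr;
            pos += 1;
        }
    } while (pos < r.size() && r[pos].type == GRE_CHAR_ALT);
    next = pos;
    return found == positive;
}
```

Storing each rule as a flat, `END`-terminated sequence lets a pushdown stack refer to a position inside a rule with a single element pointer, which matches the `llama_grammar_stack` idea of a vector of element pointers.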
Import
#include "llama-grammar.h"
// Dependencies (pulled in by the header itself):
#include "llama.h"
#include <map>
#include <regex>
#include <string>
#include <vector>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| src | const char * | Yes | GBNF grammar source string for parsing |
| grammar_str | const char * | Yes | Grammar string for init_impl |
| grammar_root | const char * | Yes | Root rule name for the grammar |
| lazy | bool | No | Whether to use lazy grammar triggering |
| trigger_patterns | const char ** | No | Regex patterns that trigger lazy grammar activation |
| trigger_tokens | const llama_token * | No | Tokens that trigger lazy grammar activation |
| cur_p | llama_token_data_array * | Yes | Token candidates to filter via grammar_apply_impl |
| token | llama_token | Yes | Token to accept into the grammar state |
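The lazy-trigger inputs above can be pictured as a gate that stays closed until a trigger token or a regex match appears in the generated text. The following is a simplified, standalone sketch of that idea, not the library's implementation: `lazy_gate` and its members are hypothetical names, tokens are plain `int`s, and details such as replaying the matched text into the grammar are left out.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the lazy-trigger fields of llama_grammar.
struct lazy_gate {
    bool                    awaiting_trigger = true;
    std::string             buffer;           // text accumulated while waiting
    std::vector<int>        trigger_tokens;   // token IDs that open the gate
    std::vector<std::regex> trigger_patterns; // patterns searched in the buffer

    // Feed one sampled token and its text piece.
    // Returns true once the gate is open, i.e. the grammar should start constraining.
    bool feed(int token, const std::string & piece) {
        if (!awaiting_trigger) {
            return true;
        }
        for (int t : trigger_tokens) {
            if (t == token) {
                open();
                return true;
            }
        }
        buffer += piece;
        for (const auto & re : trigger_patterns) {
            if (std::regex_search(buffer, re)) {
                open();
                return true;
            }
        }
        return false;
    }

  private:
    void open() {
        assert(awaiting_trigger); // only called while the gate is still closed
        awaiting_trigger = false;
        buffer.clear();
    }
};
```

Buffering the text rather than testing each piece alone matters because a trigger such as `<tool_call>` may arrive split over several tokens.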
Outputs
| Name | Type | Description |
|---|---|---|
| init_impl / clone_impl return | llama_grammar * | Initialized (or cloned) grammar state machine; release with llama_grammar_free_impl |
| parse return | bool | Whether GBNF parsing succeeded |
| cur_p (modified) | llama_token_data_array * | Token candidates with the logits of grammar-invalid tokens set to -INFINITY |
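The effect on the candidate array can be sketched independently of the grammar machinery. The example below assumes that a rejected candidate is marked by setting its logit to negative infinity, so that a later softmax assigns it zero probability. `token_data` and `mask_rejected` are hypothetical names used only here, and the predicate stands in for the grammar's accept/reject decision.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for llama_token_data; only the fields used here.
struct token_data {
    int   id;
    float logit;
};

// Mask every candidate that `allowed` rejects by setting its logit to -infinity.
// Candidates are kept in place so indices stay stable. Returns the number masked.
size_t mask_rejected(std::vector<token_data> & cands, const std::function<bool(int)> & allowed) {
    assert(allowed); // the predicate must be callable
    size_t n_masked = 0;
    for (auto & c : cands) {
        if (!allowed(c.id)) {
            c.logit = -INFINITY;
            ++n_masked;
        }
    }
    return n_masked;
}
```

Masking in place, rather than erasing entries, keeps the array size and ordering intact for whatever sampler stage runs next.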
Usage Examples
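#include "llama-grammar.h"

// Assumes `vocab`, `candidates` (a llama_token_data_array), and `selected_token`
// are supplied by the surrounding sampling code.
const char * grammar_str = "root ::= \"hello\" | \"world\"";

// Parse a GBNF grammar on its own, e.g. to validate or print it
llama_grammar_parser parser;
if (!parser.parse(grammar_str)) {
    // handle the parse failure
}

// Initialize grammar state (non-lazy, no trigger patterns or tokens)
auto * grammar = llama_grammar_init_impl(vocab, grammar_str, "root",
                                         false, nullptr, 0, nullptr, 0);
if (grammar != nullptr) { // guard against a failed initialization
    // Apply grammar constraints during sampling
    llama_grammar_apply_impl(*grammar, &candidates);
    // Accept the sampled token to advance the grammar state
    llama_grammar_accept_impl(*grammar, selected_token);
    // Release the grammar when done
    llama_grammar_free_impl(grammar);
}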