Implementation: ggml-org/llama.cpp LLGuidance
| Knowledge Sources | |
|---|---|
| Domains | Grammar, Constrained Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Integrates the llguidance library as a `llama_sampler` for grammar-constrained generation, supporting multiple grammar formats including GBNF, regex, and JSON Schema.
Description
When `LLAMA_USE_LLGUIDANCE` is defined, this module implements a `llama_sampler` that wraps the `LlgMatcher` from the llguidance Rust library. During sampling (`llama_sampler_llg_apply`), it retrieves a bitmask from the matcher indicating which tokens are allowed by the grammar, and sets disallowed tokens' logits to negative infinity. Token acceptance (`llama_sampler_llg_accept_impl`) feeds accepted tokens back to the matcher to advance its state. It initializes a `LlgTokenizer` from the llama vocabulary with proper token-to-string mappings and special token handling. When llguidance is not available, a stub implementation logs an error. The `LLGUIDANCE_LOG_LEVEL` environment variable controls debug output.
Usage
Use this module as an alternative grammar engine to the built-in GBNF parser. It is enabled by compiling with `LLAMA_USE_LLGUIDANCE` and can offer better performance for complex grammars via the Rust-based llguidance library, while also supporting grammar formats beyond GBNF.
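As a sketch of how the compile-time flag is typically switched on when building llama.cpp (the CMake option name `LLAMA_LLGUIDANCE` is taken from the upstream build docs; verify it against your checkout's `CMakeLists.txt`, which is what ultimately defines `LLAMA_USE_LLGUIDANCE` for the build):

```shell
# Enable the llguidance-backed sampler when configuring llama.cpp.
# Requires a Rust toolchain, since llguidance is a Rust library.
cmake -B build -DLLAMA_LLGUIDANCE=ON
cmake --build build --config Release
```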
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: common/llguidance.cpp
- Lines: 1-258
Signature
struct llama_sampler_llg {
    const llama_vocab * vocab;
    std::string         grammar_kind;
    std::string         grammar_data;
    LlgTokenizer *      tokenizer;
    LlgMatcher *        grammar;
};
// Sampler interface functions
static const char * llama_sampler_llg_name(const llama_sampler * smpl);
static void llama_sampler_llg_accept_impl(llama_sampler * smpl, llama_token token);
static void llama_sampler_llg_apply(llama_sampler * smpl, llama_token_data_array * cur_p);
static void llama_sampler_llg_reset(llama_sampler * smpl);
static llama_sampler * llama_sampler_llg_clone(const llama_sampler * smpl);
static void llama_sampler_llg_free(llama_sampler * smpl);
// Public initialization function
struct llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
const char * grammar_kind,
const char * grammar_data);
Import
#include "sampling.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vocab | const llama_vocab * | Yes | Pointer to the llama vocabulary for token-to-string mapping |
| grammar_kind | const char * | Yes | Grammar format identifier (e.g., "gbnf", "regex", "json_schema") |
| grammar_data | const char * | Yes | Grammar definition string in the specified format |
| token | llama_token | Yes | Token ID to accept and advance grammar state |
| cur_p | llama_token_data_array * | Yes | Array of token candidates with logits to mask |
Outputs
| Name | Type | Description |
|---|---|---|
| llama_sampler_init_llg return | struct llama_sampler * | Initialized sampler that constrains generation to the given grammar |
| apply effect | modified logits | Disallowed tokens have their logits set to -INFINITY in cur_p |
Usage Examples
#include "sampling.h"
// Initialize the llguidance sampler with a JSON schema grammar
auto * sampler = llama_sampler_init_llg(
model_vocab,
"json_schema",
"{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}"
);
// The sampler is added to the sampling chain
// During generation, it automatically masks disallowed tokens
// and advances grammar state as tokens are accepted
// Control debug output via environment variable
// export LLGUIDANCE_LOG_LEVEL=2