Implementation: ggml-org/llama.cpp LLGuidance
| Knowledge Sources | |
|---|---|
| Domains | Grammar, Constrained Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Integrates the llguidance library as a `llama_sampler` for grammar-constrained generation, supporting multiple grammar formats including GBNF, regex, and JSON Schema.
Description
When `LLAMA_USE_LLGUIDANCE` is defined, this module implements a `llama_sampler` that wraps the `LlgMatcher` from the llguidance Rust library. During sampling (`llama_sampler_llg_apply`), it retrieves a bitmask from the matcher indicating which tokens are allowed by the grammar, and sets disallowed tokens' logits to negative infinity. Token acceptance (`llama_sampler_llg_accept_impl`) feeds accepted tokens back to the matcher to advance its state. It initializes a `LlgTokenizer` from the llama vocabulary with proper token-to-string mappings and special token handling. When llguidance is not available, a stub implementation logs an error. The `LLGUIDANCE_LOG_LEVEL` environment variable controls debug output.
Usage
Use this module as an alternative grammar engine to the built-in GBNF parser. It is enabled by compiling with `LLAMA_USE_LLGUIDANCE` and can offer better performance for complex grammars via the Rust-based llguidance library, while also supporting grammar formats beyond GBNF.
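As a sketch of how the compile-time flag is typically switched on when building llama.cpp (the CMake option name `LLAMA_LLGUIDANCE` is taken from the upstream build docs; verify it against your checkout's `CMakeLists.txt`, which is what ultimately defines `LLAMA_USE_LLGUIDANCE` for the build):

```shell
# Enable the llguidance-backed sampler when configuring llama.cpp.
# Requires a Rust toolchain, since llguidance is a Rust library.
cmake -B build -DLLAMA_LLGUIDANCE=ON
cmake --build build --config Release
```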
Code Reference
Source Location
- Repository: ggml-org/llama.cpp
- File: common/llguidance.cpp
- Lines: 1-258
Signature
struct llama_sampler_llg {
    const llama_vocab * vocab;
    std::string         grammar_kind;
    std::string         grammar_data;
    LlgTokenizer *      tokenizer;
    LlgMatcher *        grammar;
};
// Sampler interface functions
static const char * llama_sampler_llg_name(const llama_sampler * smpl);
static void llama_sampler_llg_accept_impl(llama_sampler * smpl, llama_token token);
static void llama_sampler_llg_apply(llama_sampler * smpl, llama_token_data_array * cur_p);
static void llama_sampler_llg_reset(llama_sampler * smpl);
static llama_sampler * llama_sampler_llg_clone(const llama_sampler * smpl);
static void llama_sampler_llg_free(llama_sampler * smpl);
// Public initialization function
struct llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
const char * grammar_kind,
const char * grammar_data);
Import
#include "sampling.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vocab | const llama_vocab * | Yes | Pointer to the llama vocabulary for token-to-string mapping |
| grammar_kind | const char * | Yes | Grammar format identifier (e.g., "gbnf", "regex", "json_schema") |
| grammar_data | const char * | Yes | Grammar definition string in the specified format |
| token | llama_token | Yes | Token ID to accept and advance grammar state |
| cur_p | llama_token_data_array * | Yes | Array of token candidates with logits to mask |
Outputs
| Name | Type | Description |
|---|---|---|
| llama_sampler_init_llg return | struct llama_sampler * | Initialized sampler that constrains generation to the given grammar |
| apply effect | modified logits | Disallowed tokens have their logits set to -INFINITY in cur_p |
Usage Examples
#include "sampling.h"
// Initialize the llguidance sampler with a JSON schema grammar
auto * sampler = llama_sampler_init_llg(
model_vocab,
"json_schema",
"{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}"
);
// The sampler is added to the sampling chain
// During generation, it automatically masks disallowed tokens
// and advances grammar state as tokens are accepted
// Control debug output via environment variable
// export LLGUIDANCE_LOG_LEVEL=2