Implementation:Ggml org Llama cpp LLGuidance

From Leeroopedia
Knowledge Sources
Domains Grammar, Constrained_Generation
Last Updated 2026-02-15 00:00 GMT

Overview

Integrates the llguidance library as a llama_sampler for grammar-constrained generation, supporting multiple grammar formats including GBNF, regex, and JSON schema.

Description

When `LLAMA_USE_LLGUIDANCE` is defined, this module implements a `llama_sampler` that wraps the `LlgMatcher` from the llguidance Rust library. During sampling (`llama_sampler_llg_apply`), it retrieves a bitmask from the matcher indicating which tokens are allowed by the grammar, and sets disallowed tokens' logits to negative infinity. Token acceptance (`llama_sampler_llg_accept_impl`) feeds accepted tokens back to the matcher to advance its state. It initializes a `LlgTokenizer` from the llama vocabulary with proper token-to-string mappings and special token handling. When llguidance is not available, a stub implementation logs an error. The `LLGUIDANCE_LOG_LEVEL` environment variable controls debug output.

Usage

Use this module as an alternative grammar engine to the built-in GBNF parser. It is activated by compiling with `LLAMA_USE_LLGUIDANCE` and provides potentially better performance for complex grammars through the Rust-based llguidance library, while supporting additional grammar formats beyond GBNF.

Code Reference

Source Location

Signature

struct llama_sampler_llg {
    const llama_vocab * vocab;
    std::string         grammar_kind;
    std::string         grammar_data;
    LlgTokenizer *      tokenizer;
    LlgMatcher *        grammar;
};

// Sampler interface functions
static const char * llama_sampler_llg_name(const llama_sampler * smpl);
static void llama_sampler_llg_accept_impl(llama_sampler * smpl, llama_token token);
static void llama_sampler_llg_apply(llama_sampler * smpl, llama_token_data_array * cur_p);
static void llama_sampler_llg_reset(llama_sampler * smpl);
static llama_sampler * llama_sampler_llg_clone(const llama_sampler * smpl);
static void llama_sampler_llg_free(llama_sampler * smpl);

// Public initialization function
struct llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
                                              const char * grammar_kind,
                                              const char * grammar_data);

Import

#include "sampling.h"

I/O Contract

Inputs

Name Type Required Description
vocab const llama_vocab * Yes Pointer to the llama vocabulary for token-to-string mapping
grammar_kind const char * Yes Grammar format identifier (e.g., "gbnf", "regex", "json_schema")
grammar_data const char * Yes Grammar definition string in the specified format
token llama_token Yes Token ID to accept and advance grammar state
cur_p llama_token_data_array * Yes Array of token candidates with logits to mask

Outputs

Name Type Description
llama_sampler_init_llg return struct llama_sampler * Initialized sampler that constrains generation to the given grammar
apply effect modified logits Disallowed tokens have their logits set to -INFINITY in cur_p

Usage Examples

#include "sampling.h"

// Initialize the llguidance sampler with a JSON schema grammar
auto * sampler = llama_sampler_init_llg(
    model_vocab,
    "json_schema",
    "{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}}}"
);

// Add the sampler to a sampling chain; during generation it masks
// disallowed tokens and advances grammar state as tokens are accepted
auto * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
llama_sampler_chain_add(chain, sampler);

// Control debug output via environment variable
// export LLGUIDANCE_LOG_LEVEL=2
