Implementation:Ggml org Llama cpp Peg Parser Header
| Knowledge Sources | |
|---|---|
| Domains | Parsing, Grammar |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the PEG parser framework including parser types, AST nodes, builder infrastructure, and parse result types for constrained text generation.
Description
This header defines common_peg_parser, a lightweight handle that pairs a parser ID with a reference to its builder. Overloaded operators compose handles into sequences (+), choices (|), and space-separated sequences (<<). The parser types themselves (epsilon, literal, sequence, choice, repetition, character classes, JSON string, until-delimiter, schema, rule references, and others) are stored in a common_peg_arena. AST nodes are managed by common_peg_ast_arena, and every parse result carries a three-state enum (fail, success, need_more_input) for streaming support.
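The handle-plus-arena design described above can be illustrated with a small toy model. This is a minimal sketch, not the header's implementation: every `toy_*` name is hypothetical, only three node kinds are modeled, and the real builder may store and flatten nodes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy model only: all toy_* names are hypothetical and do not mirror the real header.
using toy_parser_id = std::size_t;

struct toy_literal  { std::string text; };
struct toy_sequence { std::vector<toy_parser_id> children; };
struct toy_choice   { std::vector<toy_parser_id> children; };
using toy_node = std::variant<toy_literal, toy_sequence, toy_choice>;

// The arena owns every parser node; handles refer to nodes by index.
struct toy_arena {
    std::vector<toy_node> nodes;

    toy_parser_id add(toy_node node) {
        nodes.push_back(std::move(node));
        return nodes.size() - 1;
    }
};

// A handle is an ID plus a reference to the arena that owns the node,
// so copying a handle never copies the grammar itself.
class toy_parser {
    toy_parser_id id_;
    toy_arena & arena_;
  public:
    toy_parser(toy_parser_id id, toy_arena & arena) : id_(id), arena_(arena) {}

    toy_parser_id id() const { return id_; }

    // Sequence: allocate a new node whose children are the two operands.
    toy_parser operator+(const toy_parser & other) const {
        return toy_parser(arena_.add(toy_sequence{{id_, other.id_}}), arena_);
    }

    // Choice: allocate a new node listing the two alternatives in order.
    toy_parser operator|(const toy_parser & other) const {
        return toy_parser(arena_.add(toy_choice{{id_, other.id_}}), arena_);
    }
};

// Helper that allocates a literal node and returns a handle to it.
inline toy_parser toy_lit(toy_arena & arena, std::string text) {
    return toy_parser(arena.add(toy_literal{std::move(text)}), arena);
}
```

In this sketch, an expression such as `(toy_lit(arena, "if") | toy_lit(arena, "else")) + toy_lit(arena, "(")` appends one node per literal and one per operator to the arena, and the resulting handle stores only the index of the outermost sequence node.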
Usage
Include this header when working with the PEG-based constrained generation engine. It provides the type system and API for composing grammar definitions programmatically and using them for real-time parsing during token generation.
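To make the streaming case concrete, the sketch below shows how a three-state result can distinguish "wrong input" from "input that is correct so far but incomplete". It is an illustration under stated assumptions: `toy_result`, `toy_match_literal`, and the `is_final` flag are hypothetical names invented for this example, and the real parser's handling of partial input may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the three-state result; not the header's enum.
enum class toy_result { fail, success, need_more_input };

// Match `literal` at the start of `input`.
// `is_final` (an assumption of this sketch) says whether more input can still arrive.
inline toy_result toy_match_literal(std::string_view literal,
                                    std::string_view input,
                                    bool is_final) {
    const std::size_t n = std::min(literal.size(), input.size());

    // Any mismatch within the overlapping prefix can never be repaired by more input.
    if (input.substr(0, n) != literal.substr(0, n)) {
        return toy_result::fail;
    }
    // The whole literal is present.
    if (input.size() >= literal.size()) {
        return toy_result::success;
    }
    // The input is a proper prefix of the literal: wait for more unless the stream ended.
    return is_final ? toy_result::fail : toy_result::need_more_input;
}
```

During token generation a caller would typically re-run the match as the buffer grows: with the literal `"true"`, the buffer `"tr"` yields `need_more_input`, `"tx"` yields `fail`, and `"true"` yields `success`.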
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/peg-parser.h
- Lines: 1-459
Signature
using common_peg_parser_id = size_t;
using common_peg_ast_id = size_t;
class common_peg_parser {
    common_peg_parser_id id_;
    common_peg_parser_builder & builder_;
  public:
    common_peg_parser operator+(const common_peg_parser & other) const;   // sequence
    common_peg_parser operator<<(const common_peg_parser & other) const;  // space-separated sequence
    common_peg_parser operator|(const common_peg_parser & other) const;   // choice
};
enum common_peg_parse_result_type {
    COMMON_PEG_PARSE_RESULT_FAIL            = 0,
    COMMON_PEG_PARSE_RESULT_SUCCESS         = 1,
    COMMON_PEG_PARSE_RESULT_NEED_MORE_INPUT = 2,
};
struct common_peg_ast_node {
    common_peg_ast_id id;
    std::string rule;
    std::string tag;
    size_t start, end;
    std::string_view text;
    std::vector<common_peg_ast_id> children;
    bool is_partial = false;
};
class common_peg_ast_arena {
    common_peg_ast_id add_node(const std::string & rule, const std::string & tag,
                               size_t start, size_t end, std::string_view text,
                               std::vector<common_peg_ast_id> children, bool is_partial = false);
};
Import
#pragma once
#include <nlohmann/json_fwd.hpp>
#include <memory>
#include <unordered_map>
#include <string>
#include <string_view>
#include <functional>
#include <vector>
#include <variant>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| id | common_peg_parser_id | Yes | Parser ID used to reference parsers in the arena |
| builder | common_peg_parser_builder & | Yes | Reference to the builder that owns the parser arena |
| other | common_peg_parser | Yes | Another parser to compose via sequence, choice, or space-separated operators |
| rule | std::string | Yes | Grammar rule name for AST node creation |
| tag | std::string | Yes | Tag label for AST node identification; add_node declares no default for this argument |
Outputs
| Name | Type | Description |
|---|---|---|
| common_peg_parser | common_peg_parser | Composed parser from operator overloading (sequence, choice, etc.) |
| common_peg_ast_id | common_peg_ast_id (size_t) | ID of a newly created AST node in the arena |
| result_type | common_peg_parse_result_type | Three-state parse result for streaming support |
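The AST side of the contract can be sketched the same way. The toy arena below copies the field layout of common_peg_ast_node and the add_node parameter list from the Signature section, but its storage and accessors (`nodes_`, `get`, `size`) are assumptions made for illustration, and all `toy_*` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using toy_ast_id = std::size_t;

// Field layout copied from the Signature section; the type name is hypothetical.
struct toy_ast_node {
    toy_ast_id id;
    std::string rule;
    std::string tag;
    std::size_t start, end;            // offsets into the parsed input
    std::string_view text;             // non-owning view: the input buffer must outlive the arena
    std::vector<toy_ast_id> children;  // children are referenced by ID, not by pointer
    bool is_partial = false;
};

class toy_ast_arena {
    std::vector<toy_ast_node> nodes_;  // assumption: nodes live in one contiguous vector
  public:
    // Appends a node and returns its ID, which here is simply its index.
    toy_ast_id add_node(const std::string & rule, const std::string & tag,
                        std::size_t start, std::size_t end, std::string_view text,
                        std::vector<toy_ast_id> children, bool is_partial = false) {
        const toy_ast_id id = nodes_.size();
        toy_ast_node node{id, rule, tag, start, end, text, std::move(children), is_partial};
        nodes_.push_back(std::move(node));
        return id;
    }

    const toy_ast_node & get(toy_ast_id id) const { return nodes_.at(id); }
    std::size_t size() const { return nodes_.size(); }
};
```

Because children are stored as IDs, a parent node is created after its children and refers to them by the values add_node returned. Storing indices rather than pointers also keeps references valid when the underlying vector reallocates.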
Usage Examples
#include "peg-parser.h"
// Build parsers using operator overloading
common_peg_parser_builder builder;
auto digit = builder.chars('0', '9'); // character range [0-9]
auto letter = builder.chars('a', 'z'); // character range [a-z]
auto identifier = letter + (letter | digit).rep(); // a letter followed by zero or more letters or digits
// Space-separated sequence
auto assignment = identifier << "=" << identifier;
// Choice between alternatives
auto keyword = builder.literal("if") | builder.literal("else") | builder.literal("while");