Implementation:Ggml org Llama cpp Pydantic Grammar Examples

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Grammar, Structured_Output
Last Updated	2026-02-15 00:00 GMT

Overview

Demonstrates function calling and structured output using Pydantic models with GBNF grammar-constrained generation via a llama.cpp server.

Description

The script defines four example scenarios: simple message sending (`SendMessageToUser`), a calculator tool with math operations (`Calculator`), structured book database entry creation (`Book`), and concurrent multi-tool calling (datetime, weather, calculator). Each example generates a GBNF grammar and matching documentation from Pydantic models, builds a system prompt that embeds the documentation, sends the prompt and grammar to the llama-server `/completion` endpoint, and parses the grammar-constrained JSON response back into typed objects.
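The request step can be sketched as follows. This is a minimal illustration, not the script's own code: it uses the standard library's `urllib` instead of `requests`, assumes the server accepts a JSON body with `prompt` and `grammar` fields and answers with a `content` field, and introduces the hypothetical names `build_completion_request`, `http_post_json`, and `create_completion_sketch`. The `transport` parameter is also an assumption, added so the function can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable, Tuple


def build_completion_request(host: str, prompt: str, gbnf_grammar: str) -> Tuple[str, bytes]:
    """Return the URL and JSON body for a grammar-constrained completion."""
    url = f"http://{host}/completion"
    # Assumed payload shape: the prompt plus the GBNF grammar that constrains sampling.
    body = json.dumps({"prompt": prompt, "grammar": gbnf_grammar}).encode("utf-8")
    return url, body


def http_post_json(url: str, body: bytes) -> dict:
    """Default transport: POST the body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_completion_sketch(
    host: str,
    prompt: str,
    gbnf_grammar: str,
    transport: Callable[[str, bytes], dict] = http_post_json,
) -> str:
    """Send prompt and grammar to the server and return the generated text."""
    url, body = build_completion_request(host, prompt, gbnf_grammar)
    result = transport(url, body)
    if result.get("error") is not None:
        raise RuntimeError(f"server returned an error: {result['error']}")
    # Assumed response shape: the generated text lives under "content".
    return result["content"]
```

Passing a stub as `transport` lets the payload construction and response handling be checked offline, which is useful when no model is loaded.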

Usage

Use this script as a reference for implementing end-to-end function calling and structured output with grammar-constrained generation. Run it against a running llama-server instance to see practical examples of Pydantic-to-grammar workflows.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: examples/pydantic_models_to_grammar_examples.py
Lines: 1-312

Signature

class SendMessageToUser(BaseModel):
    chain_of_thought: str
    message: str

class MathOperation(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"

class Calculator(BaseModel):
    number_one: Union[int, float]
    operation: MathOperation
    number_two: Union[int, float]

class Category(Enum):
    Fiction = "Fiction"
    NonFiction = "Non-Fiction"

class Book(BaseModel):
    title: str
    author: str
    published_year: int
    keywords: list[str]
    category: Category
    summary: str

def create_completion(host: str, prompt: str, gbnf_grammar: str) -> str: ...
def example_rce(host: str) -> int: ...
def example_calculator(host: str) -> int: ...
def example_struct(host: str) -> int: ...
def example_concurrent(host: str) -> int: ...
def main() -> int: ...
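For the multi-tool case, the grammar-constrained response can be treated as a JSON list of calls that are dispatched one by one. The sketch below is an illustration under stated assumptions: the key names `function` and `params`, the `TOOLS` registry, and the toy tools `add` and `shout` are all hypothetical and are not taken from the script.

```python
import json
from typing import Any, Callable, Dict, List


def add(a: float, b: float) -> float:
    """Toy tool: add two numbers."""
    return a + b


def shout(text: str) -> str:
    """Toy tool: upper-case a string."""
    return text.upper()


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "shout": shout}


def dispatch_calls(
    content: str,
    tools: Dict[str, Callable[..., Any]] = TOOLS,
    name_key: str = "function",  # assumed key holding the tool name
    args_key: str = "params",    # assumed key holding the keyword arguments
) -> List[Any]:
    """Parse a JSON list of tool calls and run each one in order."""
    calls = json.loads(content)
    if isinstance(calls, dict):
        # Accept a single call object as a one-element list.
        calls = [calls]
    results = []
    for call in calls:
        name = call[name_key]
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        results.append(tools[name](**call[args_key]))
    return results
```

Because the grammar restricts the model to known tool names and argument shapes, the lookup should normally succeed; the explicit `KeyError` is kept as a guard for responses produced without the grammar.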

Import

from __future__ import annotations
import argparse
import datetime
import json
import logging
import textwrap
import sys
from enum import Enum
from typing import Optional, Union
import requests
from pydantic import BaseModel, Field
from pydantic_models_to_grammar import (
    add_run_method_to_dynamic_model,
    convert_dictionary_to_pydantic_model,
    create_dynamic_model_from_function,
    generate_gbnf_grammar_and_documentation
)

I/O Contract

Inputs

Name	Type	Required	Description
host	str	Yes	Host address of the running llama-server (e.g., "localhost:8080")
prompt	str	Yes	Full prompt string including system message and user query
gbnf_grammar	str	Yes	GBNF grammar string generated from Pydantic models

Outputs

Name	Type	Description
content	str	JSON string response from the model, constrained to match the grammar
parsed_result	BaseModel	Pydantic model instance parsed from the JSON response
exit_code	int	0 on success, non-zero on failure
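The path from `content` to `parsed_result` to `exit_code` can be sketched for the calculator case. This is a standard-library stand-in, not the script's implementation: a dataclass replaces the Pydantic model, the flat JSON shape is assumed, and the names `CalculatorCall`, `parse_calculator`, and `run_example` are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Union


class MathOperation(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"


@dataclass
class CalculatorCall:
    """Dataclass stand-in for the Pydantic Calculator model."""
    number_one: Union[int, float]
    operation: MathOperation
    number_two: Union[int, float]

    def run(self) -> Union[int, float]:
        """Apply the requested operation to the two operands."""
        if self.operation is MathOperation.ADD:
            return self.number_one + self.number_two
        if self.operation is MathOperation.SUBTRACT:
            return self.number_one - self.number_two
        if self.operation is MathOperation.MULTIPLY:
            return self.number_one * self.number_two
        return self.number_one / self.number_two


def parse_calculator(content: str) -> CalculatorCall:
    """Turn the model's JSON string into a typed object (assumed flat shape)."""
    data = json.loads(content)
    return CalculatorCall(
        number_one=data["number_one"],
        operation=MathOperation(data["operation"]),  # ValueError on unknown operation
        number_two=data["number_two"],
    )


def run_example(content: str, expected: Union[int, float]) -> int:
    """Return 0 when the parsed call yields the expected value, 1 otherwise."""
    try:
        result = parse_calculator(content).run()
    except (KeyError, ValueError, ZeroDivisionError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so malformed JSON lands here too.
        print(f"failed: {exc}")
        return 1
    return 0 if result == expected else 1
```

With Pydantic available, the dataclass and the manual field extraction would be replaced by model validation; the exit-code logic stays the same.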

Usage Examples

# Run all examples against a local llama-server
# First start the server: ./llama-server -m model.gguf

# Then run the examples:
python pydantic_models_to_grammar_examples.py --host localhost:8080

# The script defines the following example functions:
# example_rce: simple message sending with chain-of-thought
# example_calculator: math operations via function calling
# example_struct: structured book entry creation
# example_concurrent: multi-tool calling (weather + datetime + calculator)
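How a `--host` option and the per-example exit codes might be wired together is sketched below. It is an assumption-laden illustration: the default value `localhost:8080` is borrowed from the Inputs table, and `parse_args` and `run_examples` are hypothetical helpers rather than functions from the script.

```python
import argparse
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Grammar-constrained examples")
    # Assumed default, taken from the example value in the Inputs table.
    parser.add_argument("--host", default="localhost:8080", help="llama-server host:port")
    return parser.parse_args(argv)


def run_examples(host: str, examples: Sequence[Callable[[str], int]]) -> int:
    """Run each example in order and stop at the first non-zero exit code."""
    for example in examples:
        code = example(host)
        if code != 0:
            return code
    return 0
```

Restricting a run to one scenario then amounts to passing a shorter `examples` list, for instance `run_examples(args.host, [example_calculator])`.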

Related Pages

Principle:Ggml_org_Llama_cpp_Grammar_Constrained_Generation
