Implementation:Ggml org Llama cpp Pydantic Grammar Examples
| Knowledge Sources | |
|---|---|
| Domains | Grammar, Structured_Output |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Demonstrates function calling and structured output using Pydantic models with GBNF grammar-constrained generation via a llama.cpp server.
Description
The script defines four example scenarios: simple message sending (`SendMessageToUser`), a calculator tool with math operations (`Calculator`), structured book database entry creation (`Book`), and concurrent multi-tool calling (datetime, weather, calculator). Each example generates a GBNF grammar and matching documentation from Pydantic models, builds a system prompt that embeds that documentation, sends the prompt and grammar to a llama-server `/completion` endpoint, and parses the grammar-constrained JSON response back into typed objects.
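The request and response handling in this flow can be sketched without a live server. The sketch below assumes that the `/completion` endpoint accepts a JSON body with `prompt` and `grammar` fields and returns the generated text under `content`. The helper names `build_completion_request` and `extract_content` are hypothetical and do not appear in the script.

```python
import json


def build_completion_request(host: str, prompt: str, gbnf_grammar: str) -> tuple[str, bytes]:
    """Return the URL and JSON-encoded body for a /completion call."""
    url = f"http://{host}/completion"
    # Assumed request shape: the grammar travels next to the prompt.
    body = json.dumps({"prompt": prompt, "grammar": gbnf_grammar}).encode("utf-8")
    return url, body


def extract_content(response_text: str) -> dict:
    """Decode a server response and the JSON document inside its content field."""
    result = json.loads(response_text)
    if result.get("error") is not None:
        raise RuntimeError(f"server returned an error: {result['error']}")
    # "content" is a string; the grammar constrains it to be valid JSON.
    return json.loads(result["content"])


# Offline demonstration with a canned response instead of a real HTTP call.
url, body = build_completion_request("localhost:8080", "Say hello", 'root ::= "{}"')
canned_response = json.dumps({"content": '{"message": "hello"}'})
print(url)
print(extract_content(canned_response))
```

Keeping the payload construction separate from the HTTP call makes it possible to check the request shape in isolation; an actual client would POST `body` to `url` with a `Content-Type: application/json` header.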
Usage
Use this script as a reference for implementing end-to-end function calling and structured output with grammar-constrained generation. Run it against a live llama-server instance to see practical examples of Pydantic-to-grammar workflows.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/pydantic_models_to_grammar_examples.py
- Lines: 1-312
Signature
class SendMessageToUser(BaseModel):
    chain_of_thought: str
    message: str

class MathOperation(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"

class Calculator(BaseModel):
    number_one: Union[int, float]
    operation: MathOperation
    number_two: Union[int, float]

class Category(Enum):
    Fiction = "Fiction"
    NonFiction = "Non-Fiction"

class Book(BaseModel):
    title: str
    author: str
    published_year: int
    keywords: list[str]
    category: Category
    summary: str

def create_completion(host: str, prompt: str, gbnf_grammar: str) -> str: ...
def example_rce(host: str) -> int: ...
def example_calculator(host: str) -> int: ...
def example_struct(host: str) -> int: ...
def example_concurrent(host: str) -> int: ...
def main() -> int: ...
Import
from __future__ import annotations
import argparse
import datetime
import json
import logging
import textwrap
import sys
from enum import Enum
from typing import Optional, Union
import requests
from pydantic import BaseModel, Field
from pydantic_models_to_grammar import (
    add_run_method_to_dynamic_model,
    convert_dictionary_to_pydantic_model,
    create_dynamic_model_from_function,
    generate_gbnf_grammar_and_documentation,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| host | str | Yes | Host address of the running llama-server (e.g., "localhost:8080") |
| prompt | str | Yes | Full prompt string including system message and user query |
| gbnf_grammar | str | Yes | GBNF grammar string generated from Pydantic models |
Outputs
| Name | Type | Description |
|---|---|---|
| content | str | JSON string response from the model, constrained to match the grammar |
| parsed_result | BaseModel | Pydantic model instance parsed from the JSON response |
| exit_code | int | 0 on success, non-zero on failure |
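The step from `content` to `parsed_result` can be illustrated with the `Calculator` tool. This is a minimal sketch under two assumptions: a plain dataclass stands in for the Pydantic model so the example has no third-party dependencies, and the model's JSON uses a hypothetical envelope with `function` and `params` keys. The `run` method and `parse_calculator_call` helper are illustrative, not taken from the script.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Union


class MathOperation(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"


@dataclass
class Calculator:
    # Dataclass stand-in for the Pydantic model of the same name.
    number_one: Union[int, float]
    operation: MathOperation
    number_two: Union[int, float]

    def run(self) -> Union[int, float]:
        """Apply the selected operation to the two operands."""
        if self.operation is MathOperation.ADD:
            return self.number_one + self.number_two
        if self.operation is MathOperation.SUBTRACT:
            return self.number_one - self.number_two
        if self.operation is MathOperation.MULTIPLY:
            return self.number_one * self.number_two
        return self.number_one / self.number_two


def parse_calculator_call(content: str) -> Calculator:
    """Turn the model's JSON string into a typed Calculator instance."""
    data = json.loads(content)
    # Assumed envelope: {"function": <tool name>, "params": {...}}.
    if data["function"] != "Calculator":
        raise ValueError(f"unexpected function: {data['function']}")
    params = data["params"]
    return Calculator(
        number_one=params["number_one"],
        operation=MathOperation(params["operation"]),  # ValueError on unknown operations
        number_two=params["number_two"],
    )


content = '{"function": "Calculator", "params": {"number_one": 42, "operation": "multiply", "number_two": 2}}'
call = parse_calculator_call(content)
print(call.run())  # 84
```

Because the grammar restricts `operation` to the enum's values, the `MathOperation(...)` lookup should not fail on grammar-constrained output; the check still guards against responses produced without the grammar.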
Usage Examples
# Run all examples against a local llama-server
# First start the server: ./llama-server -m model.gguf
# Then run the examples:
python pydantic_models_to_grammar_examples.py --host localhost:8080
# The script runs the following examples:
# example_rce: simple message sending with chain-of-thought
# example_calculator: math operations via function calling
# example_struct: structured book entry creation
# example_concurrent: multi-tool calling (weather + datetime + calculator)
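How the `--host` argument and the per-example integer return values might fit together is sketched below. This is an illustration only: the `run_examples` helper, the injectable `examples` parameter, and the `localhost:8080` default are assumptions made for the sketch, not details confirmed by this page.

```python
import argparse
from typing import Callable, Optional, Sequence


def run_examples(host: str, examples: Sequence[Callable[[str], int]]) -> int:
    """Run every example and return the first non-zero exit code, else 0."""
    ret = 0
    for example in examples:
        code = example(host)  # always run, even after an earlier failure
        ret = ret or code
    return ret


def main(argv: Optional[Sequence[str]] = None,
         examples: Sequence[Callable[[str], int]] = ()) -> int:
    parser = argparse.ArgumentParser(description="Run grammar examples against llama-server")
    # Assumed default; pass --host to point at another server.
    parser.add_argument("--host", default="localhost:8080", help="llama-server host:port")
    args = parser.parse_args(argv)
    return run_examples(args.host, examples)


# Stub examples stand in for example_rce, example_calculator, and so on.
exit_code = main(["--host", "localhost:8080"], [lambda host: 0, lambda host: 0])
print(exit_code)  # 0
```

Running every example before reporting a failure keeps one broken scenario from hiding the results of the others, while the process exit code still signals that something went wrong.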