Implementation:Ggml org Llama cpp Pydantic Example
| Knowledge Sources | |
|---|---|
| Domains | Structured_Output, Example |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Demonstrates how to use Pydantic models to get structured JSON output from a llama.cpp server via its OpenAI-compatible chat completions endpoint.
Description
Defines a `create_completion` function that takes a Pydantic `response_model`, extracts its JSON schema via `TypeAdapter`, sends it as a `response_format` constraint alongside the chat messages to the server, and validates the returned JSON against the model. The main block defines nested Pydantic models (`QAPair`, `PyramidalSummary`) and requests a structured pyramidal document summary. An alternative branch using the Instructor library is also included.
Usage
Use this example to learn how to combine Pydantic type definitions with llama.cpp's JSON schema constrained generation to produce validated, structured output from LLM inference.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/json_schema_pydantic_example.py
- Lines: 1-82
Signature
def create_completion(*, response_model=None, endpoint="http://localhost:8080/v1/chat/completions", messages, **kwargs)
class QAPair(BaseModel)
class PyramidalSummary(BaseModel)
Import
from pydantic import BaseModel, Field, TypeAdapter
from annotated_types import MinLen
from typing import Annotated, List, Optional
import json, requests
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| response_model | type | No | Pydantic model class to use as the JSON schema constraint |
| endpoint | str | No | OpenAI-compatible chat completions URL (default: http://localhost:8080/v1/chat/completions) |
| messages | list[dict] | Yes | Chat messages to send to the server |
| **kwargs | dict | No | Additional parameters passed to the API request |
Outputs
| Name | Type | Description |
|---|---|---|
| return | BaseModel or str | Validated Pydantic model instance if response_model is provided, otherwise raw content string |
Usage Examples
# Start the server first
./llama-server -m some-model.gguf &
# Install dependencies and run
pip install pydantic
python json_schema_pydantic_example.py