Principle:Ollama Ollama Chat Template System
| Knowledge Sources | |
|---|---|
| Domains | Chat Template, Prompt Engineering |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
The Chat Template System is the principle of using a template engine to render structured conversation messages (with roles, content, and metadata) into the raw text prompt format expected by a language model. This decouples the conversational API from model-specific formatting requirements, enabling a single inference pipeline to serve models with diverse prompt conventions.
Core Concepts
Template-Based Prompt Construction
Rather than hard-coding prompt formats, a template-based system defines the formatting logic as data (template strings) that can be loaded, swapped, and customized per model. The template receives a structured input (typically a list of messages with roles such as system, user, assistant, and tool) and produces a single string ready for tokenization. This approach follows the separation of concerns principle: the inference pipeline handles execution while templates handle formatting.
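As a minimal sketch of formatting logic held as data, the Go program below keeps two template strings in a map and renders the same message list through either one. The Message type, the formats map, the format names, and the Render helper are all hypothetical names chosen for illustration; they are not taken from any particular codebase.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message is a hypothetical structured chat message.
type Message struct {
	Role    string
	Content string
}

// formats maps an illustrative format name to its template string.
// Adding or swapping a prompt format means editing data, not code.
var formats = map[string]string{
	"chatml": "{{range .Messages}}<|im_start|>{{.Role}}\n{{.Content}}<|im_end|>\n{{end}}<|im_start|>assistant\n",
	"plain":  "{{range .Messages}}{{.Role}}: {{.Content}}\n{{end}}assistant: ",
}

// Render looks up the named template and applies it to the messages.
func Render(format string, msgs []Message) (string, error) {
	src, ok := formats[format]
	if !ok {
		return "", fmt.Errorf("unknown format %q", format)
	}
	tmpl, err := template.New(format).Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, struct{ Messages []Message }{msgs}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	msgs := []Message{{"system", "Be brief."}, {"user", "Hi"}}
	for _, name := range []string{"chatml", "plain"} {
		out, err := Render(name, msgs)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--- %s ---\n%s\n", name, out)
	}
}
```

The calling code is identical for both formats; only the template string selected by name differs.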
Go text/template Engine
Go's standard library text/template package provides a template engine with conditional logic ({{if}}), iteration ({{range}}), variable assignment ({{$var := .Field}}), function pipelines, and nested template definitions ({{define}} and {{template}}). For chat templates, custom template functions are registered to handle common operations such as JSON serialization of tool schemas, string manipulation, and special token insertion. A template is parsed once into an internal tree that can be executed repeatedly, and it can reach only the data values and functions explicitly passed to it.
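The features listed above can be seen together in one small program. This is a sketch only: the upper and json helpers, the Input type, and the template text are assumptions made for the example, not functions or types defined by any specific project. It exercises a conditional, a range loop, a variable assignment, a pipeline, a nested template definition, and custom functions registered through template.FuncMap.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// Message and Input are hypothetical data types for this example.
type Message struct{ Role, Content string }

type Input struct {
	System   string
	Messages []Message
	Meta     map[string]int
}

// funcs holds illustrative custom functions exposed to the template.
var funcs = template.FuncMap{
	"upper": strings.ToUpper,
	"json": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	},
}

const src = `{{define "turn"}}[{{.Role | upper}}] {{.Content}}{{"\n"}}{{end}}` + // nested definition + pipeline
	`{{$n := len .Messages}}` + // variable assignment
	`{{if .System}}SYSTEM: {{.System}}{{"\n"}}{{end}}` + // conditional
	`{{range .Messages}}{{template "turn" .}}{{end}}` + // iteration
	`count={{$n}} meta={{json .Meta}}` // custom function call

// Render parses the template with the custom functions and executes it.
// Funcs must be called before Parse so the parser knows the function names.
func Render(in Input) (string, error) {
	tmpl, err := template.New("demo").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, in); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := Render(Input{
		System:   "Be brief.",
		Messages: []Message{{"user", "Hi"}, {"assistant", "Hello"}},
		Meta:     map[string]int{"n": 1},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```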
Message Role Dispatching
Chat templates must handle different message roles with different formatting. A typical template uses conditional branching to apply role-specific formatting: system messages may be wrapped in special tokens or placed at the beginning of the prompt, user messages receive user-turn markers, assistant messages receive assistant-turn markers, and tool messages are formatted as function call results. The template must also handle edge cases such as consecutive messages of the same role, empty content, and the presence or absence of a system prompt.
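One possible way to express role dispatching is an if / else if chain inside a range loop, as sketched below. The bracketed markers ([USER], <<SYS>>, and so on) are invented for this example and do not correspond to any real model's format. The mergeConsecutive helper is likewise an assumption: it shows one way to handle consecutive same-role messages before rendering, while the template itself skips empty content and works with or without a system message.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message is a hypothetical structured chat message.
type Message struct{ Role, Content string }

// chatTemplate dispatches on the role of each message.
// All markers are illustrative placeholders, not a real model format.
const chatTemplate = `{{range .Messages}}` +
	`{{if not .Content}}` + // empty content: emit nothing
	`{{else if eq .Role "system"}}<<SYS>>{{.Content}}<</SYS>>{{"\n"}}` +
	`{{else if eq .Role "user"}}[USER] {{.Content}}{{"\n"}}` +
	`{{else if eq .Role "assistant"}}[ASSISTANT] {{.Content}}{{"\n"}}` +
	`{{else if eq .Role "tool"}}[TOOL RESULT] {{.Content}}{{"\n"}}` +
	`{{end}}{{end}}[ASSISTANT] ` // open assistant turn for generation

// mergeConsecutive joins adjacent messages that share a role.
func mergeConsecutive(msgs []Message) []Message {
	var out []Message
	for _, m := range msgs {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Content += "\n" + m.Content
			continue
		}
		out = append(out, m)
	}
	return out
}

// Render merges same-role runs and then applies the template.
func Render(msgs []Message) (string, error) {
	tmpl, err := template.New("chat").Parse(chatTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := struct{ Messages []Message }{mergeConsecutive(msgs)}
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := Render([]Message{
		{"system", "Be brief."},
		{"user", "What is 6*7?"},
		{"tool", "42"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```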
Special Token Management
Language models use special tokens (Begin-of-Sequence, End-of-Sequence, role markers, tool call delimiters) that are not part of the natural language vocabulary. Chat templates must insert these tokens at the correct positions. Some tokens are literal strings embedded in the template (e.g., <|im_start|> for ChatML), while others may be model-specific and injected via template variables or custom functions. Correct special token placement is critical for model behavior, as incorrect formatting can cause degraded generation quality or parsing failures.
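The two ways of placing special tokens can be sketched side by side. In the example below, the ChatML markers <|im_start|> and <|im_end|> are literal text in the template, while the begin-of-sequence token is injected through a template variable. The Prompt type, the Render helper, and the "<s>" value used for BOS are assumptions for illustration; a real model defines its own tokens.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message is a hypothetical structured chat message.
type Message struct{ Role, Content string }

// Prompt carries the messages plus a model-specific token value.
type Prompt struct {
	BOS      string // injected per model; "<s>" below is only an example value
	Messages []Message
}

// The ChatML turn markers are literal template text; BOS is a variable.
// The final assistant turn is left open so the model continues from it.
const src = `{{.BOS}}{{range .Messages}}<|im_start|>{{.Role}}{{"\n"}}` +
	`{{.Content}}<|im_end|>{{"\n"}}{{end}}<|im_start|>assistant{{"\n"}}`

// Render applies the template to the prompt data.
func Render(p Prompt) (string, error) {
	tmpl, err := template.New("tokens").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := Render(Prompt{BOS: "<s>", Messages: []Message{{"user", "Hi"}}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Printing with %q makes the newlines and token boundaries visible, which helps when checking that each token sits at the intended position.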
Tool and Function Call Formatting
Modern chat templates must support tool/function calling conventions where the model can request tool invocations and receive their results. This involves formatting available tool schemas (typically as JSON) in the system or user context, recognizing assistant messages that contain tool call requests, and formatting tool result messages that feed function outputs back to the model. Different model families use different conventions for tool formatting (JSON blocks, XML tags, special tokens).
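The three parts of tool formatting (advertising schemas, rendering a tool call request, and feeding back a result) might be sketched as follows. Everything here is hypothetical: the Tool, ToolCall, and Message types, the json helper, and the bracketed delimiters such as [AVAILABLE_TOOLS] are invented for the example and stand in for whatever convention a given model family uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical types describing tools, tool calls, and messages.
type Tool struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

type ToolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

type Message struct {
	Role, Content string
	ToolCalls     []ToolCall
}

type Input struct {
	Tools    []Tool
	Messages []Message
}

// The delimiters below are illustrative, not a real model's convention.
const src = `{{if .Tools}}[AVAILABLE_TOOLS]{{json .Tools}}[/AVAILABLE_TOOLS]{{"\n"}}{{end}}` +
	`{{range .Messages}}` +
	`{{if eq .Role "tool"}}[TOOL_RESULT]{{.Content}}[/TOOL_RESULT]{{"\n"}}` +
	`{{else if .ToolCalls}}[TOOL_CALLS]{{json .ToolCalls}}{{"\n"}}` +
	`{{else}}[{{.Role}}] {{.Content}}{{"\n"}}{{end}}{{end}}`

// RenderPrompt registers a json helper and executes the template.
func RenderPrompt(in Input) (string, error) {
	funcs := template.FuncMap{"json": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	}}
	tmpl, err := template.New("tools").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, in); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// demoInput builds a small conversation with one tool round trip.
func demoInput() Input {
	return Input{
		Tools: []Tool{{"get_weather", "Look up weather", map[string]any{"city": "string"}}},
		Messages: []Message{
			{Role: "user", Content: "Weather in Oslo?"},
			{Role: "assistant", ToolCalls: []ToolCall{{"get_weather", map[string]any{"city": "Oslo"}}}},
			{Role: "tool", Content: "12C"},
		},
	}
}

func main() {
	out, err := RenderPrompt(demoInput())
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Switching to an XML-tag or special-token convention would change only the template string; the data types and rendering code could stay the same.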
Implementation Notes
In the Ollama codebase, the chat template system uses Go's text/template engine with custom registered functions to render conversation messages into model-specific prompt strings. Each model's Modelfile or GGUF metadata specifies a template name or inline template string. The template engine receives a data structure containing the message list, tool definitions, and model-specific parameters, and produces the formatted prompt. Custom template functions include JSON serialization, tool schema formatting, and conditional helpers. A library of predefined templates covers many common prompt formats.