Implementation:Hiyouga LLaMA Factory Chat Template
| Knowledge Sources | |
|---|---|
| Domains | Prompt Engineering, Tokenization |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Concrete chat template definitions and the token-encoding logic that LLaMA Factory provides for more than 100 model architectures.
Description
This module defines the Template dataclass, which is the core abstraction for converting structured chat messages into tokenized sequences. Each template encapsulates role-specific formatters (user, assistant, system, function, observation, tools, prefix), stop words, thought/reasoning delimiters, efficient EOS handling, and a reference to the appropriate multimodal plugin.
The Template class provides:
- encode_oneturn -- Encodes a full conversation into a single (prompt_ids, response_ids) pair for training/inference
- encode_multiturn -- Encodes a conversation into multiple (prompt, response) pairs for multi-turn training
- extract_tool -- Parses tool/function calls from model output using the template's tool formatter
- get_stop_token_ids -- Returns all stop token IDs including EOS and stop words
- add_thought / remove_thought -- Manages reasoning/thinking tokens for chain-of-thought models
- fix_special_tokens -- Patches tokenizer EOS, PAD, and stop word tokens
- fix_jinja_template -- Generates or replaces the tokenizer's Jinja chat template
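The add_thought / remove_thought pair can be pictured with a minimal, self-contained sketch. It assumes the thought delimiters are the strings "<think>\n" and "\n</think>\n\n"; real templates supply their own thought_words, and the standalone functions below are illustrative stand-ins rather than the methods defined in template.py.

```python
import re

# Assumed delimiters for illustration; each real template defines its own thought_words.
THOUGHT_WORDS = ("<think>\n", "\n</think>\n\n")


def add_thought(content: str = "", thought_words=THOUGHT_WORDS) -> str:
    """Prefix the content with an empty thought block."""
    return thought_words[0] + thought_words[1] + content


def remove_thought(content: str, thought_words=THOUGHT_WORDS) -> str:
    """Drop every delimited thought block, then trim leading newlines."""
    pattern = re.compile(
        re.escape(thought_words[0]) + r"(.*?)" + re.escape(thought_words[1]),
        re.DOTALL,  # let the thought body span multiple lines
    )
    return pattern.sub("", content).lstrip("\n")


reply = "<think>\n2 + 2 is basic arithmetic.\n</think>\n\nThe answer is 4."
stripped = remove_thought(reply)            # thought block removed
wrapped = add_thought("The answer is 4.")   # empty thought block prepended
```

In this sketch the two functions are inverses for content that carries an empty thought block: removing the thought from `wrapped` yields the plain answer again.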
Two specialized subclasses exist:
- Llama2Template -- Handles LLaMA-2's unique system message embedding within the first user turn
- ReasoningTemplate -- Adds thought word token IDs to the beginning of assistant messages for reasoning models
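To illustrate what "embedding the system message within the first user turn" means, here is a string-level sketch that follows the publicly documented LLaMA-2 chat layout. The function name build_llama2_prompt is hypothetical, and the real Llama2Template works on formatter slots and token IDs rather than plain strings.

```python
def build_llama2_prompt(turns, system=""):
    """Render (user, assistant) turns in a LLaMA-2 style layout.

    The system text is wrapped in <<SYS>> markers and placed inside the
    first [INST] block instead of being emitted as a separate message.
    """
    parts = []
    for index, (user, assistant) in enumerate(turns):
        if index == 0 and system:
            # Only the first user turn carries the system prompt.
            user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        parts.append(f"<s>[INST] {user} [/INST] {assistant} </s>")
    return "".join(parts)


prompt = build_llama2_prompt(
    [("Hi", "Hello!"), ("Bye", "See you.")],
    system="Be brief.",
)
```

Later turns are rendered without any system text, which is why a generic "system message first, then turns" template does not fit this model family.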
Over 100 templates are registered via register_template for models including LLaMA, Qwen, ChatGLM, Mistral, Gemma, DeepSeek, Phi, Yi, InternLM, Baichuan, and many others. The get_template_and_fix_tokenizer function selects and initializes the correct template.
Usage
Templates are obtained via get_template_and_fix_tokenizer at the start of any training or inference workflow. The template is then used throughout the data pipeline for encoding examples and during inference for prompt construction.
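One way the data pipeline might consume the encoded pairs is sketched below. It assumes the common supervised fine-tuning convention of masking prompt tokens with -100 (the default ignore_index of PyTorch's cross-entropy loss); the helper name build_example is hypothetical and is not taken from LLaMA Factory.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_example(pairs):
    """Concatenate (prompt_ids, response_ids) pairs into input_ids and labels."""
    input_ids, labels = [], []
    for prompt_ids, response_ids in pairs:
        input_ids += prompt_ids + response_ids
        # Prompt tokens are masked; only response tokens contribute to the loss.
        labels += [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


# Toy token IDs standing in for the output of a multi-turn encoding.
pairs = [([1, 2, 3], [4, 5]), ([6, 7], [8])]
input_ids, labels = build_example(pairs)
```

The two lists always have the same length, so they can be padded and batched together.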
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/data/template.py
- Lines: 1-2175
Signature
@dataclass
class Template:
    format_user: "Formatter"
    format_assistant: "Formatter"
    format_system: "Formatter"
    format_function: "Formatter"
    format_observation: "Formatter"
    format_tools: "Formatter"
    format_prefix: "Formatter"
    default_system: str
    stop_words: list[str]
    thought_words: tuple[str, str]
    tool_call_words: tuple[str, str]
    efficient_eos: bool
    replace_eos: bool
    replace_jinja_template: bool
    enable_thinking: Optional[bool]
    mm_plugin: "BasePlugin"

    def encode_oneturn(self, tokenizer, messages, system=None, tools=None) -> tuple[list[int], list[int]]: ...
    def encode_multiturn(self, tokenizer, messages, system=None, tools=None) -> list[tuple[list[int], list[int]]]: ...
    def extract_tool(self, content: str) -> Union[str, list["FunctionCall"]]: ...
    def get_stop_token_ids(self, tokenizer) -> list[int]: ...
    def fix_special_tokens(self, tokenizer) -> None: ...
    def fix_jinja_template(self, tokenizer) -> None: ...

@dataclass
class Llama2Template(Template): ...

@dataclass
class ReasoningTemplate(Template): ...

def register_template(name: str, ...) -> None: ...

def get_template_and_fix_tokenizer(tokenizer, data_args) -> "Template": ...
Import
from llamafactory.data.template import Template, get_template_and_fix_tokenizer
I/O Contract
Inputs (encode_oneturn)
| Name | Type | Required | Description |
|---|---|---|---|
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for converting text to token IDs |
| messages | list[dict[str, str]] | Yes | Chat messages with role ("user", "assistant", "observation", "function") and content |
| system | str | No | System prompt override (defaults to template's default_system) |
| tools | str | No | JSON string of tool definitions |
Outputs (encode_oneturn)
| Name | Type | Description |
|---|---|---|
| prompt_ids | list[int] | Token IDs for the prompt (all messages except last) |
| response_ids | list[int] | Token IDs for the response (last message) |
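The relationship between the one-turn and multi-turn outputs can be sketched on already-encoded messages. The snippet assumes the conversation alternates prompt-side and response-side messages and has been turned into one list of token IDs per message; the helper names split_oneturn and split_multiturn are hypothetical.

```python
def split_oneturn(encoded_messages):
    """Merge every message except the last into the prompt; the last is the response."""
    prompt_ids = []
    for ids in encoded_messages[:-1]:
        prompt_ids += ids
    return prompt_ids, encoded_messages[-1]


def split_multiturn(encoded_messages):
    """Pair each prompt-side message with the response that follows it."""
    return [
        (encoded_messages[i], encoded_messages[i + 1])
        for i in range(0, len(encoded_messages), 2)
    ]


# Toy IDs for: user, assistant, user, assistant.
encoded = [[1, 2], [3], [4, 5], [6, 7]]
oneturn = split_oneturn(encoded)
multiturn = split_multiturn(encoded)
```

Both views cover the same tokens; they differ only in how much earlier context is folded into each prompt.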
Template Registration
Templates are registered using register_template with the following parameters:
| Parameter | Type | Description |
|---|---|---|
| name | str | Template identifier (e.g., "llama3", "qwen", "chatglm4") |
| format_user | Formatter | Formatter for user messages (typically StringFormatter with model-specific tokens) |
| format_assistant | Formatter | Formatter for assistant messages |
| format_system | Formatter | Formatter for system messages |
| format_function | Formatter | Formatter for function call messages |
| format_observation | Formatter | Formatter for tool observation messages |
| format_tools | Formatter | Formatter for tool definitions (uses ToolFormatter) |
| default_system | str | Default system prompt |
| stop_words | list[str] | Additional stop tokens beyond EOS |
| mm_plugin | BasePlugin | Multimodal plugin instance attached to the template |
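The registration mechanism is a name-to-template registry. The toy version below shows the pattern only: ToyTemplate, TOY_TEMPLATES, register_toy_template, and get_toy_template are hypothetical names, the slot strings are invented, and the real register_template accepts Formatter objects rather than plain format strings.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Heavily simplified stand-in for a chat template (illustration only)."""
    user_slot: str
    assistant_slot: str
    default_system: str = ""
    stop_words: list = field(default_factory=list)


TOY_TEMPLATES = {}  # registry: template name -> ToyTemplate


def register_toy_template(name, **kwargs):
    """Store a template under a unique name."""
    if name in TOY_TEMPLATES:
        raise ValueError(f"Template {name} already exists.")
    TOY_TEMPLATES[name] = ToyTemplate(**kwargs)


def get_toy_template(name):
    """Look up a previously registered template by name."""
    if name not in TOY_TEMPLATES:
        raise ValueError(f"Template {name} does not exist.")
    return TOY_TEMPLATES[name]


register_toy_template(
    "toy_chat",
    user_slot="<|user|>\n{content}\n<|assistant|>\n",
    assistant_slot="{content}<|end|>\n",
    stop_words=["<|end|>"],
)
template = get_toy_template("toy_chat")
prompt = template.user_slot.format(content="Hi")
```

Keeping templates in a registry lets the template be selected by name from configuration, without the caller importing model-specific code.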
Usage Examples
from llamafactory.data import get_template_and_fix_tokenizer

# `tokenizer` and `data_args` are assumed to have been created earlier
# (a loaded tokenizer and the parsed data arguments, respectively).

# Get template and fix tokenizer special tokens
template = get_template_and_fix_tokenizer(tokenizer, data_args)

# Encode a single conversation turn
prompt_ids, response_ids = template.encode_oneturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ],
    system="You are a helpful assistant.",
)

# Encode multiple turns for multi-turn training
pairs = template.encode_multiturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "How are you?"},
        {"role": "assistant", "content": "I'm great!"},
    ],
)

# Extract tool calls from model output
result = template.extract_tool("Action: search\nAction Input: {\"query\": \"weather\"}")
# For a template whose tool format uses the Action / Action Input layout, returns:
# [FunctionCall(name="search", arguments='{"query": "weather"}')]
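The tool-extraction step can be approximated with a standalone sketch. It assumes the Action / Action Input layout shown in the example; the regular expression, the local FunctionCall tuple, and the name extract_tool_sketch are illustrative assumptions, since the real method delegates parsing to the template's tool formatter.

```python
import json
import re
from typing import NamedTuple, Union


class FunctionCall(NamedTuple):
    """Local stand-in for the FunctionCall type named in the signature."""
    name: str
    arguments: str


# Assumed pattern for "Action: <name>" followed by "Action Input: <json>".
ACTION_PATTERN = re.compile(
    r"Action:\s*([a-zA-Z0-9_]+)\s*Action Input:\s*(.+?)(?=\s*Action:|\s*$)",
    re.DOTALL,
)


def extract_tool_sketch(content: str) -> Union[str, list]:
    """Return parsed tool calls, or the original text if none can be parsed."""
    matches = ACTION_PATTERN.findall(content)
    if not matches:
        return content  # plain text: no tool call present
    calls = []
    for name, raw_args in matches:
        try:
            arguments = json.loads(raw_args.strip())
        except json.JSONDecodeError:
            return content  # malformed arguments: fall back to plain text
        calls.append(FunctionCall(name, json.dumps(arguments, ensure_ascii=False)))
    return calls


parsed = extract_tool_sketch('Action: search\nAction Input: {"query": "weather"}')
plain = extract_tool_sketch("The weather is sunny.")
```

Returning the untouched string when parsing fails mirrors the Union[str, list["FunctionCall"]] return type in the signature: callers can tell a tool call from an ordinary reply by checking the result type.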
Related Pages
- Hiyouga_LLaMA_Factory_Multimodal_Plugin - MM plugin referenced by each template
- Hiyouga_LLaMA_Factory_Tool_Utils - Tool formatting utilities used by ToolFormatter
- Hiyouga_LLaMA_Factory_Data_Converter - Converters that produce the standardized messages consumed by templates
- Hiyouga_LLaMA_Factory_HfChatEngine - Inference engine that uses templates for prompt encoding