Principle:Hiyouga LLaMA Factory Chat Template System
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Data Engineering, Conversational AI |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
A structured encoding framework that converts multi-turn conversations with distinct roles (system, user, assistant, tool) into tokenized sequences with proper special tokens, role markers, and label masks for language model training and inference.
Description
Chat template systems solve the fundamental problem of encoding structured conversational data into the flat token sequences that language models process. Different model families (LLaMA, ChatGLM, Qwen, Mistral, etc.) use different special tokens, role prefixes, and turn delimiters, making template handling a critical interoperability layer.
The chat template system in LLaMA-Factory addresses several challenges:
- Model-specific formatting: Each model architecture expects conversations to be formatted with specific special tokens and delimiters. For example, LLaMA-3 uses <|start_header_id|>user<|end_header_id|> while ChatGLM uses [gMASK]<sop>.
- Role-based encoding: Messages from different roles (system, user, assistant, function, observation) must be encoded with appropriate prefixes and suffixes.
- Label masking: During training, only the assistant's response tokens should contribute to the loss. The template system keeps prompt and response tokens separate so that the label mask can assign IGNORE_INDEX to every non-response position.
- Tool/function calling: Templates must support encoding tool definitions, function call requests, and function result observations in model-specific formats.
- Efficient EOS handling: Some templates use an "efficient EOS" strategy where the end-of-sequence token is shared between the response suffix and the next turn's prefix, avoiding redundant tokens.
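The efficient EOS idea can be illustrated at the string level. This is a minimal sketch under stated assumptions: a hypothetical GLM-like layout in which the EOS token is the same marker that opens a user turn. The names USER_OPEN, ASSISTANT_OPEN, EOS and render are invented for the example.

```python
USER_OPEN = "<|user|>"            # hypothetical marker that opens a user turn
ASSISTANT_OPEN = "<|assistant|>"  # hypothetical marker that opens an assistant turn
EOS = USER_OPEN                   # assumption: the EOS token IS the user-turn marker


def render(turns, efficient_eos):
    """Render (user, assistant) turns as one string.

    Without efficient EOS, every response is followed by its own EOS, so the
    marker appears twice at each turn boundary. With efficient EOS, the next
    turn's opener doubles as the previous response's EOS, and only the final
    response needs an explicit terminator.
    """
    out = []
    for user, assistant in turns:
        out.append(f"{USER_OPEN}{user}{ASSISTANT_OPEN}{assistant}")
        if not efficient_eos:
            out.append(EOS)
    if efficient_eos:
        out.append(EOS)  # nothing follows the last response, so close it here
    return "".join(out)
```

For two turns, render(..., efficient_eos=False) contains the marker four times and render(..., efficient_eos=True) three times. Because the shared marker then sits at the start of the next prompt, a mask that ignored every prompt position would never teach the model to emit it after an intermediate response, so a template using this strategy has to keep that one position as a training target.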
The template architecture is composed of:
- Formatters: Abstract components that convert structured data (role, content, tool calls) into slot sequences. Types include StringFormatter (template string interpolation), FunctionFormatter (function call encoding), ToolFormatter (tool definition encoding), and EmptyFormatter (static strings).
- Slots: The atomic units of template composition. A slot is either a literal string or a dictionary mapping a string to a special token identifier, enabling mixing of regular text and special tokens.
- Templates: Top-level objects that compose formatters for each role and orchestrate the encoding of entire conversations.
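To make the formatter/slot split concrete, here is a minimal sketch of two formatter types. It assumes a {{content}} placeholder syntax and special-token slots shaped like {"token": name}; StringFormatterSketch and EmptyFormatterSketch are hypothetical stand-ins, not LLaMA-Factory's own classes.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A slot is literal text or a dict naming one special token, e.g. {"token": "<|im_start|>"}.
Slot = Union[str, dict]


@dataclass
class EmptyFormatterSketch:
    """Hypothetical stand-in: returns its static slots unchanged."""
    slots: List[Slot] = field(default_factory=list)

    def apply(self, **kwargs) -> List[Slot]:
        return list(self.slots)


@dataclass
class StringFormatterSketch:
    """Hypothetical stand-in: fills {{name}} placeholders in string slots."""
    slots: List[Slot] = field(default_factory=list)

    def apply(self, **kwargs) -> List[Slot]:
        out: List[Slot] = []
        for slot in self.slots:
            if isinstance(slot, str):
                for name, value in kwargs.items():
                    slot = slot.replace("{{" + name + "}}", value)
            # Dict slots (special tokens) are passed through untouched.
            out.append(slot)
        return out


user_formatter = StringFormatterSketch(
    slots=[{"token": "<|im_start|>"}, "user\n{{content}}", {"token": "<|im_end|>"}, "\n"]
)
```

Calling user_formatter.apply(content="Hi") returns the two special-token dicts unchanged and "user\nHi" in place of the placeholder string. Keeping special tokens as separate slots, rather than splicing them into the text, lets a later step map each one to a single token id.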
Usage
Use the chat template system when you want to:
- Encode conversations for training on any supported model architecture.
- Ensure correct special token placement for model-specific chat formats.
- Generate proper label masks for supervised fine-tuning.
- Support tool calling and function execution in conversational workflows.
- Add new model templates by defining role-specific format strings and special tokens.
The template system is used implicitly by all training stages (PT, SFT, DPO, KTO, PPO, RM) whenever structured data is processed.
Theoretical Basis
Conversation Encoding
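A conversation of $n$ messages with roles $r_1, \ldots, r_n$ and contents $c_1, \ldots, c_n$ is encoded as a token sequence:
$$X = P \oplus F_{r_1}(c_1) \oplus F_{r_2}(c_2) \oplus \cdots \oplus F_{r_n}(c_n)$$
where $\oplus$ denotes concatenation, $P$ is the prefix containing any beginning-of-sequence tokens, and $F_{r_i}$ is the role-specific formatter that produces the token sequence for message $i$.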
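The concatenation can be traced at the string level before tokenization. This sketch assumes Llama-3-style marker strings for the prefix and three roles; PREFIX, ROLE_FORMATS and encode_conversation are illustrative names, and the exact strings are an assumption rather than a copy of any shipped template.

```python
# Hypothetical Llama-3-style format strings; {content} marks where c_i goes.
PREFIX = "<|begin_of_text|>"  # P: beginning-of-sequence text
ROLE_FORMATS = {
    "system": "<|start_header_id|>system<|end_header_id|>\n\n{content}<|eot_id|>",
    "user": "<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "assistant": "{content}<|eot_id|>",
}


def encode_conversation(messages):
    """Prefix followed by each message's role-specific rendering, concatenated."""
    parts = [PREFIX]
    for msg in messages:
        # Look up the formatter for this role and fill in the content.
        parts.append(ROLE_FORMATS[msg["role"]].format(content=msg["content"]))
    return "".join(parts)
```

A real template would then resolve this sequence into token ids, keeping each marker as a single special token.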
Label Mask Construction
For supervised training, the label mask is constructed to train only on assistant responses:
$$L_t = \begin{cases} \text{token}_t & \text{if position } t \text{ is within an assistant response} \\ \text{IGNORE\_INDEX} & \text{otherwise} \end{cases}$$
The template's encode_oneturn and encode_multiturn methods return separated prompt and response token sequences. The data processor then constructs the full input by concatenating these sequences and applies IGNORE_INDEX to all prompt positions.
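A minimal sketch of that concatenation and masking step, assuming the template has already produced a list of (prompt_ids, response_ids) pairs and that IGNORE_INDEX is -100, the default ignore_index of PyTorch's CrossEntropyLoss. build_supervised_example is a hypothetical name.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_supervised_example(pairs):
    """Concatenate (prompt_ids, response_ids) pairs into input_ids and labels.

    Prompt positions are labeled IGNORE_INDEX; response positions keep their ids.
    """
    input_ids, labels = [], []
    for prompt_ids, response_ids in pairs:
        input_ids += prompt_ids + response_ids
        labels += [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels
```

Labels stay position-aligned with input_ids here; Hugging Face causal language models shift labels by one position inside their loss computation, so no manual shift is needed at this stage.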
Slot Resolution
Formatters produce slot sequences that mix literal text and special tokens. A slot is resolved to token IDs through:
If slot is a string:
    token_ids = tokenizer.encode(slot, add_special_tokens=False)
If slot is a dict {"token": token_name}:
    token_ids = [tokenizer.convert_tokens_to_ids(slot["token"])]
This two-path resolution ensures that special tokens (like <|im_start|>) are encoded as single tokens rather than being tokenized as subword sequences.
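The two-path resolution can be exercised without downloading a real tokenizer. ToyTokenizer is a hypothetical character-level stand-in that models only encode and convert_tokens_to_ids, and resolve_slots assumes dict slots of the form {"token": name}.

```python
class ToyTokenizer:
    """Hypothetical character-level tokenizer modeling only the two methods used here."""

    def __init__(self, special_tokens):
        # Special tokens get the first ids; ordinary characters are added lazily.
        self.vocab = {tok: i for i, tok in enumerate(special_tokens)}

    def _id(self, piece):
        if piece not in self.vocab:
            self.vocab[piece] = len(self.vocab)
        return self.vocab[piece]

    def encode(self, text, add_special_tokens=False):
        # add_special_tokens is accepted only to mirror the call signature.
        # Plain text is split into characters, so a special-token string
        # written as text does NOT become a single id.
        return [self._id(ch) for ch in text]

    def convert_tokens_to_ids(self, token):
        return self.vocab[token]


def resolve_slots(tokenizer, slots):
    """Map a slot sequence to token ids via the two resolution paths."""
    token_ids = []
    for slot in slots:
        if isinstance(slot, str):
            if slot:  # skip empty strings
                token_ids += tokenizer.encode(slot, add_special_tokens=False)
        elif isinstance(slot, dict):
            token_ids.append(tokenizer.convert_tokens_to_ids(slot["token"]))
        else:
            raise ValueError(f"unsupported slot: {slot!r}")
    return token_ids
```

With this toy tokenizer, passing "<|im_start|>" through the string path yields 12 character ids, while the dict path yields exactly one id.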
Tool Encoding
Tool definitions are encoded into the system or user message using model-specific formats. The general structure is:
{
  "name": "function_name",
  "description": "What the function does",
  "parameters": {
    "type": "object",
    "properties": {
      "param1": {"type": "string", "description": "..."}
    },
    "required": ["param1"]
  }
}
Different tool formatters (default, GLM4, Llama-style) convert this JSON schema into the format expected by each model, handling differences in XML tags, markdown formatting, and parameter description styles.
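As an illustration of how one schema can be rendered in two styles, this sketch converts a tool definition into a plain list layout and into raw JSON wrapped in XML-style tags. Both layouts, and the function names render_tools_as_list and render_tools_as_xml, are invented for this example; each real formatter uses its own wording.

```python
import json


def render_tools_as_list(tools):
    """Hypothetical plain-text layout: one block per tool, one line per parameter."""
    lines = ["You have access to the following tools:"]
    for tool in tools:
        lines.append(f"> Tool Name: {tool['name']}")
        lines.append(f"Tool Description: {tool.get('description', '')}")
        params = tool.get("parameters", {})
        required = set(params.get("required", []))
        for name, spec in params.get("properties", {}).items():
            flag = ", required" if name in required else ""
            lines.append(f"  - {name} ({spec.get('type', 'any')}{flag}): {spec.get('description', '')}")
    return "\n".join(lines)


def render_tools_as_xml(tools):
    """Hypothetical alternative: each raw JSON schema on its own line inside XML-style tags."""
    body = "\n".join(json.dumps(tool, ensure_ascii=False) for tool in tools)
    return f"<tools>\n{body}\n</tools>"
```

The list layout spends tokens on readable prose, while the XML variant keeps the schema machine-parseable; which one a model expects depends on how it was trained.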
Template Composition
Each template is defined by seven formatters:
| Formatter | Role | Description |
|---|---|---|
| format_user | User | Encodes user messages with role prefix/suffix |
| format_assistant | Assistant | Encodes assistant responses with role prefix/suffix |
| format_system | System | Encodes system prompts |
| format_function | Function | Encodes the assistant's function (tool) call requests |
| format_observation | Observation | Encodes tool execution results (observations) |
| format_tools | Tools | Encodes tool definitions into the system context |
| format_prefix | Prefix | Encodes beginning-of-sequence tokens and initial formatting |
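The following sketch shows how the seven formatters might be composed into a template that splits a conversation into (prompt, response) pairs, in the spirit of encode_multiturn. TemplateSketch and the ChatML-like strings are hypothetical: each formatter is simplified to a Python format string, the tool-call and tool-result strings are invented, and the method assumes strictly alternating input/output messages.

```python
from dataclasses import dataclass


@dataclass
class TemplateSketch:
    """Hypothetical template holding the seven formatters as format strings.

    Each string uses a {content} placeholder (format_prefix takes none).
    Real formatters emit slot lists rather than plain strings.
    """
    format_user: str
    format_assistant: str
    format_system: str
    format_function: str
    format_observation: str
    format_tools: str
    format_prefix: str

    def encode_multiturn(self, messages, system="", tools=""):
        """Split a conversation into (prompt, response) pairs.

        Assumes messages alternate input (user/observation) and output
        (assistant/function call); prefix and system go into the first prompt.
        """
        if len(messages) % 2 != 0:
            raise ValueError("expected an even number of messages")
        inputs = {"user": self.format_user, "observation": self.format_observation}
        outputs = {"assistant": self.format_assistant, "function": self.format_function}
        pairs = []
        for i in range(0, len(messages), 2):
            prompt = ""
            if i == 0:
                prompt += self.format_prefix
                # Tool definitions are appended to the system text.
                sys_text = system + (self.format_tools.format(content=tools) if tools else "")
                if sys_text:
                    prompt += self.format_system.format(content=sys_text)
            inp, out = messages[i], messages[i + 1]
            prompt += inputs[inp["role"]].format(content=inp["content"])
            response = outputs[out["role"]].format(content=out["content"])
            pairs.append((prompt, response))
        return pairs


# A ChatML-like instance; the tool-related strings are invented for illustration.
chatml = TemplateSketch(
    format_user="<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n",
    format_assistant="{content}<|im_end|>\n",
    format_system="<|im_start|>system\n{content}<|im_end|>\n",
    format_function="<tool_call>{content}</tool_call><|im_end|>\n",
    format_observation="<|im_start|>tool\n{content}<|im_end|>\n<|im_start|>assistant\n",
    format_tools="\n# Tools\n{content}",
    format_prefix="",
)
```

Only the first prompt carries the prefix and system text; later prompts begin directly with the next user or observation turn.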