Principle:Hiyouga LLaMA Factory Chat Template System

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Natural Language Processing, Data Engineering, Conversational AI
Last Updated	2026-02-06 19:00 GMT

Overview

A structured encoding framework that converts multi-turn conversations with distinct roles (system, user, assistant, tool) into tokenized sequences with proper special tokens, role markers, and label masks for language model training and inference.

Description

Chat template systems solve the fundamental problem of encoding structured conversational data into the flat token sequences that language models process. Different model families (LLaMA, ChatGLM, Qwen, Mistral, etc.) use different special tokens, role prefixes, and turn delimiters, making template handling a critical interoperability layer.

The chat template system in LLaMA-Factory addresses several challenges:

Model-specific formatting: Each model architecture expects conversations to be formatted with specific special tokens and delimiters. For example, LLaMA-3 wraps each role name as <|start_header_id|>user<|end_header_id|>, while ChatGLM prefixes the sequence with [gMASK]<sop>.
Role-based encoding: Messages from different roles (system, user, assistant, function, observation) must be encoded with appropriate prefixes and suffixes.
Label masking: During training, only the assistant's response tokens should contribute to the loss. The template system generates the label mask that assigns IGNORE_INDEX to all non-response positions.
Tool/function calling: Templates must support encoding tool definitions, function call requests, and function result observations in model-specific formats.
Efficient EOS handling: Some templates use an "efficient EOS" strategy where the end-of-sequence token is shared between the response suffix and the next turn's prefix, avoiding redundant tokens.

The template architecture is composed of:

Formatters: Abstract components that convert structured data (role, content, tool calls) into slot sequences. Types include StringFormatter (template string interpolation), FunctionFormatter (function call encoding), ToolFormatter (tool definition encoding), and EmptyFormatter (static strings).
Slots: The atomic units of template composition. A slot is either a literal string or a dictionary mapping a string to a special token identifier, enabling mixing of regular text and special tokens.
Templates: Top-level objects that compose formatters for each role and orchestrate the encoding of entire conversations.
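
As a rough illustration of how formatters and slots relate, the sketch below defines a hypothetical string formatter that interpolates content into a list of slots. The class name `SimpleStringFormatter`, the `{{content}}` placeholder syntax, the `"token"` dict key, and the ChatML-like layout are assumptions made for this example and are not taken from the library.

```python
from dataclasses import dataclass
from typing import Union

# A slot is either literal text or a dict that names a special token.
Slot = Union[str, dict]


@dataclass
class SimpleStringFormatter:
    """Hypothetical formatter: fills placeholders inside literal-text slots."""

    slots: list

    def apply(self, **kwargs: str) -> list:
        rendered = []
        for slot in self.slots:
            if isinstance(slot, str):
                # Interpolate every {{name}} placeholder found in literal text.
                for name, value in kwargs.items():
                    slot = slot.replace("{{" + name + "}}", value)
            # Dict slots (special tokens) pass through unchanged.
            rendered.append(slot)
        return rendered


# ChatML-like user turn, used here only as an example layout.
format_user = SimpleStringFormatter(
    slots=[{"token": "<|im_start|>"}, "user\n{{content}}", {"token": "<|im_end|>"}, "\n"]
)
user_slots = format_user.apply(content="Hello")
```

Keeping special tokens as separate dict slots means user-supplied content is only ever interpolated into literal text, so it cannot be mistaken for a control token at this stage.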

Usage

Use the chat template system when you want to:

Encode conversations for training on any supported model architecture.
Ensure correct special token placement for model-specific chat formats.
Generate proper label masks for supervised fine-tuning.
Support tool calling and function execution in conversational workflows.
Add new model templates by defining role-specific format strings and special tokens.

The template system is used implicitly by all training stages (PT, SFT, DPO, KTO, PPO, RM) whenever structured data is processed.

Theoretical Basis

Conversation Encoding

A conversation $C = [(r_{1}, c_{1}), (r_{2}, c_{2}), \dots, (r_{n}, c_{n})]$ with roles $r_{i} \in \{\text{system}, \text{user}, \text{assistant}, \text{function}, \text{observation}\}$ and content $c_{i}$ is encoded as a token sequence:

$\text{tokens}(C) = \text{prefix} \oplus \bigoplus_{i = 1}^{n} \text{format}_{r_{i}}(c_{i})$

where $\oplus$ denotes concatenation, $\text{prefix}$ includes any beginning-of-sequence tokens, and $\text{format}_{r_{i}}$ is the role-specific formatter that produces the appropriate token sequence.
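
The concatenation above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `encode_conversation`, the character-level `toy_tokenize`, the angle-bracket role markers, and the prefix ID `1` are all hypothetical stand-ins, not the library's implementation.

```python
def toy_tokenize(text):
    """Toy tokenizer (assumption): one token ID per character."""
    return [ord(ch) for ch in text]


def encode_conversation(messages, formatters, prefix_ids, tokenize):
    """Concatenate the prefix tokens with each formatted message, in order."""
    token_ids = list(prefix_ids)
    for role, content in messages:
        text = formatters[role](content)  # role-specific formatting
        token_ids += tokenize(text)       # append this turn's tokens
    return token_ids


# Hypothetical role markers, chosen only to keep the example readable.
formatters = {
    "system": lambda c: f"<sys>{c}</sys>",
    "user": lambda c: f"<user>{c}</user>",
    "assistant": lambda c: f"<bot>{c}</bot>",
}

conversation = [("user", "Hi"), ("assistant", "Yo")]
encoded = encode_conversation(conversation, formatters, prefix_ids=[1], tokenize=toy_tokenize)
```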

Label Mask Construction

For supervised training, the label mask $L$ is constructed to train only on assistant responses:

$L_t = \begin{cases} \text{token}_t & \text{if position } t \text{ is within an assistant response} \\ \text{IGNORE\_INDEX} & \text{otherwise} \end{cases}$

The template's encode_oneturn and encode_multiturn methods return separate prompt and response token sequences. The data processor then builds the full input by concatenating these sequences and assigns IGNORE_INDEX to every prompt position in the labels.
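
The masking step can be sketched as follows, assuming the turns have already been encoded into (prompt, response) pairs of token IDs. The helper name `build_inputs_and_labels` and the example IDs are hypothetical; the value -100 is assumed here because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed value; matches PyTorch's default ignore_index


def build_inputs_and_labels(encoded_pairs):
    """Build input IDs and labels from (prompt_ids, response_ids) pairs."""
    input_ids, labels = [], []
    for prompt_ids, response_ids in encoded_pairs:
        input_ids += prompt_ids + response_ids
        # Prompt positions are masked; response positions keep their token IDs.
        labels += [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


# Two turns with made-up token IDs.
pairs = [([1, 10, 11], [20, 21, 2]), ([12, 13], [22, 2])]
input_ids, labels = build_inputs_and_labels(pairs)
```

The two lists always have the same length, so each label lines up with the input position it supervises.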

Slot Resolution

Formatters produce slot sequences that mix literal text and special tokens. A slot is resolved to token IDs through:

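def resolve_slot(slot, tokenizer):
    if isinstance(slot, str):
        # Literal text: tokenize it without adding BOS/EOS tokens.
        return tokenizer.encode(slot, add_special_tokens=False)
    # Dict slot {key: token_name}: look the named token up as a single ID.
    (token_name,) = slot.values()
    return [tokenizer.convert_tokens_to_ids(token_name)]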

This two-path resolution ensures that special tokens (like <|im_start|>) are encoded as single tokens rather than being tokenized as subword sequences.
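
To see why the distinction matters, the sketch below resolves the same marker once as a special-token slot and once as plain text. `ToyTokenizer` is a hypothetical character-level stand-in that only mimics the two tokenizer methods used in this section; a real subword tokenizer would split the text differently, but the contrast is the same.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer (character-level)."""

    def __init__(self, special_tokens):
        # Special tokens get dedicated IDs starting at 1000.
        self.special = {tok: 1000 + i for i, tok in enumerate(special_tokens)}

    def encode(self, text, add_special_tokens=False):
        return [ord(ch) for ch in text]

    def convert_tokens_to_ids(self, token):
        return self.special[token]


def resolve_slots(slots, tokenizer):
    """Resolve a list of slots into a flat list of token IDs."""
    ids = []
    for slot in slots:
        if isinstance(slot, str):
            ids += tokenizer.encode(slot, add_special_tokens=False)
        elif isinstance(slot, dict):
            for token_name in slot.values():
                ids.append(tokenizer.convert_tokens_to_ids(token_name))
        else:
            raise TypeError(f"unsupported slot type: {type(slot).__name__}")
    return ids


tok = ToyTokenizer(["<|im_start|>", "<|im_end|>"])
as_special = resolve_slots([{"token": "<|im_start|>"}, "hi"], tok)  # marker is one ID
as_text = resolve_slots(["<|im_start|>hi"], tok)  # marker is split into characters
```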

Tool Encoding

Tool definitions are encoded into the system or user message using model-specific formats. The general structure is:

{
    "name": "function_name",
    "description": "What the function does",
    "parameters": {
        "type": "object",
        "properties": {
            "param1": {"type": "string", "description": "..."}
        },
        "required": ["param1"]
    }
}

Different tool formatters (default, GLM4, Llama-style) convert this JSON schema into the format expected by each model, handling differences in XML tags, markdown formatting, and parameter description styles.
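
As an illustration of this conversion, the sketch below renders the schema into a plain-text block. The function `format_tools_as_text` and its output layout are hypothetical and do not reproduce any particular model's format; the `get_weather` tool is made-up example data.

```python
def format_tools_as_text(tools):
    """Hypothetical plain-text tool formatter; real formats are model-specific."""
    lines = ["You have access to the following tools:"]
    for tool in tools:
        params = tool.get("parameters", {})
        required = set(params.get("required", []))
        lines.append(f"> Tool Name: {tool['name']}")
        lines.append(f"  Description: {tool.get('description', '')}")
        for name, spec in params.get("properties", {}).items():
            flag = "required" if name in required else "optional"
            lines.append(
                f"  - {name} ({spec.get('type', 'any')}, {flag}): {spec.get('description', '')}"
            )
    return "\n".join(lines)


tools = [
    {
        "name": "get_weather",
        "description": "Look up the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "description": "Temperature unit"},
            },
            "required": ["city"],
        },
    }
]
tool_text = format_tools_as_text(tools)
```

A formatter for another model family would consume the same schema and change only the rendering, for example by emitting XML tags or raw JSON instead of the bullet layout used here.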

Template Composition

Each template is defined by seven formatters:

Formatter	Role	Description
`format_user`	User	Encodes user messages with role prefix/suffix
`format_assistant`	Assistant	Encodes assistant responses with role prefix/suffix
`format_system`	System	Encodes system prompts
`format_function`	Function	Encodes function call requests issued by the assistant
`format_observation`	Observation	Encodes tool execution observations
`format_tools`	Tools	Encodes tool definitions into system context
`format_prefix`	Prefix	Beginning-of-sequence tokens and initial formatting
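
The composition in the table can be sketched as a dataclass with one field per formatter. This is a simplified sketch: `SketchTemplate`, its `render` method, and the tag-style formatters are hypothetical, and the formatters here return plain strings rather than slot sequences to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable

Formatter = Callable[[str], str]


@dataclass
class SketchTemplate:
    """Hypothetical container mirroring the seven formatter fields."""

    format_user: Formatter
    format_assistant: Formatter
    format_system: Formatter
    format_function: Formatter
    format_observation: Formatter
    format_tools: Formatter
    format_prefix: Formatter

    def render(self, messages, system="", tools=""):
        parts = [self.format_prefix("")]
        if system or tools:
            # Tool definitions are folded into the system context.
            tool_text = self.format_tools(tools) if tools else ""
            parts.append(self.format_system(system + tool_text))
        for role, content in messages:
            # Dispatch to the formatter matching the message role.
            parts.append(getattr(self, f"format_{role}")(content))
        return "".join(parts)


def tag(name):
    """Build a formatter that wraps content in <name>...</name>."""
    return lambda content: f"<{name}>{content}</{name}>\n"


template = SketchTemplate(
    format_user=tag("user"),
    format_assistant=tag("assistant"),
    format_system=tag("system"),
    format_function=tag("call"),
    format_observation=tag("result"),
    format_tools=lambda t: f"\nTools: {t}",
    format_prefix=lambda _: "<s>",
)
rendered = template.render([("user", "Hi"), ("assistant", "Hello")], system="Be brief.")
```

Under this layout, adding support for a new model amounts to supplying a different set of seven formatters; the dispatch logic stays the same.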
