Implementation:Hiyouga LLaMA Factory Chat Template

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Prompt Engineering, Tokenization
Last Updated	2026-02-06 19:00 GMT

Overview

Concrete chat template definitions and the token-encoding logic that LLaMA Factory provides for more than 100 model architectures.

Description

This module defines the Template dataclass, which is the core abstraction for converting structured chat messages into tokenized sequences. Each template encapsulates role-specific formatters (user, assistant, system, function, observation, tools, prefix), stop words, thought/reasoning delimiters, efficient EOS handling, and a reference to the appropriate multimodal plugin.

The Template class provides:

encode_oneturn -- Encodes a full conversation into a single (prompt_ids, response_ids) pair for training/inference
encode_multiturn -- Encodes a conversation into multiple (prompt, response) pairs for multi-turn training
extract_tool -- Parses tool/function calls from model output using the template's tool formatter
get_stop_token_ids -- Returns all stop token IDs including EOS and stop words
add_thought / remove_thought -- Manages reasoning/thinking tokens for chain-of-thought models
fix_special_tokens -- Patches tokenizer EOS, PAD, and stop word tokens
fix_jinja_template -- Generates or replaces the tokenizer's Jinja chat template
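
As an illustration of the get_stop_token_ids entry above, the following minimal sketch collects the EOS ID plus the ID of each stop word. ToyTokenizer, its vocabulary, and collect_stop_token_ids are hypothetical stand-ins written for this page; they are not the library's code, and a real tokenizer would come from a model checkpoint.

```python
from dataclasses import dataclass


@dataclass
class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer; maps whole tokens to IDs."""

    vocab: dict[str, int]
    eos_token: str = "</s>"

    @property
    def eos_token_id(self) -> int:
        return self.vocab[self.eos_token]

    def convert_tokens_to_ids(self, token: str) -> int:
        return self.vocab[token]


def collect_stop_token_ids(tokenizer: ToyTokenizer, stop_words: list[str]) -> list[int]:
    # Start from EOS, then add the ID of every stop word; the set removes duplicates.
    stop_ids = {tokenizer.eos_token_id}
    for word in stop_words:
        stop_ids.add(tokenizer.convert_tokens_to_ids(word))
    return sorted(stop_ids)


# Made-up vocabulary for illustration only.
tokenizer = ToyTokenizer(vocab={"</s>": 2, "<|eot_id|>": 7, "<|im_end|>": 9})
stop_ids = collect_stop_token_ids(tokenizer, ["<|eot_id|>", "</s>"])
print(stop_ids)
```

Listing "</s>" as a stop word does not produce a second entry, because it is already present as the EOS token.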

Two specialized subclasses exist:

Llama2Template -- Handles LLaMA-2's unique system message embedding within the first user turn
ReasoningTemplate -- Adds thought word token IDs to the beginning of assistant messages for reasoning models
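
The thought handling mentioned above can be pictured with a small sketch. It assumes the delimiters are "<think>\n" and "\n</think>\n\n"; the real values come from each template's thought_words field, and the two functions below are standalone illustrations rather than the methods on Template.

```python
import re

# Assumed delimiters for illustration; a template supplies its own thought_words.
THOUGHT_WORDS = ("<think>\n", "\n</think>\n\n")


def add_thought(content: str = "", thought_words: tuple[str, str] = THOUGHT_WORDS) -> str:
    """Prefix content with an empty reasoning block."""
    return thought_words[0] + thought_words[1] + content


def remove_thought(content: str, thought_words: tuple[str, str] = THOUGHT_WORDS) -> str:
    """Delete any reasoning block delimited by thought_words."""
    start, end = re.escape(thought_words[0]), re.escape(thought_words[1])
    pattern = re.compile(f"{start}(.*?){end}", re.DOTALL)
    return pattern.sub("", content).lstrip("\n")


with_reasoning = "<think>\nTwo plus two is four.\n</think>\n\n4"
answer_only = remove_thought(with_reasoning)
empty_block = add_thought("4")
print(repr(answer_only), repr(empty_block))
```

Removing the thought from a message that received an empty block returns the original answer, so the two operations round-trip on plain content.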

Over 100 templates are registered via register_template for models including LLaMA, Qwen, ChatGLM, Mistral, Gemma, DeepSeek, Phi, Yi, InternLM, Baichuan, and many others. The get_template_and_fix_tokenizer function selects and initializes the correct template.

Usage

Templates are obtained via get_template_and_fix_tokenizer at the start of any training or inference workflow. The template is then used throughout the data pipeline for encoding examples and during inference for prompt construction.

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/data/template.py
Lines: 1-2175

Signature

@dataclass
class Template:
    format_user: "Formatter"
    format_assistant: "Formatter"
    format_system: "Formatter"
    format_function: "Formatter"
    format_observation: "Formatter"
    format_tools: "Formatter"
    format_prefix: "Formatter"
    default_system: str
    stop_words: list[str]
    thought_words: tuple[str, str]
    tool_call_words: tuple[str, str]
    efficient_eos: bool
    replace_eos: bool
    replace_jinja_template: bool
    enable_thinking: Optional[bool]
    mm_plugin: "BasePlugin"

    def encode_oneturn(self, tokenizer, messages, system=None, tools=None) -> tuple[list[int], list[int]]: ...
    def encode_multiturn(self, tokenizer, messages, system=None, tools=None) -> list[tuple[list[int], list[int]]]: ...
    def extract_tool(self, content: str) -> Union[str, list["FunctionCall"]]: ...
    def get_stop_token_ids(self, tokenizer) -> list[int]: ...
    def fix_special_tokens(self, tokenizer) -> None: ...
    def fix_jinja_template(self, tokenizer) -> None: ...

@dataclass
class Llama2Template(Template): ...

@dataclass
class ReasoningTemplate(Template): ...

def register_template(name: str, ...) -> None: ...
def get_template_and_fix_tokenizer(tokenizer, data_args) -> "Template": ...

Import

from llamafactory.data.template import Template, get_template_and_fix_tokenizer

I/O Contract

Inputs (encode_oneturn)

Name	Type	Required	Description
tokenizer	PreTrainedTokenizer	Yes	Tokenizer for converting text to token IDs
messages	list[dict[str, str]]	Yes	Chat messages with role ("user", "assistant", "observation", "function") and content
system	str	No	System prompt override (defaults to template's default_system)
tools	str	No	JSON string of tool definitions

Outputs (encode_oneturn)

Name	Type	Description
prompt_ids	list[int]	Token IDs for the prompt (all messages except last)
response_ids	list[int]	Token IDs for the response (last message)
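
To make the contract concrete, the sketch below shows one way per-message token IDs could be split into the single-turn pair and into multi-turn pairs. It assumes each message has already been encoded separately and that roles alternate between prompt and response; the function names and token IDs are hypothetical, and this is not presented as the library's implementation.

```python
def split_oneturn(encoded_messages: list[list[int]]) -> tuple[list[int], list[int]]:
    """Concatenate every message but the last into the prompt; the last is the response."""
    prompt_ids: list[int] = []
    for ids in encoded_messages[:-1]:
        prompt_ids += ids
    return prompt_ids, encoded_messages[-1]


def split_multiturn(encoded_messages: list[list[int]]) -> list[tuple[list[int], list[int]]]:
    """Pair consecutive messages as (prompt, response), assuming an even, alternating list."""
    return [
        (encoded_messages[i], encoded_messages[i + 1])
        for i in range(0, len(encoded_messages), 2)
    ]


# Made-up token IDs for user, assistant, user, assistant.
encoded = [[1, 10, 11], [20, 2], [12, 13], [21, 22, 2]]
prompt_ids, response_ids = split_oneturn(encoded)
pairs = split_multiturn(encoded)
print(prompt_ids, response_ids, pairs)
```

In the single-turn view the earlier assistant reply becomes part of the prompt, whereas the multi-turn view keeps it as a response of its own.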

Template Registration

Templates are registered using register_template with the following parameters:

Parameter	Type	Description
name	str	Template identifier (e.g., "llama3", "qwen", "chatglm4")
format_user	Formatter	Formatter for user messages (typically StringFormatter with model-specific tokens)
format_assistant	Formatter	Formatter for assistant messages
format_system	Formatter	Formatter for system messages
format_function	Formatter	Formatter for function call messages
format_observation	Formatter	Formatter for tool observation messages
format_tools	Formatter	Formatter for tool definitions (uses ToolFormatter)
default_system	str	Default system prompt
stop_words	list[str]	Additional stop tokens beyond EOS
mm_plugin	BasePlugin	Multimodal plugin instance used by the template
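
The registration pattern can be sketched as a name-keyed registry. Everything here is a simplified assumption: ToyTemplate keeps only a few fields, slots are plain Python format strings, and register_toy_template and get_toy_template are hypothetical names, not the library's functions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical, heavily reduced stand-in for the Template dataclass."""

    user_slot: str
    assistant_slot: str
    default_system: str = ""
    stop_words: list[str] = field(default_factory=list)


# Module-level registry keyed by template name.
TOY_TEMPLATES: dict[str, ToyTemplate] = {}


def register_toy_template(name: str, **kwargs) -> None:
    # Refuse to overwrite an existing entry so name clashes surface early.
    if name in TOY_TEMPLATES:
        raise ValueError(f"Template {name} already exists.")
    TOY_TEMPLATES[name] = ToyTemplate(**kwargs)


def get_toy_template(name: str) -> ToyTemplate:
    if name not in TOY_TEMPLATES:
        raise ValueError(f"Template {name} does not exist.")
    return TOY_TEMPLATES[name]


register_toy_template(
    "toy_chatml",
    user_slot="<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n",
    assistant_slot="{content}<|im_end|>\n",
    stop_words=["<|im_end|>"],
)
template = get_toy_template("toy_chatml")
prompt = template.user_slot.format(content="Hi")
print(prompt)
```

Keeping registration declarative means a new model family only needs one more registration call and no changes to the lookup code.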

Usage Examples

from llamafactory.data import get_template_and_fix_tokenizer

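# `tokenizer` (a loaded tokenizer) and `data_args` (the data arguments that name the template)
# are assumed to exist already.
# Get the template and patch the tokenizer's special tokens
template = get_template_and_fix_tokenizer(tokenizer, data_args)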

# Encode a single conversation turn
prompt_ids, response_ids = template.encode_oneturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ],
    system="You are a helpful assistant.",
)

# Encode multiple turns for multi-turn training
pairs = template.encode_multiturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "How are you?"},
        {"role": "assistant", "content": "I'm great!"},
    ],
)

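# Extract tool calls from model output (the expected text layout depends on the template's tool formatter)
result = template.extract_tool("Action: search\nAction Input: {\"query\": \"weather\"}")
# For a template that uses the Action / Action Input layout, returns:
# [FunctionCall(name="search", arguments='{"query": "weather"}')]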
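
For readers curious how text in the Action / Action Input layout might be turned into structured calls, here is a minimal regex-based sketch. ToolCall and parse_action_calls are hypothetical names, the pattern is an assumption about that one layout only, and other tool formats would need different parsing.

```python
import json
import re
from typing import NamedTuple, Union


class ToolCall(NamedTuple):
    """Hypothetical stand-in for FunctionCall: a tool name plus JSON-encoded arguments."""

    name: str
    arguments: str


# One "Action: <name>" line followed by "Action Input: <json>", repeated any number of times.
ACTION_PATTERN = re.compile(
    r"Action:\s*([a-zA-Z0-9_]+)\s*Action Input:\s*(.+?)(?=\s*Action:|\s*$)",
    re.DOTALL,
)


def parse_action_calls(content: str) -> Union[str, list[ToolCall]]:
    """Return parsed calls, or the input text unchanged when nothing can be parsed."""
    matches = ACTION_PATTERN.findall(content)
    if not matches:
        return content

    calls = []
    for name, raw_args in matches:
        try:
            arguments = json.loads(raw_args.strip())
        except json.JSONDecodeError:
            return content  # malformed JSON: treat the whole output as plain text
        calls.append(ToolCall(name, json.dumps(arguments, ensure_ascii=False)))
    return calls


single = parse_action_calls('Action: search\nAction Input: {"query": "weather"}')
plain = parse_action_calls("The weather is sunny.")
print(single, plain)
```

Returning the original string on failure lets the caller tell an ordinary text reply apart from a tool call by checking the result type.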

Related Pages

Hiyouga_LLaMA_Factory_Multimodal_Plugin - MM plugin referenced by each template
Hiyouga_LLaMA_Factory_Tool_Utils - Tool formatting utilities used by ToolFormatter
Hiyouga_LLaMA_Factory_Data_Converter - Converters that produce the standardized messages consumed by templates
Hiyouga_LLaMA_Factory_HfChatEngine - Inference engine that uses templates for prompt encoding
