Implementation:Hiyouga LLaMA Factory Chat Template

From Leeroopedia


Knowledge Sources
Domains Prompt Engineering, Tokenization
Last Updated 2026-02-06 19:00 GMT

Overview

Concrete chat template definitions and the token-encoding system that LLaMA Factory provides for more than 100 model architectures.

Description

This module defines the Template dataclass, which is the core abstraction for converting structured chat messages into tokenized sequences. Each template encapsulates role-specific formatters (user, assistant, system, function, observation, tools, prefix), stop words, thought/reasoning delimiters, efficient EOS handling, and a reference to the appropriate multimodal plugin.

The Template class provides:

  • encode_oneturn -- Encodes a full conversation into a single (prompt_ids, response_ids) pair for training/inference
  • encode_multiturn -- Encodes a conversation into multiple (prompt, response) pairs for multi-turn training
  • extract_tool -- Parses tool/function calls from model output using the template's tool formatter
  • get_stop_token_ids -- Returns all stop token IDs including EOS and stop words
  • add_thought / remove_thought -- Manages reasoning/thinking tokens for chain-of-thought models
  • fix_special_tokens -- Patches tokenizer EOS, PAD, and stop word tokens
  • fix_jinja_template -- Generates or replaces the tokenizer's Jinja chat template
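The thought-handling entries in the list above can be pictured with a small standalone sketch. It assumes the delimiters are `<think>` and `</think>` (a common choice that this page does not confirm) and uses hypothetical free functions rather than the actual Template methods, whose behavior may differ.

```python
import re

# Assumed delimiters; in LLaMA Factory they would come from a template's thought_words.
THOUGHT_WORDS = ("<think>", "</think>")


def remove_thought(content: str, thought_words: tuple[str, str] = THOUGHT_WORDS) -> str:
    """Drop a leading reasoning block delimited by thought_words, if present."""
    start, end = thought_words
    pattern = re.compile(rf"^\s*{re.escape(start)}.*?{re.escape(end)}\s*", re.DOTALL)
    return pattern.sub("", content, count=1)


def add_thought(content: str, thought_words: tuple[str, str] = THOUGHT_WORDS) -> str:
    """Prefix an empty reasoning block so every response has the same shape."""
    start, end = thought_words
    return f"{start}\n\n{end}\n\n{content}"


raw = "<think>2+2 is basic arithmetic.</think>\n\n4"
answer_only = remove_thought(raw)
with_empty_thought = add_thought(answer_only)
```

Keeping the delimiters in a tuple, as the thought_words field does, lets the same logic serve models that use different reasoning markers.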

Two specialized subclasses exist:

  • Llama2Template -- Handles LLaMA-2's unique system message embedding within the first user turn
  • ReasoningTemplate -- Adds thought word token IDs to the beginning of assistant messages for reasoning models
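To illustrate why LLaMA-2 needs its own subclass, the sketch below builds the first turn at the string level using the publicly documented LLaMA-2 chat markers. The function name is hypothetical, the BOS token is omitted, and this is not the logic of Llama2Template itself, only the prompt shape it has to produce.

```python
def build_llama2_first_turn(system: str, user: str) -> str:
    """Embed the system message inside the first user turn (LLaMA-2 chat style)."""
    if system:
        # The system text is wrapped in <<SYS>> markers and placed before the user text.
        user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
    return f"[INST] {user} [/INST]"


with_system = build_llama2_first_turn("Be brief.", "Hi")
without_system = build_llama2_first_turn("", "Hi")
```

Because the system text lives inside the user turn rather than in a separate system turn, a generic "format system, then format user" pipeline cannot express it directly.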

Over 100 templates are registered via register_template for models including LLaMA, Qwen, ChatGLM, Mistral, Gemma, DeepSeek, Phi, Yi, InternLM, Baichuan, and many others. The get_template_and_fix_tokenizer function selects and initializes the correct template.
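The register-then-look-up flow described above follows a familiar registry pattern. The sketch below is a minimal standalone version under stated assumptions: `ToyTemplate`, `TOY_TEMPLATES`, `register_toy_template`, and `get_toy_template` are hypothetical names, and the format strings are illustrative rather than copied from any registered template.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical, heavily reduced stand-in for a chat template."""
    user_format: str
    assistant_format: str
    default_system: str = ""
    stop_words: list[str] = field(default_factory=list)


# Module-level registry mapping a template name to its definition.
TOY_TEMPLATES: dict[str, ToyTemplate] = {}


def register_toy_template(name: str, **kwargs) -> None:
    """Store a template under a unique name."""
    if name in TOY_TEMPLATES:
        raise ValueError(f"Template {name} already exists.")
    TOY_TEMPLATES[name] = ToyTemplate(**kwargs)


def get_toy_template(name: str) -> ToyTemplate:
    """Look up a registered template by name."""
    try:
        return TOY_TEMPLATES[name]
    except KeyError:
        raise ValueError(f"Template {name} does not exist.") from None


register_toy_template(
    "toy_chatml",
    user_format="<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n",
    assistant_format="{content}<|im_end|>\n",
    stop_words=["<|im_end|>"],
)
prompt = get_toy_template("toy_chatml").user_format.format(content="Hi")
```

Registering at import time keeps each model's formatting in one declarative call, so supporting a new model does not require touching the encoding code.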

Usage

Templates are obtained via get_template_and_fix_tokenizer at the start of any training or inference workflow. The template is then used throughout the data pipeline for encoding examples and during inference for prompt construction.

Code Reference

Source Location

Signature

@dataclass
class Template:
    format_user: "Formatter"
    format_assistant: "Formatter"
    format_system: "Formatter"
    format_function: "Formatter"
    format_observation: "Formatter"
    format_tools: "Formatter"
    format_prefix: "Formatter"
    default_system: str
    stop_words: list[str]
    thought_words: tuple[str, str]
    tool_call_words: tuple[str, str]
    efficient_eos: bool
    replace_eos: bool
    replace_jinja_template: bool
    enable_thinking: Optional[bool]
    mm_plugin: "BasePlugin"

    def encode_oneturn(self, tokenizer, messages, system=None, tools=None) -> tuple[list[int], list[int]]: ...
    def encode_multiturn(self, tokenizer, messages, system=None, tools=None) -> list[tuple[list[int], list[int]]]: ...
    def extract_tool(self, content: str) -> Union[str, list["FunctionCall"]]: ...
    def get_stop_token_ids(self, tokenizer) -> list[int]: ...
    def fix_special_tokens(self, tokenizer) -> None: ...
    def fix_jinja_template(self, tokenizer) -> None: ...

@dataclass
class Llama2Template(Template): ...

@dataclass
class ReasoningTemplate(Template): ...

def register_template(name: str, ...) -> None: ...
def get_template_and_fix_tokenizer(tokenizer, data_args) -> "Template": ...

Import

from llamafactory.data.template import Template, get_template_and_fix_tokenizer

I/O Contract

Inputs (encode_oneturn)

Name | Type | Required | Description
tokenizer | PreTrainedTokenizer | Yes | Tokenizer for converting text to token IDs
messages | list[dict[str, str]] | Yes | Chat messages with role ("user", "assistant", "observation", "function") and content
system | str | No | System prompt override (defaults to template's default_system)
tools | str | No | JSON string of tool definitions

Outputs (encode_oneturn)

Name | Type | Description
prompt_ids | list[int] | Token IDs for the prompt (all messages except last)
response_ids | list[int] | Token IDs for the response (last message)
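The prompt/response split in this contract, and its multi-turn counterpart, can be sketched without the library. The example assumes strictly alternating user/assistant messages, a made-up "User:/Assistant:" text format, and a character-level stand-in tokenizer; all function names are hypothetical and none of this is the actual Template code.

```python
def toy_encode(text: str) -> list[int]:
    """Stand-in tokenizer: one ID per character (not a real tokenizer)."""
    return [ord(ch) for ch in text]


def toy_decode(ids: list[int]) -> str:
    return "".join(chr(i) for i in ids)


def encode_multiturn_sketch(messages: list[dict[str, str]]) -> list[tuple[list[int], list[int]]]:
    """Pair each user message with the assistant message that follows it."""
    pairs = []
    for i in range(0, len(messages) - 1, 2):
        prompt = toy_encode(f"User: {messages[i]['content']}\nAssistant: ")
        response = toy_encode(messages[i + 1]["content"] + "\n")
        pairs.append((prompt, response))
    return pairs


def encode_oneturn_sketch(messages: list[dict[str, str]]) -> tuple[list[int], list[int]]:
    """Fold everything except the last message into the prompt."""
    pairs = encode_multiturn_sketch(messages)
    prompt_ids: list[int] = []
    for prompt, response in pairs[:-1]:
        prompt_ids += prompt + response
    prompt_ids += pairs[-1][0]
    return prompt_ids, pairs[-1][1]


messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm great!"},
]
pairs = encode_multiturn_sketch(messages)
prompt_ids, response_ids = encode_oneturn_sketch(messages)
```

In this sketch the one-turn form treats only the final assistant message as the response, while the multi-turn form keeps every assistant message as a separate response, which matches the two descriptions given earlier on this page.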

Template Registration

Templates are registered using register_template with the following parameters:

Parameter | Type | Description
name | str | Template identifier (e.g., "llama3", "qwen", "chatglm4")
format_user | Formatter | Formatter for user messages (typically StringFormatter with model-specific tokens)
format_assistant | Formatter | Formatter for assistant messages
format_system | Formatter | Formatter for system messages
format_function | Formatter | Formatter for function call messages
format_observation | Formatter | Formatter for tool observation messages
format_tools | Formatter | Formatter for tool definitions (uses ToolFormatter)
default_system | str | Default system prompt
stop_words | list[str] | Additional stop tokens beyond EOS
mm_plugin | BasePlugin | Multimodal plugin to use (matches the mm_plugin field in the Template signature)

Usage Examples

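from llamafactory.data import get_template_and_fix_tokenizer

# Assumes `tokenizer` (a loaded PreTrainedTokenizer) and `data_args`
# (the data arguments object for the run) have already been created.
# Get template and fix tokenizer special tokens
template = get_template_and_fix_tokenizer(tokenizer, data_args)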

# Encode a single conversation turn
prompt_ids, response_ids = template.encode_oneturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
    ],
    system="You are a helpful assistant.",
)

# Encode multiple turns for multi-turn training
pairs = template.encode_multiturn(
    tokenizer,
    messages=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "How are you?"},
        {"role": "assistant", "content": "I'm great!"},
    ],
)

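# Extract tool calls from model output; the expected text format depends on
# the template's tool formatter (this input uses the Action / Action Input style)
result = template.extract_tool("Action: search\nAction Input: {\"query\": \"weather\"}")
# Returns: [FunctionCall(name="search", arguments='{"query": "weather"}')]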
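For readers without the library installed, the tool-extraction step can be approximated with a standalone parser for the "Action: / Action Input:" text style shown in the example. This is a sketch under assumptions: `ToyFunctionCall` and `extract_tool_sketch` are hypothetical names, the regular expression is an illustrative guess, and the real extract_tool delegates to the template's tool formatter, which may parse other formats.

```python
import json
import re
from typing import NamedTuple, Union


class ToyFunctionCall(NamedTuple):
    """Hypothetical stand-in for a parsed function call."""
    name: str
    arguments: str  # JSON-encoded argument object


# Matches "Action: <name>" followed by "Action Input: <json>", possibly repeated.
_ACTION_RE = re.compile(
    r"Action:\s*([a-zA-Z0-9_]+)\s*Action Input:\s*(.+?)(?=\s*Action:|\s*$)",
    re.DOTALL,
)


def extract_tool_sketch(content: str) -> Union[str, list[ToyFunctionCall]]:
    """Return parsed calls, or the original text when no valid call is found."""
    matches = _ACTION_RE.findall(content)
    if not matches:
        return content
    calls = []
    for name, raw_args in matches:
        try:
            arguments = json.loads(raw_args.strip())
        except json.JSONDecodeError:
            return content  # Malformed arguments: treat the output as plain text.
        calls.append(ToyFunctionCall(name, json.dumps(arguments, ensure_ascii=False)))
    return calls


single = extract_tool_sketch('Action: search\nAction Input: {"query": "weather"}')
plain = extract_tool_sketch("The answer is 4.")
```

Returning the untouched string on failure mirrors the Union[str, list[...]] return type in the signature above: callers can tell a normal reply from a tool call by checking the type of the result.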
