Principle:Huggingface Alignment handbook Chat Template Configuration

Knowledge Sources	Alignment Handbook, HuggingFace Chat Templating, TRL setup_chat_format
Domains	NLP, Preprocessing
Last Updated	2026-02-07 00:00 GMT

Overview

A conversation-formatting mechanism that applies Jinja2 chat templates to render multi-turn conversations into the text layout, including special tokens, that the model expects during alignment training.

Description

Chat Template Configuration ensures that conversations are correctly formatted before tokenization. Different models expect different conversation formats (ChatML, Llama format, custom templates), and the chat template defines how role and content fields in conversation data map to special tokens and text formatting.

In the alignment-handbook, chat templates are handled at two levels:

Tokenizer-level: The get_tokenizer function can override the tokenizer's default template with a custom Jinja2 template from the training config
Fallback-level: If no chat template is found on the tokenizer, the SFT script falls back to ChatML format using TRL's setup_chat_format
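
The order in which these two levels apply can be sketched as a small resolution function. This is a minimal sketch rather than the handbook's code: resolve_chat_template, CHATML_FALLBACK, and the SimpleNamespace objects standing in for tokenizers are hypothetical names introduced for illustration, and the real fallback goes through TRL's setup_chat_format, which also modifies the model.

```python
from types import SimpleNamespace

# Placeholder standing in for a full ChatML Jinja2 template string.
CHATML_FALLBACK = "{# ChatML template #}"


def resolve_chat_template(tokenizer, config_template=None):
    """Return (template, source) following the two-level order.

    Hypothetical helper: the name and return shape are assumptions
    made for this sketch, not the alignment-handbook API.
    """
    # Level 1: a template in the training config overrides the tokenizer's own.
    if config_template is not None:
        tokenizer.chat_template = config_template
        return tokenizer.chat_template, "config"
    # Level 2: no template anywhere, so fall back to ChatML.
    if tokenizer.chat_template is None:
        tokenizer.chat_template = CHATML_FALLBACK
        return tokenizer.chat_template, "chatml-fallback"
    # Otherwise keep whatever template the tokenizer shipped with.
    return tokenizer.chat_template, "tokenizer"


base = SimpleNamespace(chat_template=None)  # base model without a template
instruct = SimpleNamespace(chat_template="{# model default #}")

print(resolve_chat_template(base)[1])                      # chatml-fallback
print(resolve_chat_template(instruct)[1])                  # tokenizer
print(resolve_chat_template(instruct, "{# custom #}")[1])  # config
```

The config override is checked first, so a custom template always wins over whatever the tokenizer shipped with.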

For advanced models like SmolLM3, custom chat templates support thinking modes with special tokens (<|thinking|>, <|/thinking|>) that control whether the model uses chain-of-thought reasoning.
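
To make the idea of a thinking-mode switch concrete, the sketch below renders a conversation with and without a reasoning span. Only the <|thinking|> and <|/thinking|> token names come from the text above. The ChatML-style wrapper, the enable_thinking variable, and the reasoning message field are assumptions made for illustration and may not match the actual SmolLM3 template.

```python
from jinja2 import Template

# Hypothetical template: layout, `enable_thinking`, and the `reasoning`
# field are assumptions for this sketch, not the SmolLM3 template.
THINKING_TEMPLATE = Template(
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n"
    "{% if m['role'] == 'assistant' and enable_thinking and m.get('reasoning') %}"
    "<|thinking|>{{ m['reasoning'] }}<|/thinking|>\n"
    "{% endif %}"
    "{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
)

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "reasoning": "Add the numbers.", "content": "4"},
]

# Same messages, rendered with the reasoning span switched on and off.
with_thinking = THINKING_TEMPLATE.render(messages=messages, enable_thinking=True)
without_thinking = THINKING_TEMPLATE.render(messages=messages, enable_thinking=False)

print(with_thinking)
print(without_thinking)
```

Keeping the switch inside the template means the same dataset can produce both reasoning and non-reasoning training strings without changing the data.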

Usage

Use chat template configuration when:

The model's default chat template does not match the training data format
A custom conversation format with special tokens is needed (e.g., thinking modes)
Base models without any chat template need ChatML format applied
Consistent formatting is needed across SFT and DPO stages
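
The last point can be illustrated by rendering an SFT example and a DPO preference pair with the same formatting function, so that the prompt prefix is identical in both stages. This is a sketch under stated assumptions: render_chatml is a plain-Python stand-in for a ChatML template, and the prompt/chosen/rejected layout is an assumed dataset shape, not the handbook's preprocessing code.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Plain-Python stand-in for a ChatML chat template (illustrative only)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so a completion can be appended.
        text += "<|im_start|>assistant\n"
    return text


prompt = [{"role": "user", "content": "Name a prime number."}]

# SFT: the full conversation is rendered as one training string.
sft_text = render_chatml(prompt + [{"role": "assistant", "content": "7"}])

# DPO: the prompt is rendered once, then each completion is appended.
dpo_prompt = render_chatml(prompt, add_generation_prompt=True)
dpo_chosen = dpo_prompt + "7<|im_end|>\n"
dpo_rejected = dpo_prompt + "8<|im_end|>\n"

# With one shared template, the SFT string equals prompt + chosen.
print(sft_text == dpo_chosen)  # True
```

If the two stages used different templates, the DPO model would see prompts formatted differently from those it was fine-tuned on.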

Theoretical Basis

Chat templates use Jinja2 syntax to transform structured conversations:

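# Abstract chat template flow (illustrative, not the handbook's implementation)
# Assumes `tokenizer` is an already loaded Hugging Face tokenizer whose
# chat_template is ChatML.

# Input: list of message dicts, each with a "role" and a "content" field
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]

# The template converts the messages into one formatted string.
# Expected ChatML output:
# <|im_start|>system\nYou are helpful.<|im_end|>\n
# <|im_start|>user\nHello<|im_end|>\n
# <|im_start|>assistant\nHi there!<|im_end|>\n

# tokenize=False returns the rendered text instead of token ids
formatted = tokenizer.apply_chat_template(messages, tokenize=False)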
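
The same transformation can be reproduced without loading a tokenizer by rendering a ChatML-style template directly with Jinja2. The template string below was written for this sketch and is an assumption. Real tokenizers ship their own template strings, and transformers compiles them in its own Jinja2 environment, so whitespace handling may differ in detail.

```python
from jinja2 import Template

# ChatML-style template written for this sketch (an assumption, not a
# template copied from any particular tokenizer).
CHATML = Template(
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]

# Training-style rendering: every turn is closed with <|im_end|>.
formatted = CHATML.render(messages=messages, add_generation_prompt=False)
print(formatted)

# Inference-style rendering: drop the last reply and open an assistant turn.
prompt = CHATML.render(messages=messages[:2], add_generation_prompt=True)
print(prompt)
```

The add_generation_prompt variable mirrors the argument of the same name accepted by apply_chat_template, which controls whether an open assistant turn is appended.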

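The setup_chat_format function adds the ChatML special tokens (<|im_start|>, <|im_end|>) to the tokenizer vocabulary, sets the tokenizer's chat template to ChatML, and resizes the model's embedding layer so that every new token id has an embedding row the model can process.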
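
Why the vocabulary and the embedding layer must change together can be seen in a toy model of the two data structures. This is a sketch of the idea only, not TRL's implementation: add_special_tokens, the dict vocabulary, and the list-of-lists embedding table are hypothetical stand-ins.

```python
import random


def add_special_tokens(vocab, embeddings, new_tokens, dim):
    """Toy version of 'add tokens, then resize embeddings' (illustrative only)."""
    for token in new_tokens:
        if token not in vocab:
            # The new token takes the next free id in the vocabulary.
            vocab[token] = len(vocab)
            # Append a matching row; a real model uses its own initializer.
            embeddings.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, embeddings


vocab = {"hello": 0, "world": 1}
embeddings = [[0.1, 0.2], [0.3, 0.4]]  # one row per token id

add_special_tokens(vocab, embeddings, ["<|im_start|>", "<|im_end|>"], dim=2)

# Invariant: every token id must index a valid embedding row.
print(vocab["<|im_start|>"], vocab["<|im_end|>"], len(embeddings))  # 2 3 4
```

If the tokens were added to the vocabulary without growing the table, the new ids would point past the last embedding row.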

Related Pages

Implemented By

Implementation:Huggingface_Alignment_handbook_Setup_Chat_Format
