Principle:Huggingface Alignment handbook Chat Template Configuration
| Knowledge Sources | |
|---|---|
| Domains | NLP, Preprocessing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A conversation formatting mechanism that applies Jinja2 chat templates to turn multi-turn conversations into the prompt string, including special tokens, that the model expects during alignment training.
Description
Chat Template Configuration ensures that conversations are correctly formatted before tokenization. Different models expect different conversation formats (ChatML, Llama format, custom templates), and the chat template defines how role and content fields in conversation data map to special tokens and text formatting.
In the alignment-handbook, chat templates are handled at two levels:
- Tokenizer-level: The get_tokenizer function can override the tokenizer's default template with a custom Jinja2 template from the training config
- Fallback-level: If no chat template is found on the tokenizer, the SFT script falls back to ChatML format using TRL's setup_chat_format
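The order of precedence described in the two bullets above can be sketched as a small resolution function. This is a simplified illustration, not the handbook's code: TokenizerStub, resolve_chat_template and CHATML_PLACEHOLDER are hypothetical names, and the real fallback also adds special tokens and resizes the model embeddings, which is omitted here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Placeholder standing in for a real ChatML Jinja2 template (hypothetical).
CHATML_PLACEHOLDER = "<chatml template>"


@dataclass
class TokenizerStub:
    """Stand-in for a tokenizer; only the chat_template attribute matters here."""
    chat_template: Optional[str] = None


def resolve_chat_template(
    tokenizer: TokenizerStub, config_template: Optional[str]
) -> Tuple[str, str]:
    """Return (template, source) following the two-level order described above."""
    if config_template is not None:
        # Tokenizer-level: a template from the training config overrides the default.
        tokenizer.chat_template = config_template
        return tokenizer.chat_template, "config override"
    if tokenizer.chat_template is not None:
        # The tokenizer already ships a template and no override was requested.
        return tokenizer.chat_template, "tokenizer default"
    # Fallback-level: no template anywhere, so ChatML is applied.
    tokenizer.chat_template = CHATML_PLACEHOLDER
    return tokenizer.chat_template, "chatml fallback"
```

Under these assumptions, a base model with no template and no config override ends up with ChatML, while any template set in the config always wins.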
For advanced models like SmolLM3, custom chat templates support thinking modes with special tokens (<|thinking|>, <|/thinking|>) that control whether the model uses chain-of-thought reasoning.
Usage
Use chat template configuration when:
- The model's default chat template does not match the training data format
- A custom conversation format with special tokens is needed (e.g., thinking modes)
- Base models without any chat template need ChatML format applied
- Consistent formatting is needed across SFT and DPO stages
Theoretical Basis
Chat templates use Jinja2 syntax to transform structured conversations:
# Abstract chat template flow (NOT real implementation)
# Input: list of message dicts
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
# Template converts to formatted string
# ChatML format:
# <|im_start|>system\nYou are helpful.<|im_end|>\n
# <|im_start|>user\nHello<|im_end|>\n
# <|im_start|>assistant\nHi there!<|im_end|>\n
# `tokenizer` is assumed to be an already-loaded tokenizer whose chat template is ChatML
formatted = tokenizer.apply_chat_template(messages, tokenize=False)
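To make the transformation concrete without loading a model, the ChatML output shown in the comments above can be reproduced by rendering a small template directly with the jinja2 package. This is a minimal sketch: the template string and the render_chatml helper are written for illustration and are not copied from any tokenizer or from the handbook.

```python
from jinja2 import Template

# Minimal ChatML template written for this sketch (hypothetical, not a model's own).
# Raw strings keep "\n" as a Jinja escape, which Jinja turns into a newline.
CHATML_TEMPLATE = (
    r"{% for message in messages %}"
    r"{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    r"{% endfor %}"
    r"{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)


def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts into a ChatML string."""
    return Template(CHATML_TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )


messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
formatted = render_chatml(messages)
print(formatted)
```

Passing add_generation_prompt=True appends an open assistant header, which is the usual way to prompt a model for its next turn at inference time.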
The setup_chat_format function adds the ChatML special tokens to the tokenizer vocabulary and resizes the model's embedding layer to match, so that every new token ID has a corresponding embedding row.
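Why the vocabulary and the embedding layer have to change together can be shown with a toy model in plain Python. This is an illustration of the idea only, not TRL's implementation: the vocabulary is a dict, the embedding matrix is a list of rows, and add_special_tokens is a hypothetical helper.

```python
import random
from typing import Dict, List


def add_special_tokens(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
    dim: int,
) -> int:
    """Add tokens to the vocab and grow the embedding matrix by one row per new token.

    Returns the number of tokens actually added (existing tokens are skipped).
    """
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue
        # The new token ID is the next free index, which must also be a valid row.
        vocab[token] = len(vocab)
        embeddings.append([random.gauss(0.0, 0.02) for _ in range(dim)])
        added += 1
    return added


dim = 4
vocab = {"hello": 0, "world": 1, "</s>": 2}
embeddings = [[0.0] * dim for _ in vocab]

num_added = add_special_tokens(vocab, embeddings, ["<|im_start|>", "<|im_end|>"], dim)
# Every token ID can now be looked up in the embedding matrix.
assert all(token_id < len(embeddings) for token_id in vocab.values())
```

If the tokens were added to the vocabulary alone, the IDs for <|im_start|> and <|im_end|> would index past the end of the embedding matrix, which is the failure that resizing the embeddings prevents.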