Heuristic: Hugging Face Alignment Handbook EOS Token Alignment
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Debugging |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
After training, align the model's `generation_config.eos_token_id` with the tokenizer's `eos_token_id` to prevent unbounded generation in inference pipelines.
Description
The alignment-handbook's SFT script explicitly sets the model's generation-config EOS token to match the tokenizer's after training. This matters because some models (especially those with custom chat templates) can end up with a mismatch between the model's default EOS token and the tokenizer's EOS token. Without this alignment, the model may never emit the stop token the pipeline is watching for, so generation in Hugging Face's `pipeline()` function runs until the length limit.
Usage
Apply this after every SFT training run, before saving the model. The alignment-handbook's `sft.py` script already handles it automatically, but the step must be replicated when building custom training pipelines or modifying the save logic.
The Insight (Rule of Thumb)
- Action: After training, explicitly set `model.generation_config.eos_token_id = tokenizer.eos_token_id` and `model.config.eos_token_id = tokenizer.eos_token_id` before saving.
- Value: Prevents infinite generation loops in inference.
- Trade-off: None. This is a zero-cost fix that prevents a critical inference bug.
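The rule above can be sketched as a small helper for custom training pipelines. This is a minimal illustration, not the handbook's own code: `align_eos_token` is a hypothetical name, and the `SimpleNamespace` objects stand in for a real transformers model and tokenizer, which expose the same attributes.

```python
from types import SimpleNamespace

def align_eos_token(model, tokenizer):
    """Copy the tokenizer's EOS token id onto both model configs.

    `model` is assumed to expose `generation_config` and `config`,
    as transformers PreTrainedModel instances do.
    """
    model.generation_config.eos_token_id = tokenizer.eos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    return model

# Stand-ins for a real model/tokenizer (illustration only):
model = SimpleNamespace(
    generation_config=SimpleNamespace(eos_token_id=2),  # base model's EOS
    config=SimpleNamespace(eos_token_id=2),
)
tokenizer = SimpleNamespace(eos_token_id=32000)  # e.g. a custom chat-template EOS

align_eos_token(model, tokenizer)
print(model.generation_config.eos_token_id)  # 32000
```

In a real pipeline, call the helper immediately before `trainer.save_model(...)` so the aligned ids are serialized into `generation_config.json` and `config.json`.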
Reasoning
Chat-tuned models often use custom EOS tokens (e.g., `<|im_end|>` for ChatML-style templates). If the model's `generation_config` still points to the base model's EOS token, the pipeline will not stop generation at the correct point.
Code evidence from `scripts/sft.py:134-138`:

```python
# Align the model's generation config with the tokenizer's eos token
# to avoid unbounded generation in the transformers `pipeline()` function
trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id
trainer.model.config.eos_token_id = tokenizer.eos_token_id
trainer.save_model(training_args.output_dir)
The comment in the code explicitly documents the motivation: avoid unbounded generation in the transformers `pipeline()` function.
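For custom save paths, the same invariant can be enforced defensively rather than assumed. The sketch below is a hypothetical sanity check (not part of the alignment-handbook): `ensure_eos_alignment` warns and repairs any drift between the tokenizer and the two model configs, again using `SimpleNamespace` stand-ins in place of real transformers objects.

```python
import warnings
from types import SimpleNamespace

def ensure_eos_alignment(model, tokenizer):
    """Warn and fix if either model config's EOS id drifts from the tokenizer's."""
    fixed = []
    for cfg_name in ("generation_config", "config"):
        cfg = getattr(model, cfg_name)
        if cfg.eos_token_id != tokenizer.eos_token_id:
            warnings.warn(
                f"{cfg_name}.eos_token_id={cfg.eos_token_id} != "
                f"tokenizer.eos_token_id={tokenizer.eos_token_id}; fixing."
            )
            cfg.eos_token_id = tokenizer.eos_token_id
            fixed.append(cfg_name)
    return fixed  # names of the configs that were repaired

# Misaligned stand-in: both configs still carry the base model's EOS (2)
model = SimpleNamespace(
    generation_config=SimpleNamespace(eos_token_id=2),
    config=SimpleNamespace(eos_token_id=2),
)
tokenizer = SimpleNamespace(eos_token_id=32000)

repaired = ensure_eos_alignment(model, tokenizer)
print(repaired)  # ['generation_config', 'config']
```

Running this check once after loading a saved checkpoint catches the mismatch before it manifests as runaway generation in production.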