Principle: HuggingFace Alignment Handbook Model Saving and Publishing
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, MLOps |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A model persistence pattern that saves trained model checkpoints locally and optionally publishes them to HuggingFace Hub with auto-generated model cards.
Description
Model Saving and Publishing is the final stage of every alignment-handbook training pipeline. After training completes, the model (or LoRA adapter) is saved to local disk, a model card is generated with training metadata, and the model is optionally pushed to the HuggingFace Hub for sharing and deployment.
The alignment-handbook adds several important steps beyond basic saving:
- EOS token alignment: The model's generation config is updated to match the tokenizer's EOS token, preventing unbounded generation in inference pipelines
- KV cache restoration: After training (which disables cache for gradient checkpointing), the cache is re-enabled for efficient inference
- Model card generation: An auto-generated model card with dataset name and alignment-handbook tag is created
- Conditional hub push: Publishing to HuggingFace Hub is controlled by the push_to_hub config flag
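The four steps above can be collected into a single function. This is a minimal sketch, not the handbook's actual code: the function name and signature are illustrative, and the object shapes (trainer.save_model, trainer.create_model_card, model.generation_config, and so on) mirror standard transformers/TRL conventions.

```python
# Minimal sketch of the save-and-publish sequence (illustrative names;
# object interfaces follow transformers/TRL conventions).
def save_and_publish(model, tokenizer, trainer, output_dir,
                     push_to_hub=False, dataset_name=None,
                     is_main_process=True):
    # 1. Align EOS token so inference stops where the tokenizer expects
    model.generation_config.eos_token_id = tokenizer.eos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    # 2. Persist the model (or adapter) weights
    trainer.save_model(output_dir)
    if is_main_process:
        # 3. Model card + re-enable the KV cache that training disabled
        trainer.create_model_card(tags=["alignment-handbook"])
        model.config.use_cache = True
        model.config.save_pretrained(output_dir)
    # 4. Optional Hub publication, gated by the config flag
    if push_to_hub:
        trainer.push_to_hub(dataset_name=dataset_name)
```

In a real pipeline the main-process guard matters: under distributed training, only rank 0 should write the model card and config to avoid racing writes to the same files.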
Usage
Use this principle at the end of any training pipeline to persist the trained model. This is automatically handled by all alignment-handbook training scripts.
Theoretical Basis
Model persistence follows a defined sequence:
```python
# Abstract save/publish flow (NOT real implementation)

# 1. Align generation config with tokenizer
model.generation_config.eos_token_id = tokenizer.eos_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# 2. Save model/adapter to disk
trainer.save_model(output_dir)

# 3. On main process only:
trainer.create_model_card(tags=["alignment-handbook"])
model.config.use_cache = True  # Restore KV cache for inference
model.config.save_pretrained(output_dir)

# 4. Optionally push to Hub
if push_to_hub:
    trainer.push_to_hub(dataset_name=dataset_name)
```
For QLoRA models, save_model saves only the LoRA adapter weights (adapter_model.safetensors, adapter_config.json), not the full base model.
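As a purely illustrative helper (the function name is hypothetical; the adapter filenames come from the note above, and the full-model filenames assume a standard single-shard safetensors save), the distinction can be encoded as:

```python
# Hypothetical helper: which files to expect in output_dir after
# trainer.save_model, depending on whether the run trained a (Q)LoRA
# adapter or the full model.
def expected_artifacts(is_adapter: bool) -> list[str]:
    if is_adapter:
        # Adapter-only save: base model weights are NOT included
        return ["adapter_config.json", "adapter_model.safetensors"]
    # Full-model save (single-shard case)
    return ["config.json", "generation_config.json", "model.safetensors"]
```

Because the adapter save omits the base model, loading it later requires the original base checkpoint to still be available (locally or on the Hub).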