
Principle:Huggingface Alignment handbook Model Saving and Publishing

From Leeroopedia


Knowledge Sources
Domains: NLP, Training, MLOps
Last Updated: 2026-02-07 00:00 GMT

Overview

A model persistence pattern that saves trained model checkpoints locally and optionally publishes them to HuggingFace Hub with auto-generated model cards.

Description

Model Saving and Publishing is the final stage of every alignment-handbook training pipeline. After training completes, the model (or LoRA adapter) is saved to local disk, a model card is generated with training metadata, and the model is optionally pushed to HuggingFace Hub for sharing and deployment.

The alignment-handbook adds several important steps beyond basic saving:

  • EOS token alignment: The model's generation config is updated to match the tokenizer's EOS token, preventing unbounded generation in inference pipelines
  • KV cache restoration: After training (which disables cache for gradient checkpointing), the cache is re-enabled for efficient inference
  • Model card generation: An auto-generated model card with dataset name and alignment-handbook tag is created
  • Conditional hub push: Publishing to HuggingFace Hub is controlled by the push_to_hub config flag
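
In the alignment-handbook, these behaviors are driven by fields in the YAML training recipe. The fragment below is illustrative (the values are placeholders, not copied from any shipped recipe); `output_dir`, `push_to_hub`, `hub_model_id`, and `hub_strategy` follow the standard `TrainingArguments` naming:

```yaml
# Illustrative recipe fragment — field names follow TrainingArguments
# conventions; values are placeholders
output_dir: data/zephyr-7b-sft-lora      # local save location
push_to_hub: true                        # gates the optional Hub upload
hub_model_id: my-org/zephyr-7b-sft-lora  # target repo on HuggingFace Hub
hub_strategy: every_save
```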

Usage

Use this principle at the end of any training pipeline to persist the trained model. This is automatically handled by all alignment-handbook training scripts.

Theoretical Basis

Model persistence follows a fixed four-step sequence:

# Abstract save/publish flow (NOT real implementation)
# 1. Align generation config with tokenizer
model.generation_config.eos_token_id = tokenizer.eos_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# 2. Save model/adapter to disk
trainer.save_model(output_dir)

# 3. On main process only:
trainer.create_model_card(tags=["alignment-handbook"])
model.config.use_cache = True  # Restore KV cache for inference
model.config.save_pretrained(output_dir)

# 4. Optionally push to Hub
if push_to_hub:
    trainer.push_to_hub(dataset_name=dataset_name)

For QLoRA models, save_model saves only the LoRA adapter weights (adapter_model.safetensors, adapter_config.json), not the full base model.
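
A downstream loader can use the presence of these adapter files to tell an adapter-only save apart from a full-model checkpoint. The helper below is a hypothetical sketch (the function name and heuristic are not part of the handbook):

```python
from pathlib import Path

# Files a PEFT/QLoRA adapter-only save is expected to contain
ADAPTER_FILES = {"adapter_model.safetensors", "adapter_config.json"}

def is_adapter_checkpoint(output_dir: str) -> bool:
    """Return True if output_dir looks like a LoRA adapter save
    rather than a full-model checkpoint (hypothetical heuristic)."""
    path = Path(output_dir)
    if not path.is_dir():
        return False
    names = {p.name for p in path.iterdir()}
    return ADAPTER_FILES <= names
```

Full-model saves instead contain `config.json` plus `model.safetensors` (possibly sharded across several files), so this check distinguishes the two layouts.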

Related Pages

Implemented By
