Principle:Huggingface Open r1 Model Saving and Publishing
| Knowledge Sources | |
|---|---|
| Domains | NLP, Infrastructure |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A model persistence mechanism that saves trained model weights, configuration, and metadata locally and optionally publishes them to HuggingFace Hub with model cards and version tags.
Description
After training completes, models must be saved for inference and sharing. This principle covers:
- Saving model weights and tokenizer configuration to local disk via the Trainer's built-in save_model method.
- Creating model cards with training metadata (dataset name, tags) to document the model's provenance and intended use.
- Restoring the KV cache for inference efficiency (disabled during gradient checkpointing training) by setting use_cache = True after saving.
- Aligning generation config with tokenizer EOS token to prevent unbounded generation by ensuring the model knows when to stop.
- Pushing to HuggingFace Hub with proper tags and metadata for public or private sharing.
Open-R1 also supports per-checkpoint revision pushing via callbacks, enabling fine-grained model selection post-training.
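One way per-checkpoint revision pushing could look is sketched below. This is not Open-R1's actual callback: the class name `RevisionPushCallback`, the injected `upload_fn`, the `step-<N>` revision naming, and the repository ID are all hypothetical. The `on_save(args, state, control)` hook and the `is_world_process_zero` flag mirror the `transformers` callback interface, and the `checkpoint-<step>` folder follows the Trainer's default checkpoint layout. In a real script the class would presumably subclass `TrainerCallback`, and `upload_fn` would wrap a Hub upload call.

```python
import os
from types import SimpleNamespace
from typing import Callable


class RevisionPushCallback:
    """Hypothetical callback: push each saved checkpoint to its own Hub revision."""

    def __init__(self, repo_id: str, upload_fn: Callable[[str, str, str], None]):
        self.repo_id = repo_id
        # upload_fn(repo_id, folder_path, revision) is an injected stand-in
        # for whatever Hub upload routine the training script uses.
        self.upload_fn = upload_fn

    def on_save(self, args, state, control, **kwargs):
        # Only one process should upload in a distributed run.
        if not state.is_world_process_zero:
            return control
        # The Trainer writes checkpoints to <output_dir>/checkpoint-<global_step>.
        folder = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        # Hypothetical naming scheme: one revision per training step.
        revision = f"step-{state.global_step}"
        self.upload_fn(self.repo_id, folder, revision)
        return control


# Usage with stand-in objects (no network access needed).
pushed = []
callback = RevisionPushCallback(
    "your-username/your-model",  # placeholder repository ID
    lambda repo, folder, rev: pushed.append((repo, folder, rev)),
)
args = SimpleNamespace(output_dir="out")
state = SimpleNamespace(global_step=100, is_world_process_zero=True)
callback.on_save(args, state, control=None)
```

Because each checkpoint lands on its own revision, a specific training step can later be selected by revision name when loading the model.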
Usage
Use this principle at the end of any training script to persist the trained model and optionally share it on the Hub. It applies to both SFT and GRPO training pipelines and should be invoked after the training loop has completed and (optionally) after final evaluation.
Theoretical Basis
Save Pipeline
The model saving and publishing pipeline follows a well-defined sequence that ensures the saved model is ready for inference and properly documented. The key steps are: align the generation config, save the model, create a model card on the main process, restore the KV cache, and optionally push to the Hub.
PROCEDURE SaveAndPublishModel(trainer, tokenizer, output_dir, dataset_name, push_to_hub):
    // Step 1: Align generation config EOS with tokenizer EOS
    trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id
    // Step 2: Save model weights and tokenizer to disk
    trainer.save_model(output_dir)
    // Metadata shared by the model card (Step 3) and the Hub push (Step 4)
    kwargs = {dataset_name: dataset_name, tags: ["open-r1"]}
    // Step 3: On main process only, create model card and restore KV cache
    IF is_main_process:
        trainer.create_model_card(**kwargs)
        trainer.model.config.use_cache = True  // Restore KV cache for inference
        trainer.model.config.save_pretrained(output_dir)
    // Step 4: Optionally push to HuggingFace Hub
    IF push_to_hub:
        trainer.push_to_hub(**kwargs)
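The procedure above might translate to Python roughly as follows. This is a minimal sketch, not the project's exact script: the function name `save_and_publish` is hypothetical, reading the main-process flag from `trainer.accelerator.is_main_process` is an assumption about how the trainer exposes it, and the demo uses mock objects in place of a real trainer and tokenizer so that nothing is written to disk or uploaded.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def save_and_publish(trainer, tokenizer, output_dir, dataset_name, push_to_hub=False):
    # Step 1: make generation stop on the tokenizer's EOS token.
    trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id

    # Step 2: write model weights and tokenizer files to disk.
    trainer.save_model(output_dir)

    # Metadata shared by the model card and the Hub push.
    kwargs = {"dataset_name": dataset_name, "tags": ["open-r1"]}

    # Step 3: only the main process writes the card and the updated config.
    # (Assumption: the flag is available as trainer.accelerator.is_main_process.)
    if trainer.accelerator.is_main_process:
        trainer.create_model_card(**kwargs)
        trainer.model.config.use_cache = True  # re-enable the KV cache for inference
        trainer.model.config.save_pretrained(output_dir)

    # Step 4: optionally publish to the Hub.
    if push_to_hub:
        trainer.push_to_hub(**kwargs)


# Demo with stand-ins: a mock trainer and a fake tokenizer (illustrative EOS id).
trainer = MagicMock()
trainer.accelerator.is_main_process = True
tokenizer = SimpleNamespace(eos_token_id=2)
save_and_publish(trainer, tokenizer, "out", "my-dataset", push_to_hub=True)
```

Defining `kwargs` before the main-process check keeps it available to the Hub push on every process.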
EOS Token Alignment
The generation config's eos_token_id must match the tokenizer's EOS token so that the model stops generating when the end-of-sequence token is produced. If the generation config still holds a default or stale EOS token ID, the model may keep generating until it hits the length limit, or stop prematurely on a token that is not the real end of sequence.
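A toy decoding loop can illustrate why the mismatch matters. Everything here is illustrative: `generate` is a hypothetical stand-in for a real generation routine, and the token IDs are made up.

```python
def generate(token_stream, eos_token_id, max_new_tokens):
    """Consume tokens until the configured EOS id appears or the limit is hit."""
    out = []
    for tok in token_stream:
        if len(out) >= max_new_tokens:
            break
        out.append(tok)
        if tok == eos_token_id:
            break
    return out


# The "model" emits the tokenizer's EOS (42) after two tokens, then filler.
tokenizer_eos = 42
stream = [5, 9, tokenizer_eos] + [7] * 100

stale = generate(stream, eos_token_id=2, max_new_tokens=20)            # stale EOS id
aligned = generate(stream, eos_token_id=tokenizer_eos, max_new_tokens=20)
```

With the stale ID, the loop never recognizes the end of the sequence and runs until `max_new_tokens`. With the aligned ID, it stops right after the EOS token.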
KV Cache Restoration
During training with gradient checkpointing, the KV cache is disabled: it only helps autoregressive decoding, and caching is incompatible with gradient checkpointing, which recomputes activations during the backward pass instead of storing them. After training, the KV cache must be re-enabled (use_cache = True) and the updated config saved, so that inference reuses the key and value tensors of earlier tokens rather than recomputing them at every generation step.
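The saving from caching can be made concrete with a small counting model. This is a simplified sketch rather than a measurement: the hypothetical `count_kv_projections` function counts how many per-token key/value projections a single decoder layer would compute while generating, with and without a cache.

```python
def count_kv_projections(prompt_len, new_tokens, use_cache):
    """Count per-token key/value projections computed during generation."""
    computed = 0
    cached = 0          # positions whose K/V are already stored
    seq_len = prompt_len
    for _ in range(new_tokens):
        if use_cache:
            # Only positions not yet in the cache need their K/V computed.
            computed += seq_len - cached
            cached = seq_len
        else:
            # Without a cache, K/V for every position is recomputed each step.
            computed += seq_len
        seq_len += 1    # the newly generated token extends the sequence
    return computed


with_cache = count_kv_projections(prompt_len=10, new_tokens=5, use_cache=True)
without_cache = count_kv_projections(prompt_len=10, new_tokens=5, use_cache=False)
```

For a 10-token prompt and 5 generated tokens, the cached run computes 14 projections (10 for the prompt, then 1 per later step), while the uncached run computes 10 + 11 + 12 + 13 + 14 = 60. The gap grows quadratically with the number of generated tokens.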
Model Card Creation
Model cards provide essential documentation about a model's training data, intended use, and limitations. Creating the model card on the main process only avoids race conditions in distributed training setups. The card includes the dataset name and tags (e.g., "open-r1") for discoverability on the Hub.