Principle:Huggingface Open r1 Model Saving and Publishing

From Leeroopedia


Knowledge Sources
Domains NLP, Infrastructure
Last Updated 2026-02-08 00:00 GMT

Overview

A model persistence mechanism that saves trained model weights, configuration, and metadata locally and optionally publishes them to the Hugging Face Hub with model cards and version tags.

Description

After training completes, models must be saved for inference and sharing. This principle covers:

  • Saving model weights and tokenizer configuration to local disk via the Trainer's built-in save_model method.
  • Creating model cards with training metadata (dataset name, tags) to document the model's provenance and intended use.
  • Re-enabling the KV cache, which is disabled during training with gradient checkpointing, by setting use_cache = True once the weights are saved and then re-saving the model config, so that inference is efficient.
  • Aligning the generation config's EOS token with the tokenizer's EOS token so that the model knows when to stop and does not generate without bound.
  • Pushing to the Hugging Face Hub with proper tags and metadata for public or private sharing.

Open-R1 also supports per-checkpoint revision pushing via callbacks, enabling fine-grained model selection post-training.
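As a rough illustration of per-checkpoint revision pushing, the sketch below uses a duck-typed callback whose on_save hook mirrors the signature of transformers.TrainerCallback.on_save. The class name RevisionPushCallback, the revision naming scheme, and the injected push_fn are hypothetical and chosen for illustration; this is not Open-R1's actual callback, and the stand-in objects replace real TrainingArguments and TrainerState so the example runs offline.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, List


def revision_name(base_revision: str, global_step: int) -> str:
    """Hypothetical naming scheme: one Hub revision per saved checkpoint."""
    return f"{base_revision}-step-{global_step:09d}"


@dataclass
class RevisionPushCallback:
    """Duck-typed stand-in for a TrainerCallback subclass (an assumption)."""

    base_revision: str
    # push_fn(checkpoint_dir, revision) is injected so no network is needed.
    push_fn: Callable[[str, str], None]
    pushed: List[str] = field(default_factory=list)

    def on_save(self, args, state, control, **kwargs):
        # Only one process should upload in a distributed run.
        if not state.is_world_process_zero:
            return
        revision = revision_name(self.base_revision, state.global_step)
        checkpoint_dir = f"{args.output_dir}/checkpoint-{state.global_step}"
        self.push_fn(checkpoint_dir, revision)
        self.pushed.append(revision)


# Usage with stand-in objects instead of a real Trainer.
uploads = []
callback = RevisionPushCallback("main", lambda path, rev: uploads.append((path, rev)))
args = SimpleNamespace(output_dir="out")
callback.on_save(args, SimpleNamespace(global_step=500, is_world_process_zero=True), None)
callback.on_save(args, SimpleNamespace(global_step=500, is_world_process_zero=False), None)
print(uploads)  # [('out/checkpoint-500', 'main-step-000000500')]
```

Giving each checkpoint its own revision keeps the main branch untouched until a checkpoint is selected, which is one way to support the post-training model selection described above.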

Usage

Use this principle at the end of any training script to persist the trained model and optionally share it on the Hub. It applies to both SFT and GRPO training pipelines and should be invoked after the training loop has completed and (optionally) after final evaluation.

Theoretical Basis

Save Pipeline

The model saving and publishing pipeline follows a well-defined sequence that ensures the saved model is ready for inference and properly documented. The key steps are: align the generation config, save the model, create a model card on the main process, restore the KV cache, and optionally push to the Hub.

PROCEDURE SaveAndPublishModel(trainer, tokenizer, output_dir, dataset_name, push_to_hub):
    // Step 1: Align generation config EOS with tokenizer EOS
    trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id

    // Step 2: Save model weights and tokenizer to disk
    trainer.save_model(output_dir)

    // Step 3: Build the metadata shared by the model card and the Hub push,
    //         then, on main process only, create model card and restore KV cache
    kwargs = {"dataset_name": dataset_name, "tags": ["open-r1"]}
    IF is_main_process:
        trainer.create_model_card(**kwargs)
        trainer.model.config.use_cache = True    // Restore KV cache for inference
        trainer.model.config.save_pretrained(output_dir)

    // Step 4: Optionally push to Hugging Face Hub (kwargs is defined on every process)
    IF push_to_hub:
        trainer.push_to_hub(**kwargs)
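The procedure above can be sketched in Python as follows. This is a minimal sketch, assuming a TRL-style trainer whose create_model_card and push_to_hub accept dataset_name and tags. The function name save_and_publish, the explicit is_main_process flag, and the RecordingTrainer stub are illustrative assumptions; the stub only records calls so that the ordering can be checked without a real model.

```python
from types import SimpleNamespace


def save_and_publish(trainer, tokenizer, output_dir, dataset_name,
                     push_to_hub=False, is_main_process=True):
    """Run the save steps in order; `trainer` only needs the attributes used here."""
    # Step 1: align the generation config's EOS with the tokenizer's EOS.
    trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id
    # Step 2: save the model to disk.
    trainer.save_model(output_dir)
    # Step 3: metadata is built on every process so the push below can use it.
    kwargs = {"dataset_name": dataset_name, "tags": ["open-r1"]}
    if is_main_process:
        trainer.create_model_card(**kwargs)
        trainer.model.config.use_cache = True  # restore KV cache for inference
        trainer.model.config.save_pretrained(output_dir)
    # Step 4: optionally publish.
    if push_to_hub:
        trainer.push_to_hub(**kwargs)
    return kwargs


class RecordingTrainer:
    """Hypothetical stub that logs calls instead of touching disk or the Hub."""

    def __init__(self):
        self.calls = []
        config = SimpleNamespace(use_cache=False)
        config.save_pretrained = lambda path: self.calls.append(("config.save_pretrained", path))
        self.model = SimpleNamespace(
            config=config,
            generation_config=SimpleNamespace(eos_token_id=None),
        )

    def save_model(self, output_dir):
        self.calls.append(("save_model", output_dir))

    def create_model_card(self, **kwargs):
        self.calls.append(("create_model_card", kwargs))

    def push_to_hub(self, **kwargs):
        self.calls.append(("push_to_hub", kwargs))


trainer = RecordingTrainer()
save_and_publish(trainer, SimpleNamespace(eos_token_id=2), "out", "my-dataset", push_to_hub=True)
call_names = [name for name, _ in trainer.calls]
print(call_names)
```

With a real trainer, is_main_process would typically come from the distributed runtime rather than being passed by hand.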

EOS Token Alignment

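The generation config's eos_token_id must match the tokenizer's EOS token so that generation stops when the end-of-sequence token is produced. If the generation config holds a default or stale EOS token ID instead, the stopping check compares against the wrong ID: the model may keep generating past its intended end until a length limit cuts it off, or it may stop prematurely on a token that was never meant to end the sequence.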
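The effect of a mismatched EOS ID can be seen in a toy decoding loop. The sketch below is not the transformers generation code; the function generate, the token values, and the stand-in model_tokens stream are made up for illustration. It only shows that the stopping check depends on which ID the loop compares against.

```python
from itertools import chain, repeat


def generate(token_stream, eos_token_id, max_new_tokens):
    """Consume tokens until EOS is seen or the length limit is reached."""
    output = []
    for token in token_stream:
        if len(output) >= max_new_tokens:
            break
        output.append(token)
        if token == eos_token_id:
            break
    return output


def model_tokens():
    """Stand-in for a model: emits 5, 7, then EOS (ID 2), then filler forever."""
    return chain([5, 7, 2], repeat(9))


tokenizer_eos = 2  # the ID the model was trained to emit at the end
stale_eos = 0      # a leftover ID that the model never produces

aligned = generate(model_tokens(), tokenizer_eos, max_new_tokens=8)
misaligned = generate(model_tokens(), stale_eos, max_new_tokens=8)
print(aligned)     # [5, 7, 2]
print(misaligned)  # [5, 7, 2, 9, 9, 9, 9, 9]
```

In the misaligned case only the length limit ends the loop, so the output runs past the real end of the sequence.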

KV Cache Restoration

During training with gradient checkpointing, the KV cache is disabled (use_cache = False) because caching is incompatible with gradient checkpointing, which recomputes activations during the backward pass, and a cache of past keys and values is not needed for training in any case. After training, the KV cache must be re-enabled (use_cache = True) so that autoregressive generation reuses the keys and values already computed for earlier tokens instead of recomputing them at every step.
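To make the saving concrete, the following simplified count compares how many per-position key/value projections are needed to decode a few tokens with and without a cache. It is a back-of-the-envelope model, not a measurement of any real implementation, and decode_cost is a hypothetical helper.

```python
def decode_cost(prompt_len, new_tokens, use_cache):
    """Count per-position key/value projections needed to generate `new_tokens`."""
    projections = 0
    cached = 0  # number of positions whose keys/values are already stored
    for step in range(new_tokens):
        seq_len = prompt_len + step  # positions visible to the model at this step
        if use_cache:
            projections += seq_len - cached  # project only the new positions
            cached = seq_len
        else:
            projections += seq_len  # recompute every position from scratch
    return projections


with_cache = decode_cost(prompt_len=4, new_tokens=3, use_cache=True)
without_cache = decode_cost(prompt_len=4, new_tokens=3, use_cache=False)
print(with_cache)     # 6
print(without_cache)  # 15
```

With the cache, the cost grows linearly in the total sequence length; without it, every step repeats the work for all earlier positions, so the cost grows quadratically.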

Model Card Creation

Model cards provide essential documentation about a model's training data, intended use, and limitations. Creating the model card on the main process only avoids race conditions in distributed training setups. The card includes the dataset name and tags (e.g., "open-r1") for discoverability on the Hub.
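The main-process guard can be illustrated with a small stand-alone writer. This sketch does not reproduce what trainer.create_model_card generates; write_model_card is a hypothetical helper that emits a minimal README.md with datasets and tags metadata in YAML front matter, and the rank argument stands in for whatever process index the distributed runtime provides.

```python
import tempfile
from pathlib import Path


def write_model_card(output_dir, dataset_name, tags, rank):
    """Write a minimal README.md, but only from rank 0."""
    if rank != 0:
        return None  # other ranks skip the write so they cannot clobber the file
    lines = ["---", "datasets:", f"- {dataset_name}", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines += ["---", "", f"# Model trained on {dataset_name}", ""]
    path = Path(output_dir) / "README.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


# Simulate four ranks calling the same code path; only rank 0 writes.
with tempfile.TemporaryDirectory() as tmp:
    written = [write_model_card(tmp, "my-dataset", ["open-r1"], rank) for rank in range(4)]
    card_text = written[0].read_text(encoding="utf-8")

print(card_text)
```

Because every rank runs the same script, the guard is what ensures a single writer for the shared output directory.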
