Principle:Huggingface Trl SFT Model Saving
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-06 17:00 GMT |
Overview
Model persistence and distribution via checkpoint saving, model card generation, and HuggingFace Hub publishing after supervised fine-tuning.
Description
After training completes, the fine-tuned model must be saved to disk and optionally distributed to collaborators or deployed to production. The model saving phase handles several concerns:
- Weight serialization -- The model's learned parameters (or, for PEFT, just the adapter weights) are serialized to disk in a format that can be reloaded for inference or further training. HuggingFace models use the safetensors format by default, which provides safe, memory-mapped access to tensors.
- Tokenizer/processor saving -- The processing class (tokenizer or multimodal processor) is saved alongside the model so that the same vocabulary, special tokens, and chat template are available at inference time. This ensures reproducibility of tokenization.
- Configuration saving -- The model's `config.json` and any PEFT adapter configuration are saved to enable architecture-aware reloading.
- Model card generation -- The `SFTTrainer` automatically generates a model card (a Markdown README) that documents the base model, dataset, training method, and library versions. This card is saved with every checkpoint and is displayed on the HuggingFace Hub when the model is published.
- Hub publishing -- When `push_to_hub=True` is set in the training configuration, the model, tokenizer, config, and model card are uploaded to the HuggingFace Hub, making the fine-tuned model immediately available for download and inference.
- Checkpoint management -- During training, checkpoints are saved at regular intervals (controlled by `save_strategy` and `save_steps`). Each checkpoint includes the model weights, optimizer state, scheduler state, and RNG state, enabling exact resumption of training.
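The saving-related settings above can be sketched as follows. This is a minimal illustration rather than a canonical training script: the output directory name, the step interval, and the helper `train_and_save` are assumptions made for the example, and `trl` is imported inside the function so the settings can be inspected without the library installed.

```python
# Illustrative saving-related settings; all values here are assumptions.
SAVE_KWARGS = {
    "output_dir": "sft-output",  # hypothetical directory name
    "save_strategy": "steps",    # write a checkpoint every `save_steps` steps
    "save_steps": 500,           # illustrative interval
    "push_to_hub": False,        # set to True to upload to the Hub
}


def train_and_save(model_id, train_dataset, save_kwargs=None):
    """Hypothetical helper: train with SFTTrainer, then persist the result."""
    # Imported lazily so this module can be loaded without trl installed.
    from trl import SFTConfig, SFTTrainer

    args = SFTConfig(**(save_kwargs or dict(SAVE_KWARGS)))
    trainer = SFTTrainer(model=model_id, args=args, train_dataset=train_dataset)
    trainer.train()

    # Writes the weights (adapter only when PEFT is used), the config,
    # and the tokenizer/processor into the output directory.
    trainer.save_model(args.output_dir)

    # Upload only when publishing was requested in the configuration.
    if args.push_to_hub:
        trainer.push_to_hub()
    return trainer
```

Publishing requires being authenticated with the Hub, so leaving `push_to_hub` off is the safer default while experimenting locally.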
Usage
Use this pattern when:
- Persisting a fine-tuned model to disk after training.
- Publishing a model to the HuggingFace Hub for distribution.
- Creating intermediate checkpoints during long training runs for fault tolerance.
- Saving PEFT adapter weights separately from the base model.
Theoretical Basis
Checkpoint = Snapshot of Training State: A checkpoint captures:
    checkpoint = {
        model_weights: theta_t,          # Current model parameters
        optimizer_state: {m_t, v_t},     # Momentum and variance (for Adam)
        scheduler_state: lr_t,           # Learning rate schedule position
        rng_state: {cpu, gpu, numpy},    # Random number generator states
        training_step: t,                # Global step counter
        epoch: e,                        # Current epoch
    }
This allows exact reproduction of training state when resuming.
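To resume, the most recent checkpoint has to be located first. The sketch below assumes the `checkpoint-<step>` directory naming used by the Transformers `Trainer`; the helper name `latest_checkpoint` is hypothetical and only the standard library is used.

```python
import re
from pathlib import Path

# Matches directory names such as "checkpoint-500".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not as text
            if step > best_step:
                best_step, best_path = step, child
    return best_path
```

The returned path could then be passed as `trainer.train(resume_from_checkpoint=str(path))`. Transformers also provides `get_last_checkpoint` in `transformers.trainer_utils` for a similar lookup.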
PEFT Adapter Saving: When using LoRA, only the adapter weights (A and B matrices plus any unfrozen modules) are saved. This is dramatically smaller than the full model:
Full model save: |theta| parameters (e.g., 7B = ~14 GB in fp16)
Adapter save: |A| + |B| parameters (e.g., ~20 MB for rank-16 LoRA)
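The size gap can be checked with simple arithmetic. The figures below are assumptions chosen for illustration (32 layers, hidden size 4096, two square projection matrices adapted per layer, 2 bytes per parameter); a real adapter's size depends on which modules are targeted and on the storage dtype.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative architecture values, not taken from any specific model card.
HIDDEN = 4096
LAYERS = 32
TARGETS_PER_LAYER = 2  # e.g. two attention projections per layer
RANK = 16
BYTES_PER_PARAM = 2    # fp16

adapter_params = LAYERS * TARGETS_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
adapter_mb = adapter_params * BYTES_PER_PARAM / 1e6

full_params = 7_000_000_000
full_gb = full_params * BYTES_PER_PARAM / 1e9

print(f"adapter: {adapter_params:,} params, {adapter_mb:.1f} MB")
print(f"full model: {full_gb:.1f} GB")
```

Under these assumptions the adapter holds about 8.4 million parameters (roughly 17 MB), which is in line with the order of magnitude quoted above, against 14 GB for the full model.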
The base model ID is stored in the adapter config, so at inference time the base model is loaded first and the adapter is applied on top.
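A minimal sketch of that reload path, assuming the adapter directory contains PEFT's `adapter_config.json` with its `base_model_name_or_path` field. The helper names are hypothetical, and `transformers` and `peft` are imported lazily so the config lookup works on its own.

```python
import json
from pathlib import Path


def base_model_id(adapter_dir):
    """Read the base model ID recorded in a saved adapter's config."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("base_model_name_or_path")


def load_with_adapter(adapter_dir):
    """Hypothetical helper: load the base model, then apply the adapter."""
    # Imported lazily; both libraries must be installed to call this.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_model_id(adapter_dir))
    return PeftModel.from_pretrained(base, adapter_dir)
```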
Model Card as Documentation: The model card follows the HuggingFace model card specification and includes:
- Base model identification
- Training dataset name
- Library versions (TRL, Transformers, PyTorch, etc.)
- Training method tag (SFT)
- Optional links to training logs (Weights & Biases, Comet)
This metadata enables reproducibility and proper attribution in the model ecosystem.
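For orientation, the metadata portion of a model card is a YAML header at the top of the README. The sketch below hand-renders such a header with a few fields the Hub recognizes (`base_model`, `datasets`, `library_name`, `tags`). It is an illustration only and not TRL's own template; the function name and all values are hypothetical.

```python
def render_card_header(base_model, dataset_name, library_name, tags):
    """Build a YAML front-matter block for a model card README."""
    lines = [
        "---",
        f"base_model: {base_model}",
        "datasets:",
        f"- {dataset_name}",
        f"library_name: {library_name}",
        "tags:",
    ]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"


header = render_card_header(
    base_model="org/base-model",   # hypothetical model ID
    dataset_name="org/dataset",    # hypothetical dataset ID
    library_name="transformers",
    tags=["trl", "sft"],
)
print(header)
```

In practice the trainer produces the card itself, so a hand-written header like this is mainly useful for understanding or auditing what the generated README contains.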