Implementation: HuggingFace Transformers save_pretrained
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, MLOps |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for persisting a trained model's weights, configuration, and associated artifacts to disk or the HuggingFace Hub, provided by the HuggingFace Transformers library.
Description
PreTrainedModel.save_pretrained() serializes the model's state dictionary and configuration to a specified directory. The saved artifacts can later be reloaded using from_pretrained(). By default, weights are saved in the safetensors format for safety and performance, with automatic sharding for models that exceed the shard size threshold (default: 50GB).
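The shard-assignment idea can be illustrated with a simplified sketch (not the library's actual implementation): tensors are greedily packed into shards so that each shard stays under the byte budget, and a tensor larger than the budget gets a shard of its own.

```python
def shard_state_dict(sizes, max_shard_bytes):
    """Greedily partition tensors into shards under a byte budget.

    `sizes` maps tensor name -> size in bytes. This is a simplified
    sketch of shard assignment, not transformers' actual algorithm.
    """
    shards, current, current_bytes = [], {}, 0
    for name, nbytes in sizes.items():
        # Start a new shard when adding this tensor would exceed the
        # budget; an oversized tensor still gets a shard to itself.
        if current and current_bytes + nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = {}, 0
        current[name] = nbytes
        current_bytes += nbytes
    if current:
        shards.append(current)
    return shards

# Hypothetical tensor sizes in bytes, for illustration only.
sizes = {"wte": 4_000, "h.0.attn": 3_000, "h.0.mlp": 3_000, "lm_head": 4_000}
shards = shard_state_dict(sizes, max_shard_bytes=6_000)
# With a 6 KB budget the four tensors land in three shards, which on
# disk would be named like model-00001-of-00003.safetensors.
```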
The method handles several advanced scenarios:
- Distributed training -- Only the main process writes files to avoid race conditions.
- PEFT models -- Adapter weights are saved in PEFT-compatible format when detected.
- Quantized models -- Serializable quantization states are preserved.
- Tensor parallelism -- Models distributed across devices are gathered before saving.
- Hub push -- Optionally pushes saved artifacts directly to the HuggingFace Hub.
In addition to the model weights, save_pretrained() also saves the model's configuration (config.json), generation configuration (generation_config.json if applicable), and custom model code if the model was loaded with trust_remote_code.
Usage
Call model.save_pretrained() after training completes to save the final model, or use it within a Trainer checkpoint callback. It is also useful for saving a model to a local directory before pushing it to the Hub separately.
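As a sanity check, a save_pretrained()/from_pretrained() round trip should reproduce the weights exactly; a minimal offline sketch with a tiny random model (config values are arbitrary):

```python
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random model; no network access needed.
model = GPT2LMHeadModel(GPT2Config(n_layer=1, n_head=2, n_embd=32, vocab_size=128))

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)
    reloaded = GPT2LMHeadModel.from_pretrained(save_dir)

# Every tensor in the reloaded state dict matches the original.
match = all(
    torch.equal(tensor, reloaded.state_dict()[name])
    for name, tensor in model.state_dict().items()
)
print(match)
```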
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/modeling_utils.py (lines 3125-3244+)
Signature
def save_pretrained(
self,
save_directory: str | os.PathLike,
is_main_process: bool = True,
state_dict: dict | None = None,
push_to_hub: bool = False,
max_shard_size: int | str = "50GB",
variant: str | None = None,
token: str | bool | None = None,
save_peft_format: bool = True,
save_original_format: bool = True,
**kwargs,
):
Import
from transformers import AutoModelForCausalLM
# save_pretrained() is an instance method on any PreTrainedModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| save_directory | str or os.PathLike | Yes | Directory path where the model will be saved. Created if it does not exist |
| is_main_process | bool | No | Whether this is the main process in distributed training (default: True). Only the main process writes files to avoid race conditions |
| state_dict | dict | No | Custom state dictionary to save. If None, uses self.state_dict(). Useful for saving only parts of the model |
| push_to_hub | bool | No | Whether to push the saved model to the HuggingFace Hub after saving (default: False) |
| max_shard_size | int or str | No | Maximum size per checkpoint shard file (default: "50GB"). Specified as an integer (bytes) or string like "5GB" |
| variant | str | No | If specified, weights are saved as model.{variant}.safetensors |
| token | str or bool | No | Authentication token for pushing to the Hub |
| save_peft_format | bool | No | Save adapter weights in PEFT-compatible format (default: True) |
| save_original_format | bool | No | Save checkpoint with reverse mapping for backward compatibility (default: True) |
| **kwargs | dict | No | Additional arguments passed to push_to_hub(), including repo_id and commit_message |
Outputs
| Name | Type | Description |
|---|---|---|
| (files on disk) | None | Writes model.safetensors (or sharded files with index), config.json, and optionally generation_config.json to save_directory. No Python return value. |
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
# ... training happens here ...
model.save_pretrained("./my_fine_tuned_model")
Saving with Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Save both model and tokenizer together
save_dir = "./my_model"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
Saving and Pushing to Hub
model.save_pretrained(
"./my_model",
push_to_hub=True,
repo_id="username/my-fine-tuned-gpt2",
commit_message="Upload fine-tuned model",
)
Using Trainer's Built-in Save and Push
from transformers import Trainer
# After training, Trainer wraps save_pretrained internally
trainer.save_model("./my_model")
# Or push directly to Hub
trainer.push_to_hub(commit_message="End of training")
Saving Large Sharded Models
# Save a large model with smaller shards
model.save_pretrained(
"./my_large_model",
max_shard_size="5GB",
)
# Creates: model-00001-of-00003.safetensors, ..., model.safetensors.index.json