Implementation: HuggingFace Transformers save_pretrained
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, MLOps |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for persisting a trained model's weights, configuration, and associated artifacts to disk or the HuggingFace Hub, provided by the HuggingFace Transformers library.
Description
PreTrainedModel.save_pretrained() serializes the model's state dictionary and configuration to a specified directory. The saved artifacts can later be reloaded using from_pretrained(). By default, weights are saved in the safetensors format for safety and performance, with automatic sharding for models that exceed the shard size threshold (default: 50GB).
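The shard-assignment idea can be illustrated with a simplified sketch (not the library's actual implementation): tensors are greedily packed into shards so that each shard stays under the byte budget, and a tensor larger than the budget gets a shard of its own.

```python
def shard_state_dict(sizes, max_shard_bytes):
    """Greedily partition tensors into shards under a byte budget.

    `sizes` maps tensor name -> size in bytes. This is a simplified
    sketch of shard assignment, not transformers' actual algorithm.
    """
    shards, current, current_bytes = [], {}, 0
    for name, nbytes in sizes.items():
        # Start a new shard when adding this tensor would exceed the
        # budget; an oversized tensor still gets a shard to itself.
        if current and current_bytes + nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = {}, 0
        current[name] = nbytes
        current_bytes += nbytes
    if current:
        shards.append(current)
    return shards

# Hypothetical tensor sizes in bytes, for illustration only.
sizes = {"wte": 4_000, "h.0.attn": 3_000, "h.0.mlp": 3_000, "lm_head": 4_000}
shards = shard_state_dict(sizes, max_shard_bytes=6_000)
# With a 6 KB budget the four tensors land in three shards, which on
# disk would be named like model-00001-of-00003.safetensors.
```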
The method handles several advanced scenarios:
- Distributed training -- Only the main process writes files to avoid race conditions.
- PEFT models -- Adapter weights are saved in PEFT-compatible format when detected.
- Quantized models -- Serializable quantization states are preserved.
- Tensor parallelism -- Models distributed across devices are gathered before saving.
- Hub push -- Optionally pushes saved artifacts directly to the HuggingFace Hub.
In addition to the model weights, save_pretrained() also saves the model's configuration (config.json), generation configuration (generation_config.json if applicable), and custom model code if the model was loaded with trust_remote_code.
Usage
Call model.save_pretrained() after training completes to save the final model, or use it within a Trainer checkpoint callback. It is also useful for saving a model to a local directory before pushing it to the Hub separately.
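As a sanity check, a save_pretrained()/from_pretrained() round trip should reproduce the weights exactly; a minimal offline sketch with a tiny random model (config values are arbitrary):

```python
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random model; no network access needed.
model = GPT2LMHeadModel(GPT2Config(n_layer=1, n_head=2, n_embd=32, vocab_size=128))

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)
    reloaded = GPT2LMHeadModel.from_pretrained(save_dir)

# Every tensor in the reloaded state dict matches the original.
match = all(
    torch.equal(tensor, reloaded.state_dict()[name])
    for name, tensor in model.state_dict().items()
)
print(match)
```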
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/modeling_utils.py (lines 3125-3244+)
Signature
def save_pretrained(
self,
save_directory: str | os.PathLike,
is_main_process: bool = True,
state_dict: dict | None = None,
push_to_hub: bool = False,
max_shard_size: int | str = "50GB",
variant: str | None = None,
token: str | bool | None = None,
save_peft_format: bool = True,
save_original_format: bool = True,
**kwargs,
):
Import
from transformers import AutoModelForCausalLM
# save_pretrained() is an instance method on any PreTrainedModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| save_directory | str or os.PathLike | Yes | Directory path where the model will be saved. Created if it does not exist |
| is_main_process | bool | No | Whether this is the main process in distributed training (default: True). Only the main process writes files to avoid race conditions |
| state_dict | dict | No | Custom state dictionary to save. If None, uses self.state_dict(). Useful for saving only parts of the model |
| push_to_hub | bool | No | Whether to push the saved model to the HuggingFace Hub after saving (default: False) |
| max_shard_size | int or str | No | Maximum size per checkpoint shard file (default: "50GB"). Specified as an integer (bytes) or string like "5GB" |
| variant | str | No | If specified, weights are saved as model.{variant}.safetensors |
| token | str or bool | No | Authentication token for pushing to the Hub |
| save_peft_format | bool | No | Save adapter weights in PEFT-compatible format (default: True) |
| save_original_format | bool | No | Save checkpoint with reverse mapping for backward compatibility (default: True) |
| **kwargs | dict | No | Additional arguments passed to push_to_hub(), including repo_id and commit_message |
Outputs
| Name | Type | Description |
|---|---|---|
| (files on disk) | None | Writes model.safetensors (or sharded files with index), config.json, and optionally generation_config.json to save_directory. No Python return value. |
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
# ... training happens here ...
model.save_pretrained("./my_fine_tuned_model")
Saving with Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Save both model and tokenizer together
save_dir = "./my_model"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
Saving and Pushing to Hub
model.save_pretrained(
"./my_model",
push_to_hub=True,
repo_id="username/my-fine-tuned-gpt2",
commit_message="Upload fine-tuned model",
)
Using Trainer's Built-in Save and Push
from transformers import Trainer
# After training, Trainer wraps save_pretrained internally
trainer.save_model("./my_model")
# Or push directly to Hub
trainer.push_to_hub(commit_message="End of training")
Saving Large Sharded Models
# Save a large model with smaller shards
model.save_pretrained(
"./my_large_model",
max_shard_size="5GB",
)
# Creates: model-00001-of-00003.safetensors, ..., model.safetensors.index.json