Implementation: HuggingFace PEFT PeftModel Save/Load
Overview
The concrete mechanism the PEFT library provides for saving and loading adapter weights. The PeftModel.save_pretrained and PeftModel.from_pretrained methods implement the full lifecycle of adapter persistence -- serializing trained adapter weights and configuration to disk (or the HuggingFace Hub), and reconstructing a PEFT-wrapped model from the saved artifacts.
Description
These two methods form a complementary pair:
save_pretrained -- Extracts only the trainable adapter parameters from the model, serializes them to safetensors (or PyTorch .bin) format, and saves the adapter configuration as JSON. Supports saving selected adapters, handling tensor aliasing for safetensors compatibility, and converting specialized initialization methods (PiSSA, CorDA, OLoRA) to standard LoRA format.
from_pretrained -- Class method that loads a saved adapter configuration, creates the appropriate PeftModel subclass around a base model, and injects the saved adapter weights. Supports loading from local directories or HuggingFace Hub model IDs, with options for trainable vs. inference mode, device mapping, and disk offloading.
Both methods handle the complete adapter lifecycle including model card generation, multi-adapter management, and integration with the HuggingFace ecosystem.
Usage
save_pretrained -- Call after training is complete to persist adapter weights. Can also be called during training for intermediate checkpoints.
from_pretrained -- Call to load a previously saved adapter onto a base model for inference or continued training.
Code Reference
Source file: src/peft/peft_model.py, lines 190--605
Import:
```python
from peft import PeftModel
```
save_pretrained signature:
```python
def save_pretrained(
    self,
    save_directory: str,
    safe_serialization: bool = True,
    selected_adapters: Optional[list[str]] = None,
    save_embedding_layers: Union[str, bool] = "auto",
    is_main_process: bool = True,
    path_initial_model_for_weight_conversion: Optional[str] = None,
    **kwargs: Any,
) -> None:
```
from_pretrained signature:
```python
@classmethod
def from_pretrained(
    cls,
    model: torch.nn.Module,
    model_id: Union[str, os.PathLike],
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: Optional[PeftConfig] = None,
    autocast_adapter_dtype: bool = True,
    ephemeral_gpu_offload: bool = False,
    low_cpu_mem_usage: bool = False,
    **kwargs: Any,
) -> PeftModel:
```
I/O Contract
save_pretrained Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `save_directory` | `str` | (required) | Directory path where adapter weights and config will be saved. Created if it does not exist. |
| `safe_serialization` | `bool` | `True` | Whether to save in safetensors format (recommended) or PyTorch `.bin` format. |
| `selected_adapters` | `Optional[list[str]]` | `None` | List of adapter names to save. If `None`, saves all adapters. |
| `save_embedding_layers` | `Union[str, bool]` | `"auto"` | Whether to include embedding layers in the saved weights. `"auto"` detects whether embeddings are among the target modules. |
| `is_main_process` | `bool` | `True` | Controls whether this process performs the actual save. Set to `False` on non-main processes in distributed training. |
| `path_initial_model_for_weight_conversion` | `Optional[str]` | `None` | Path to the initial adapter checkpoint for PiSSA/CorDA/OLoRA-to-LoRA conversion. When set, computes the delta between initial and trained weights. |
| `**kwargs` | `Any` | -- | Additional keyword arguments passed to `push_to_hub`. |
save_pretrained Output
| Type | Description |
|---|---|
| `None` | Writes files to `save_directory`: `adapter_model.safetensors` (or `adapter_model.bin`) and `adapter_config.json`. For non-default adapters, files are saved in a subdirectory named after the adapter. |
from_pretrained Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `torch.nn.Module` | (required) | The base model to wrap with PEFT adapters. Typically a HuggingFace `PreTrainedModel`. May be modified in-place. |
| `model_id` | `Union[str, os.PathLike]` | (required) | Path to a local directory containing saved adapter files, or a HuggingFace Hub model ID. |
| `adapter_name` | `str` | `"default"` | Name to assign to the loaded adapter. |
| `is_trainable` | `bool` | `False` | If `True`, the adapter is loaded in training mode. If `False`, loaded in inference mode. |
| `config` | `Optional[PeftConfig]` | `None` | Pre-loaded configuration object. If `None`, the config is loaded from `model_id`. |
| `autocast_adapter_dtype` | `bool` | `True` | Whether to cast adapter weights from float16/bfloat16 to float32 for stable training. |
| `ephemeral_gpu_offload` | `bool` | `False` | Enables on-demand GPU offloading for partially loaded modules to speed up operations while minimizing VRAM usage. |
| `low_cpu_mem_usage` | `bool` | `False` | Creates empty adapter weights on the meta device before loading, speeding up the loading process. |
| `**kwargs` | `Any` | -- | Additional keyword arguments including `subfolder`, `revision`, `cache_dir`, `token`, and `torch_device`. |
from_pretrained Output
| Type | Description |
|---|---|
| `PeftModel` | The base model wrapped with PEFT adapter layers and loaded weights. The specific subclass (e.g., `PeftModelForCausalLM`) depends on the `task_type` in the loaded configuration. |
Usage Examples
Saving Adapter Weights After Training
```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Set up the model with a LoRA adapter
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# ... train the model ...

# Save only the adapter weights (a few MB instead of 14+ GB)
model.save_pretrained("./my_lora_adapter")

# This creates:
# ./my_lora_adapter/adapter_config.json
# ./my_lora_adapter/adapter_model.safetensors
```
Saving Selected Adapters
```python
# If the model has multiple adapters, save only specific ones
model.save_pretrained(
    "./my_adapters",
    selected_adapters=["task_a", "task_b"],
)
```
Loading an Adapter for Inference
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load adapter from a local directory
model = PeftModel.from_pretrained(base_model, "./my_lora_adapter")

# Or load from the HuggingFace Hub
model = PeftModel.from_pretrained(base_model, "username/my-lora-adapter")

# The model is now ready for inference
model.eval()
```
Loading an Adapter for Continued Training
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Set is_trainable=True to keep adapter weights trainable
model = PeftModel.from_pretrained(
    base_model,
    "./my_lora_adapter",
    is_trainable=True,
)
# Continue training...
```
Pushing to the HuggingFace Hub
```python
# Option 1: save locally for a persistent copy, then upload
model.save_pretrained("./my_lora_adapter")
model.push_to_hub("username/my-lora-adapter")

# Option 2: push directly, without an intermediate local save
model.push_to_hub("username/my-lora-adapter")
```