Implementation: HuggingFace PEFT PeftModel Save/Load
Overview
The concrete mechanism the PEFT library provides for saving and loading adapter weights. The PeftModel.save_pretrained and PeftModel.from_pretrained methods implement the full lifecycle of adapter persistence -- serializing trained adapter weights and configuration to disk (or the HuggingFace Hub), and reconstructing a PEFT-wrapped model from the saved artifacts.
Description
These two methods form a complementary pair:
save_pretrained -- Extracts only the trainable adapter parameters from the model, serializes them to safetensors (or PyTorch .bin) format, and saves the adapter configuration as JSON. Supports saving selected adapters, handling tensor aliasing for safetensors compatibility, and converting specialized initialization methods (PiSSA, CorDA, OLoRA) to standard LoRA format.
from_pretrained -- Class method that loads a saved adapter configuration, creates the appropriate PeftModel subclass around a base model, and injects the saved adapter weights. Supports loading from local directories or HuggingFace Hub model IDs, with options for trainable vs. inference mode, device mapping, and disk offloading.
Both methods handle the complete adapter lifecycle including model card generation, multi-adapter management, and integration with the HuggingFace ecosystem.
Usage
save_pretrained -- Call after training is complete to persist adapter weights. Can also be called during training for intermediate checkpoints.
from_pretrained -- Call to load a previously saved adapter onto a base model for inference or continued training.
Code Reference
Source file: src/peft/peft_model.py, lines 190--605
Import:
```python
from peft import PeftModel
```
save_pretrained signature:
```python
def save_pretrained(
    self,
    save_directory: str,
    safe_serialization: bool = True,
    selected_adapters: Optional[list[str]] = None,
    save_embedding_layers: Union[str, bool] = "auto",
    is_main_process: bool = True,
    path_initial_model_for_weight_conversion: Optional[str] = None,
    **kwargs: Any,
) -> None:
```
from_pretrained signature:
```python
@classmethod
def from_pretrained(
    cls,
    model: torch.nn.Module,
    model_id: Union[str, os.PathLike],
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: Optional[PeftConfig] = None,
    autocast_adapter_dtype: bool = True,
    ephemeral_gpu_offload: bool = False,
    low_cpu_mem_usage: bool = False,
    **kwargs: Any,
) -> PeftModel:
```
I/O Contract
save_pretrained Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `save_directory` | `str` | (required) | Directory path where adapter weights and config will be saved. Created if it does not exist. |
| `safe_serialization` | `bool` | `True` | Whether to save in safetensors format (recommended) or PyTorch `.bin` format. |
| `selected_adapters` | `Optional[list[str]]` | `None` | List of adapter names to save. If `None`, saves all adapters. |
| `save_embedding_layers` | `Union[str, bool]` | `"auto"` | Whether to include embedding layers in the saved weights. `"auto"` detects whether embeddings are among the target modules. |
| `is_main_process` | `bool` | `True` | Controls whether this process performs the actual save. Set to `False` on non-main processes in distributed training. |
| `path_initial_model_for_weight_conversion` | `Optional[str]` | `None` | Path to the initial adapter checkpoint for PiSSA/CorDA/OLoRA-to-LoRA conversion. When set, computes the delta between initial and trained weights. |
| `**kwargs` | `Any` | -- | Additional keyword arguments passed to `push_to_hub`. |
save_pretrained Output
| Type | Description |
|---|---|
| `None` | Writes files to `save_directory`: `adapter_model.safetensors` (or `adapter_model.bin`) and `adapter_config.json`. For non-default adapters, files are saved in a subdirectory named after the adapter. |
from_pretrained Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `torch.nn.Module` | (required) | The base model to wrap with PEFT adapters. Typically a HuggingFace `PreTrainedModel`. May be modified in-place. |
| `model_id` | `Union[str, os.PathLike]` | (required) | Path to a local directory containing saved adapter files, or a HuggingFace Hub model ID. |
| `adapter_name` | `str` | `"default"` | Name to assign to the loaded adapter. |
| `is_trainable` | `bool` | `False` | If `True`, the adapter is loaded in training mode. If `False`, loaded in inference mode. |
| `config` | `Optional[PeftConfig]` | `None` | Pre-loaded configuration object. If `None`, the config is loaded from `model_id`. |
| `autocast_adapter_dtype` | `bool` | `True` | Whether to cast adapter weights from float16/bfloat16 to float32 for stable training. |
| `ephemeral_gpu_offload` | `bool` | `False` | Enables on-demand GPU offloading for partially loaded modules to speed up operations while minimizing VRAM usage. |
| `low_cpu_mem_usage` | `bool` | `False` | Creates empty adapter weights on the meta device before loading, speeding up the loading process. |
| `**kwargs` | `Any` | -- | Additional keyword arguments including `subfolder`, `revision`, `cache_dir`, `token`, and `torch_device`. |
from_pretrained Output
| Type | Description |
|---|---|
| `PeftModel` | The base model wrapped with PEFT adapter layers and loaded weights. The specific subclass (e.g., `PeftModelForCausalLM`) depends on the `task_type` in the loaded configuration. |
Usage Examples
Saving Adapter Weights After Training
```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Set up the model with a LoRA adapter
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# ... train the model ...

# Save only the adapter weights (a few MB instead of 14+ GB)
model.save_pretrained("./my_lora_adapter")

# This creates:
# ./my_lora_adapter/adapter_config.json
# ./my_lora_adapter/adapter_model.safetensors
```
Saving Selected Adapters
```python
# If the model has multiple adapters, save only specific ones
model.save_pretrained(
    "./my_adapters",
    selected_adapters=["task_a", "task_b"],
)
```
Loading an Adapter for Inference
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load adapter from a local directory
model = PeftModel.from_pretrained(base_model, "./my_lora_adapter")

# Or load from the HuggingFace Hub
model = PeftModel.from_pretrained(base_model, "username/my-lora-adapter")

# The model is now ready for inference
model.eval()
```
Loading an Adapter for Continued Training
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Set is_trainable=True to keep adapter weights trainable
model = PeftModel.from_pretrained(
    base_model,
    "./my_lora_adapter",
    is_trainable=True,
)
# Continue training...
```
Pushing to the HuggingFace Hub
```python
# Option 1: save locally for a persistent copy, then upload
model.save_pretrained("./my_lora_adapter")
model.push_to_hub("username/my-lora-adapter")

# Option 2: push directly, without an intermediate local save
model.push_to_hub("username/my-lora-adapter")
```