
Implementation:Huggingface Peft PeftModel Save Load

From Leeroopedia



Overview

A concrete tool for saving and loading PEFT adapter weights, provided by the PEFT library. The PeftModel.save_pretrained and PeftModel.from_pretrained methods implement the full lifecycle of adapter persistence -- serializing trained adapter weights and configuration to disk (or the Hub), and reconstructing a PEFT-wrapped model from the saved artifacts.

Description

These two methods form a complementary pair:

  • save_pretrained -- Extracts only the trainable adapter parameters from the model, serializes them to safetensors (or PyTorch .bin) format, and saves the adapter configuration as JSON. Supports saving selected adapters, handling tensor aliasing for safetensors compatibility, and converting specialized initialization methods (PiSSA, CorDA, OLoRA) to standard LoRA format.
  • from_pretrained -- Class method that loads a saved adapter configuration, creates the appropriate PeftModel subclass around a base model, and injects the saved adapter weights. Supports loading from local directories or HuggingFace Hub model IDs, with options for trainable vs. inference mode, device mapping, and disk offloading.

Both methods handle the complete adapter lifecycle including model card generation, multi-adapter management, and integration with the HuggingFace ecosystem.

Usage

  • save_pretrained -- Call after training is complete to persist adapter weights. Can also be called during training for intermediate checkpoints.
  • from_pretrained -- Call to load a previously saved adapter onto a base model for inference or continued training.

Code Reference

Source file: src/peft/peft_model.py, lines 190--605

Import:

from peft import PeftModel

save_pretrained signature:

def save_pretrained(
    self,
    save_directory: str,
    safe_serialization: bool = True,
    selected_adapters: Optional[list[str]] = None,
    save_embedding_layers: Union[str, bool] = "auto",
    is_main_process: bool = True,
    path_initial_model_for_weight_conversion: Optional[str] = None,
    **kwargs: Any,
) -> None:

from_pretrained signature:

@classmethod
def from_pretrained(
    cls,
    model: torch.nn.Module,
    model_id: Union[str, os.PathLike],
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: Optional[PeftConfig] = None,
    autocast_adapter_dtype: bool = True,
    ephemeral_gpu_offload: bool = False,
    low_cpu_mem_usage: bool = False,
    **kwargs: Any,
) -> PeftModel:

I/O Contract

save_pretrained Inputs

  • save_directory (str, required) -- Directory path where adapter weights and config will be saved. Created if it does not exist.
  • safe_serialization (bool, default True) -- Whether to save in safetensors format (recommended) or PyTorch .bin format.
  • selected_adapters (Optional[list[str]], default None) -- List of adapter names to save. If None, saves all adapters.
  • save_embedding_layers (Union[str, bool], default "auto") -- Whether to include embedding layers in the saved weights. "auto" detects whether embeddings are among the target modules.
  • is_main_process (bool, default True) -- Controls whether this process performs the actual save. Set to False on non-main processes in distributed training.
  • path_initial_model_for_weight_conversion (Optional[str], default None) -- Path to the initial adapter checkpoint for PiSSA/CorDA/OLoRA-to-LoRA conversion. When set, computes the delta between the initial and trained weights.
  • **kwargs (Any) -- Additional keyword arguments passed to push_to_hub.
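The weight-conversion option can be illustrated numerically. The sketch below is a hedged, scalar toy of the idea behind path_initial_model_for_weight_conversion: PiSSA/CorDA/OLoRA modify the base weights at initialization, so the saved adapter must carry the difference between trained and initial weights to compose with an unmodified base model. The real conversion operates on the LoRA A/B matrices, not scalars.

```python
# Toy illustration (not the PEFT implementation): converting a
# PiSSA/CorDA/OLoRA adapter to standard LoRA format amounts to storing the
# *delta* between the trained and initial adapter weights, so the result
# applies on top of the original, unmodified base model.
initial_adapter = [0.5, -0.25, 1.0]      # weights right after initialization
trained_adapter = [0.75, -0.125, 0.875]  # weights after fine-tuning

# Delta that a plain LoRA checkpoint would need to carry
delta = [t - i for t, i in zip(trained_adapter, initial_adapter)]
print(delta)  # [0.25, 0.125, -0.125]
```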

save_pretrained Output

  • Returns None. Writes files to save_directory: adapter_model.safetensors (or adapter_model.bin) and adapter_config.json. Non-default adapters are saved in a subdirectory named after the adapter.
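To make the output contract concrete, the stdlib-only sketch below writes a representative adapter_config.json and reads it back the way a loader would. The field names are illustrative of a LoRA config; the exact set written by save_pretrained depends on the config and PEFT version.

```python
import json
import tempfile
from pathlib import Path

# Representative adapter_config.json contents for a LoRA adapter; the exact
# fields written by save_pretrained depend on the config and PEFT version.
adapter_config = {
    "peft_type": "LORA",
    "task_type": "CAUSAL_LM",
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.05,
}

save_dir = Path(tempfile.mkdtemp()) / "my_lora_adapter"
save_dir.mkdir(parents=True)
(save_dir / "adapter_config.json").write_text(json.dumps(adapter_config, indent=2))

# A loader can recover every adapter hyperparameter from the JSON alone
loaded = json.loads((save_dir / "adapter_config.json").read_text())
print(loaded["peft_type"], loaded["r"])  # LORA 16
```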

from_pretrained Inputs

  • model (torch.nn.Module, required) -- The base model to wrap with PEFT adapters. Typically a HuggingFace PreTrainedModel. May be modified in-place.
  • model_id (Union[str, os.PathLike], required) -- Path to a local directory containing saved adapter files, or a HuggingFace Hub model ID.
  • adapter_name (str, default "default") -- Name to assign to the loaded adapter.
  • is_trainable (bool, default False) -- If True, the adapter is loaded in training mode; if False, in inference mode.
  • config (Optional[PeftConfig], default None) -- Pre-loaded configuration object. If None, the config is loaded from model_id.
  • autocast_adapter_dtype (bool, default True) -- Whether to cast adapter weights from float16/bfloat16 to float32 for stable training.
  • ephemeral_gpu_offload (bool, default False) -- Enables on-demand GPU offloading for partially loaded modules to speed up operations while minimizing VRAM usage.
  • low_cpu_mem_usage (bool, default False) -- Creates empty adapter weights on the meta device before loading, which speeds up loading.
  • **kwargs (Any) -- Additional keyword arguments, including subfolder, revision, cache_dir, token, and torch_device.

from_pretrained Output

  • Returns a PeftModel: the base model wrapped with PEFT adapter layers and loaded weights. The specific subclass (e.g., PeftModelForCausalLM) depends on the task_type in the loaded configuration.
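The subclass selection can be sketched as a dispatch on task_type. The class names below are PEFT's task-specific wrappers, but the mapping itself is illustrative, not the library's internal dispatch table.

```python
# Illustrative dispatch table: from_pretrained reads task_type from the
# saved config and instantiates the matching PeftModel subclass.
TASK_TYPE_TO_CLASS = {
    "CAUSAL_LM": "PeftModelForCausalLM",
    "SEQ_2_SEQ_LM": "PeftModelForSeq2SeqLM",
    "SEQ_CLS": "PeftModelForSequenceClassification",
    "TOKEN_CLS": "PeftModelForTokenClassification",
    "QUESTION_ANS": "PeftModelForQuestionAnswering",
    "FEATURE_EXTRACTION": "PeftModelForFeatureExtraction",
}

def resolve_peft_class(task_type):
    # Configs without a recognized task_type fall back to the generic wrapper
    return TASK_TYPE_TO_CLASS.get(task_type, "PeftModel")

print(resolve_peft_class("CAUSAL_LM"))  # PeftModelForCausalLM
print(resolve_peft_class(None))         # PeftModel
```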

Usage Examples

Saving Adapter Weights After Training

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Setup model with adapter
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# ... train the model ...

# Save only the adapter weights (a few MB instead of 14+ GB)
model.save_pretrained("./my_lora_adapter")

# This creates:
#   ./my_lora_adapter/adapter_config.json
#   ./my_lora_adapter/adapter_model.safetensors

Saving Selected Adapters

# If the model has multiple adapters, save only specific ones
model.save_pretrained(
    "./my_adapters",
    selected_adapters=["task_a", "task_b"],
)
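A hedged sketch of the resulting on-disk layout: when named (non-default) adapters are saved, each one gets its own subdirectory under save_directory. The snippet below fakes the files with stdlib calls purely to show the expected structure, not to reproduce save_pretrained itself.

```python
import tempfile
from pathlib import Path

# Expected layout after saving adapters "task_a" and "task_b": each
# non-default adapter is written to its own subdirectory. The files here are
# empty placeholders standing in for what save_pretrained would write.
root = Path(tempfile.mkdtemp()) / "my_adapters"
for name in ["task_a", "task_b"]:
    adapter_dir = root / name
    adapter_dir.mkdir(parents=True)
    (adapter_dir / "adapter_config.json").touch()
    (adapter_dir / "adapter_model.safetensors").touch()

layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
print(layout)
```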

Loading an Adapter for Inference

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load adapter from a local directory
model = PeftModel.from_pretrained(base_model, "./my_lora_adapter")

# Or load from the HuggingFace Hub
model = PeftModel.from_pretrained(base_model, "username/my-lora-adapter")

# The model is now ready for inference
model.eval()

Loading an Adapter for Continued Training

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Set is_trainable=True to keep adapter weights trainable
model = PeftModel.from_pretrained(
    base_model,
    "./my_lora_adapter",
    is_trainable=True,
)

# Continue training...

Pushing to the HuggingFace Hub

# Push directly to the Hub (no local save required)
model.push_to_hub("username/my-lora-adapter")

# Or save locally first for inspection, then push the saved adapter
model.save_pretrained("./my_lora_adapter")
model.push_to_hub("username/my-lora-adapter")
