Principle:Huggingface Transformers Adapter Weight Saving

Knowledge Sources	LoRA PEFT Docs Transformers Docs Safetensors
Domains	Parameter_Efficient_Fine_Tuning, NLP, Model_Serialization
Last Updated	2026-02-13 00:00 GMT

Overview

Adapter weight saving persists only the trained adapter parameters and their configuration to disk, producing lightweight checkpoint files that are a fraction of the base model's size.

Description

Once adapter parameters have been trained, they must be saved to disk for later reuse. A key advantage of PEFT methods is that the saved artifacts are very small: only the adapter weights and a configuration file are persisted, not the full model.

The saving process involves several steps:

State dict extraction: The model's get_adapter_state_dict() method extracts only the parameters belonging to the active adapter. This filters the full model state dict to include only keys containing adapter-specific markers (e.g., lora_A, lora_B).
Key prefix adjustment: For compatibility with the PEFT library's loading format, every adapter state dict key is given the prefix base_model.model. (including the trailing dot). This convention allows PEFT to map adapter weights back onto the model regardless of how the model was wrapped.
Configuration serialization: The PeftConfig object for the active adapter is saved as adapter_config.json alongside the weights. This file records the adapter type, rank, target modules, and crucially the base_model_name_or_path.
Weight serialization: The adapter weights are saved as adapter_model.safetensors (or adapter_model.bin for legacy format). The safetensors format is preferred for security and speed.
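
The four steps above can be sketched with the standard library alone. This is an illustration of the flow, not the Transformers implementation: plain nested lists stand in for tensors, the helper name save_adapter_sketch and the marker tuple are hypothetical, and the weights are written to a JSON stand-in file rather than adapter_model.safetensors so that the sketch runs without extra dependencies.

```python
import json
import os

# Assumption: LoRA-style adapter parameters carry one of these substrings in their key.
ADAPTER_MARKERS = ("lora_A", "lora_B")
PEFT_PREFIX = "base_model.model."


def save_adapter_sketch(state_dict, adapter_config, save_directory):
    """Hypothetical helper mirroring the four saving steps; not a library API."""
    # Step 1: keep only the adapter parameters, dropping base model weights.
    adapter_state = {
        key: value
        for key, value in state_dict.items()
        if any(marker in key for marker in ADAPTER_MARKERS)
    }

    # Step 2: prepend the PEFT key prefix to every remaining key.
    prefixed = {PEFT_PREFIX + key: value for key, value in adapter_state.items()}

    os.makedirs(save_directory, exist_ok=True)

    # Step 3: write the adapter configuration next to the weights.
    with open(os.path.join(save_directory, "adapter_config.json"), "w") as f:
        json.dump(adapter_config, f, indent=2)

    # Step 4: write the weights. A real checkpoint would be
    # adapter_model.safetensors; JSON is only a stand-in for this sketch.
    with open(os.path.join(save_directory, "adapter_weights_sketch.json"), "w") as f:
        json.dump(prefixed, f)

    return sorted(prefixed)
```

Calling the helper on a state dict that mixes base and adapter keys returns only the prefixed adapter keys, which is the property the real saving path relies on.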

Important constraints:

Single adapter at a time: If multiple adapters are loaded, only the active adapter is saved. To save multiple adapters, you must call set_adapter(name) followed by save_pretrained() for each.
No base weights: When adapters are detected, the base model config is not saved (since the base model is referenced by base_model_name_or_path in the adapter config), and the base model weights are excluded from the saved state dict.
Quantized models: For QLoRA models, the quantized base weights are not written as part of the adapter checkpoint. Only the adapter weights are saved, which is the intended behavior since the base model is re-loaded (and re-quantized) from the original source.
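
The single-adapter constraint suggests a simple loop when several adapters must be kept. The following is a minimal sketch, assuming only that the model object exposes set_adapter(name) and save_pretrained(path) as described above; the helper name save_all_adapters and the one-subdirectory-per-adapter layout are hypothetical choices.

```python
import os


def save_all_adapters(model, adapter_names, root_directory):
    """Save each named adapter to its own subdirectory, one at a time.

    Assumes `model` provides set_adapter(name) and save_pretrained(path).
    The helper name and directory layout are illustrative, not a library API.
    """
    saved_paths = []
    for name in adapter_names:
        # Make this adapter the active one so that it is the one saved.
        model.set_adapter(name)
        path = os.path.join(root_directory, name)
        model.save_pretrained(path)
        saved_paths.append(path)
    return saved_paths
```

Writing each adapter to a separate directory avoids one adapter's adapter_config.json overwriting another's.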

Usage

Save adapter weights when you need to:

Persist trained adapters after fine-tuning for later inference or continued training
Share adapters on the Hugging Face Hub (adapter checkpoints are typically 10-100 MB vs. 10-100 GB for full models)
Create checkpoints during training for fault tolerance
Archive multiple task-specific adapters that share the same base model

Theoretical Basis

The ability to save only adapter weights follows directly from the additive formulation used by LoRA-style PEFT methods:

W' = W + (alpha / r) * B * A
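
As a small worked instance of this formula, the sketch below rebuilds W' from a base matrix and a pair of low-rank factors using plain nested lists. The function names are hypothetical, and the shapes are assumed to be W: d_out x d_in, B: d_out x r, A: r x d_in.

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merged_weight(W, A, B, alpha, r):
    """Return W' = W + (alpha / r) * B @ A for nested-list matrices."""
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in, same shape as W
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Tiny example: 2 x 2 base weight with a rank-1 adapter.
W = [[1, 0], [0, 1]]
B = [[1], [2]]   # d_out x r
A = [[3, 4]]     # r x d_in
W_prime = merged_weight(W, A, B, alpha=2, r=1)
```

Here the 2 x 2 update is fully determined by the three numbers in B and A plus the scale, which is why persisting the factors is enough.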

Since W (the base model weight) is unchanged and can be recovered from the original pretrained checkpoint, only A and B (the adapter matrices) need to be saved. The storage requirement for adapter weights is:

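Storage = L * r * (d_in + d_out) * bytes_per_param

where L is the number of adapted weight matrices (assumed here to share the same dimensions), r is the adapter rank, d_in and d_out are the input and output dimensions of each adapted matrix, and bytes_per_param is the size of one stored value (2 for fp16).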

For a typical LoRA setup on a 7B model (rank 16, all attention layers, fp16):

Approximately 20M adapter parameters
Storage: ~40 MB in fp16
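
The estimate can be reproduced with a few lines of arithmetic. This sketch reads L in the storage formula as the number of adapted weight matrices and assumes a hypothetical 7B-class shape of 32 layers with four 4096 x 4096 attention projections each; those shapes are assumptions, not values taken from this page.

```python
def lora_storage(num_matrices, rank, d_in, d_out, bytes_per_param):
    """Return (parameter_count, size_in_bytes) for uniformly shaped LoRA adapters."""
    params = num_matrices * rank * (d_in + d_out)
    return params, params * bytes_per_param


# Assumed shapes: 32 layers x 4 attention projections, each 4096 x 4096, fp16.
adapter_params, adapter_bytes = lora_storage(
    num_matrices=32 * 4, rank=16, d_in=4096, d_out=4096, bytes_per_param=2
)
print(f"{adapter_params / 1e6:.1f}M parameters, {adapter_bytes / 1e6:.1f} MB")
```

Under these assumptions the result is about 16.8M parameters and roughly 34 MB, slightly below the rounded figures above but of the same order of magnitude.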

This is roughly 350x smaller than the full model checkpoint (~14 GB in fp16). This dramatic reduction enables:

Efficient sharing: Adapter checkpoints can be uploaded/downloaded in seconds
Multi-tenant serving: Multiple adapters can be stored on disk and swapped at inference time
Version control: Adapter iterations can be tracked without duplicating the base model

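The saved adapter_config.json records the base_model_name_or_path, creating a dependency link to the base model. This identifies which base model the adapter was trained against; the field is a recorded reference rather than an enforcement mechanism, so loading the adapter onto a different base model is only caught if parameter names or shapes fail to match.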
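
The recorded link can also be checked explicitly before an adapter is loaded. A minimal sketch, assuming the adapter directory contains an adapter_config.json with a base_model_name_or_path field as described above; the helper names are hypothetical and the check is a plain string comparison, not something the libraries are claimed to perform.

```python
import json
import os


def read_base_model_name(adapter_directory):
    """Return the base_model_name_or_path recorded in adapter_config.json."""
    config_path = os.path.join(adapter_directory, "adapter_config.json")
    with open(config_path) as f:
        return json.load(f).get("base_model_name_or_path")


def check_base_model(adapter_directory, expected_base):
    """Raise if the adapter was saved against a different base model name."""
    recorded = read_base_model_name(adapter_directory)
    if recorded != expected_base:
        raise ValueError(
            f"Adapter expects base model {recorded!r}, got {expected_base!r}"
        )
    return recorded
```

A string comparison like this cannot detect a base model that was renamed or moved, so it is best treated as a sanity check rather than a guarantee.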

Related Pages

Implemented By

Implementation:Huggingface_Transformers_Save_Pretrained_For_Adapters
