Principle:Huggingface Transformers Adapter Weight Saving
| Knowledge Sources | |
|---|---|
| Domains | Parameter_Efficient_Fine_Tuning, NLP, Model_Serialization |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Adapter weight saving persists only the trained adapter parameters and their configuration to disk, producing lightweight checkpoint files that are a fraction of the base model's size.
Description
Once adapter parameters have been trained, they must be saved to disk for later reuse. A key advantage of PEFT methods is that the saved artifacts are extremely small: only the adapter weights and a configuration file are persisted, not the full model.
The saving process involves several steps:
- State dict extraction: The model's `get_adapter_state_dict()` method extracts only the parameters belonging to the active adapter. This filters the full model state dict to include only keys containing adapter-specific markers (e.g., `lora_A`, `lora_B`).
- Key prefix adjustment: For compatibility with the PEFT library's loading format, all adapter state dict keys are prefixed with `base_model.model.`. This is a convention that allows PEFT to correctly map adapter weights back onto the model regardless of how the model was wrapped.
- Configuration serialization: The `PeftConfig` object for the active adapter is saved as `adapter_config.json` alongside the weights. This file records the adapter type, rank, target modules, and, crucially, the `base_model_name_or_path`.
- Weight serialization: The adapter weights are saved as `adapter_model.safetensors` (or `adapter_model.bin` for the legacy format). The safetensors format is preferred for security and speed.
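The steps above can be illustrated with a minimal sketch that uses plain dictionaries in place of real tensors. This is not the library's implementation: the helper names (`extract_adapter_state`, `add_peft_prefix`, `save_adapter_sketch`), the toy keys, and the `example-org/example-base` identifier are all hypothetical, and the weight-serialization step is only indicated by a comment.

```python
import json
import os
import tempfile

# Assumed markers, taken from the examples named in the text.
ADAPTER_MARKERS = ("lora_A", "lora_B")
PEFT_PREFIX = "base_model.model."


def extract_adapter_state(full_state):
    """Keep only entries whose key contains an adapter-specific marker."""
    return {k: v for k, v in full_state.items() if any(m in k for m in ADAPTER_MARKERS)}


def add_peft_prefix(adapter_state):
    """Prefix every key so it follows the PEFT loading convention."""
    return {PEFT_PREFIX + k: v for k, v in adapter_state.items()}


def save_adapter_sketch(full_state, config, save_dir):
    """Write adapter_config.json and return the prefixed adapter state."""
    os.makedirs(save_dir, exist_ok=True)
    adapter_state = add_peft_prefix(extract_adapter_state(full_state))
    with open(os.path.join(save_dir, "adapter_config.json"), "w") as f:
        json.dump(config, f, indent=2)
    # A real implementation would now serialize adapter_state to
    # adapter_model.safetensors; this sketch only returns it.
    return adapter_state


# Toy state dict: nested lists stand in for tensors (hypothetical keys).
full_state = {
    "layers.0.attn.q_proj.weight": [[0.0]],
    "layers.0.attn.q_proj.lora_A.weight": [[0.1]],
    "layers.0.attn.q_proj.lora_B.weight": [[0.2]],
}
config = {"peft_type": "LORA", "r": 16, "base_model_name_or_path": "example-org/example-base"}

saved_state = save_adapter_sketch(full_state, config, tempfile.mkdtemp())
print(sorted(saved_state))
```

The base weight `q_proj.weight` is dropped by the filter, so only the two LoRA entries remain, each carrying the `base_model.model.` prefix.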
Important constraints:
- Single adapter at a time: If multiple adapters are loaded, only the active adapter is saved. To save multiple adapters, you must call `set_adapter(name)` followed by `save_pretrained()` for each.
- No base weights: When adapters are detected, the base model config is not saved (since the base model is referenced by `base_model_name_or_path` in the adapter config), and the base model weights are excluded from the saved state dict.
- Quantized models: For QLoRA models, the quantized base weights are not part of the saved artifacts. Only the adapter weights are saved, which is the correct behavior since the base model must be re-loaded from the original source.
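The one-adapter-at-a-time constraint suggests a simple loop. The helper below is a hypothetical sketch (`save_all_adapters` is not a library function); it assumes only that the model object exposes `set_adapter(name)` and `save_pretrained(path)` as described above.

```python
import os


def save_all_adapters(model, adapter_names, root_dir):
    """Activate each adapter in turn and save it to its own subdirectory.

    `model` is assumed to provide set_adapter(name) and save_pretrained(path);
    nothing else about it is assumed.
    """
    saved_paths = {}
    for name in adapter_names:
        path = os.path.join(root_dir, name)
        model.set_adapter(name)      # make this adapter the active one
        model.save_pretrained(path)  # only the active adapter is written
        saved_paths[name] = path
    return saved_paths
```

Note that the last adapter in `adapter_names` remains active after the loop, so a caller that depends on a particular active adapter should restore it afterwards.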
Usage
Save adapter weights when you need to:
- Persist trained adapters after fine-tuning for later inference or continued training
- Share adapters on the Hugging Face Hub (adapter checkpoints are typically 10-100 MB vs. 10-100 GB for full models)
- Create checkpoints during training for fault tolerance
- Archive multiple task-specific adapters that share the same base model
Theoretical Basis
The ability to save only adapter weights is a direct consequence of the additive PEFT formulation:
W' = W + (alpha / r) * B * A
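Since W (the base model weight) is unchanged and can be recovered from the original pretrained checkpoint, only A and B (the adapter matrices) need to be saved. The storage requirement for adapter weights is:
Storage = L * r * (d_in + d_out) * bytes_per_param
Here L is the number of adapted weight matrices, r is the adapter rank, and d_in and d_out are the input and output dimensions of each adapted matrix (assumed equal across matrices for this estimate).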
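To make the additive formulation concrete, the sketch below rebuilds W' from a frozen W plus a saved A and B using small nested lists. The helper names (`matmul`, `merge_lora`) and the toy values are illustrative assumptions, not part of any library.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, alpha, r):
    """Return W' = W + (alpha / r) * B * A without modifying W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy shapes: W is 2x2 (d_out x d_in), B is 2x1 (d_out x r), A is 1x2 (r x d_in).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [1.0]]

W_merged = merge_lora(W, A, B, alpha=2, r=1)
print(W_merged)  # [[2.0, 2.0], [2.0, 5.0]]
print(W)         # unchanged: [[1.0, 0.0], [0.0, 1.0]]
```

Because W is never modified, storing A and B (three numbers here, versus four for W) is enough to reproduce W' whenever the base weight is available.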
For a typical LoRA setup on a 7B model (rank 16, all attention layers, fp16):
- Approximately 20M adapter parameters
- Storage: ~40 MB in fp16
This is roughly 350x smaller than the full model checkpoint (~14 GB in fp16). This dramatic reduction enables:
- Efficient sharing: Adapter checkpoints can be uploaded/downloaded in seconds
- Multi-tenant serving: Multiple adapters can be stored on disk and swapped at inference time
- Version control: Adapter iterations can be tracked without duplicating the base model
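The storage estimate for the 7B example can be reproduced with a short calculation. The layer count and projection sizes below (32 layers, four 4096 x 4096 attention projections per layer) are assumptions about a typical 7B architecture rather than figures from the text; under them the result comes to roughly 17M parameters and 34 MB, the same order of magnitude as the rounded figures quoted above.

```python
def lora_param_count(num_matrices, rank, d_in, d_out):
    """Parameters in A (rank x d_in) plus B (d_out x rank), per adapted matrix."""
    return num_matrices * rank * (d_in + d_out)


def lora_storage_bytes(num_matrices, rank, d_in, d_out, bytes_per_param):
    """Storage = L * r * (d_in + d_out) * bytes_per_param."""
    return lora_param_count(num_matrices, rank, d_in, d_out) * bytes_per_param


# Assumed setup: 32 layers x 4 attention projections, each 4096 x 4096.
num_matrices = 32 * 4
adapter_params = lora_param_count(num_matrices, rank=16, d_in=4096, d_out=4096)
adapter_bytes = lora_storage_bytes(num_matrices, 16, 4096, 4096, bytes_per_param=2)  # fp16
full_model_bytes = 14 * 10**9  # ~14 GB fp16 checkpoint, figure taken from the text

print(adapter_params)                    # 16777216
print(adapter_bytes / 10**6)             # adapter size in MB
print(full_model_bytes / adapter_bytes)  # reduction factor
```

Adapting additional modules (for example the MLP projections) or raising the rank scales the estimate linearly through `num_matrices` and `rank`.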
The saved `adapter_config.json` records the `base_model_name_or_path`, creating a dependency link to the base model. This lets loading tools locate the base model the adapter was trained against; the adapter weights are only meaningful when applied to that base model.
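A loader can read this field before applying an adapter. The following is a minimal sketch using only the standard library; the helper names (`read_base_model_name`, `adapter_matches_base`) and the `example-org/...` identifiers are hypothetical.

```python
import json
import os
import tempfile


def read_base_model_name(adapter_dir):
    """Return base_model_name_or_path from adapter_config.json (None if missing)."""
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_path) as f:
        return json.load(f).get("base_model_name_or_path")


def adapter_matches_base(adapter_dir, expected_base):
    """True only when the recorded base model equals the expected identifier."""
    return read_base_model_name(adapter_dir) == expected_base


# Demo with a throwaway directory and a made-up model identifier.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "adapter_config.json"), "w") as f:
    json.dump({"peft_type": "LORA", "base_model_name_or_path": "example-org/example-base"}, f)

print(adapter_matches_base(demo_dir, "example-org/example-base"))  # True
print(adapter_matches_base(demo_dir, "example-org/other-base"))    # False
```

A plain string comparison is deliberately strict: a local path and a Hub identifier that refer to the same model would not match, so a real check may need a looser rule.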