Implementation:Huggingface Transformers Load Adapter
| Knowledge Sources | |
|---|---|
| Domains | Parameter_Efficient_Fine_Tuning, NLP, Model_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for loading pre-trained adapter weights into a base model and managing adapter activation, provided by the PeftAdapterMixin class in Hugging Face Transformers.
Description
model.load_adapter() is the primary method for attaching pre-trained adapter weights to a model at inference or training time. It is complemented by set_adapter(), enable_adapters(), and disable_adapters() for managing which adapters are active.
The load_adapter method performs the following sequence:
- Hotswap resolution: If hotswap="auto" (default), determines whether to hotswap based on whether enable_peft_hotswap() was called and whether an adapter with the same name already exists
- Config loading: Loads the PeftConfig from the adapter path (either local or Hub), or uses a directly provided peft_config
- Config conversion: Calls convert_peft_config_for_transformers to adapt the PEFT config for the model's architecture, especially for models with fused weight patterns (e.g., MoE models)
- Weight mapping: Builds conversion mappings via _build_peft_weight_mapping for models that require weight name transformations (e.g., adding adapter name to key patterns)
- MoE patching: Calls patch_moe_parameter_targeting to handle MoE models where expert layer dimensions are transposed
- Adapter injection: Unless hotswapping, calls PEFT's inject_adapter_in_model to create the adapter layers
- Weight loading: Resolves checkpoint files (adapter_model.safetensors or adapter_model.bin) and loads them via _load_pretrained_model with the weight conversion mappings
- Validation: Logs any missing or unexpected keys in the adapter state dict
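The hotswap resolution step can be sketched in plain Python. This is a minimal illustration of the rule as described in the list above, not the library's code: the function name resolve_hotswap and its parameters are hypothetical stand-ins for state that the real method reads from the model.

```python
from __future__ import annotations

from typing import Literal


def resolve_hotswap(
    hotswap: bool | Literal["auto"],
    hotswap_enabled: bool,
    loaded_adapters: set[str],
    adapter_name: str,
) -> bool:
    """Hypothetical helper: decide whether a load should hotswap."""
    if hotswap != "auto":
        # An explicit True/False from the caller is taken as-is.
        return bool(hotswap)
    # "auto": hotswap only if enable_peft_hotswap() was called earlier
    # and an adapter with this name is already loaded.
    return hotswap_enabled and adapter_name in loaded_adapters


# First load under a name creates the adapter; a second load under the
# same name replaces its weights in place.
first_load = resolve_hotswap("auto", True, set(), "default")
second_load = resolve_hotswap("auto", True, {"default"}, "default")
not_enabled = resolve_hotswap("auto", False, {"default"}, "default")
```

Under this reading, reusing an adapter name without having called enable_peft_hotswap() does not resolve to a hotswap, which is consistent with the requirement that adapter names be unique unless hotswapping.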
The companion methods:
- set_adapter(adapter_name): Iterates over all BaseTunerLayer and ModulesToSaveWrapper modules in the model and calls module.set_adapter(adapter_name) to activate the specified adapter
- disable_adapters(): Sets enabled=False on all adapter layers, so each layer computes only its base (non-adapter) output
- enable_adapters(): Sets enabled=True on all adapter layers to restore adapter-augmented computation
- active_adapters(): Returns the list of currently active adapter names
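The interaction between these methods can be illustrated with a toy model. The classes below are hypothetical stand-ins written for this page (they are not PEFT's BaseTunerLayer or the Transformers mixin), and each adapter is reduced to a single additive offset so the effect of each call is easy to follow.

```python
from __future__ import annotations


class ToyAdapterLayer:
    """Hypothetical stand-in for an adapter-wrapped layer."""

    def __init__(self, deltas: dict[str, float]):
        self.deltas = deltas              # adapter name -> toy additive "weights"
        self.active = list(deltas)[:1]    # first adapter is active by default
        self.enabled = True

    def set_adapter(self, adapter_name: str | list[str]) -> None:
        names = [adapter_name] if isinstance(adapter_name, str) else list(adapter_name)
        self.active = names

    def forward(self, x: float) -> float:
        base = 2.0 * x  # frozen base computation
        if not self.enabled:
            return base  # adapters disabled: base output only
        return base + sum(self.deltas[name] for name in self.active)


class ToyModel:
    """Hypothetical model that fans each call out to its adapter layers."""

    def __init__(self, layers: list[ToyAdapterLayer]):
        self.layers = layers

    def set_adapter(self, adapter_name: str | list[str]) -> None:
        for layer in self.layers:
            layer.set_adapter(adapter_name)

    def disable_adapters(self) -> None:
        for layer in self.layers:
            layer.enabled = False

    def enable_adapters(self) -> None:
        for layer in self.layers:
            layer.enabled = True

    def active_adapters(self) -> list[str]:
        return list(self.layers[0].active)

    def forward(self, x: float) -> float:
        for layer in self.layers:
            x = layer.forward(x)
        return x


model = ToyModel([ToyAdapterLayer({"summarization": 0.5, "translation": -1.0})])
model.set_adapter("translation")
with_adapter = model.forward(3.0)
model.disable_adapters()
base_only = model.forward(3.0)
model.enable_adapters()
restored = model.forward(3.0)
```

Note that in this sketch disabling does not forget which adapter is active: enable_adapters() restores the same selection that set_adapter() made earlier.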
Usage
Use these methods when you need to:
- Load a saved adapter for inference or continued training
- Switch between multiple loaded adapters at runtime
- Temporarily disable adapters to get base model predictions
- Hotswap adapters in compiled models without recompilation
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/integrations/peft.py
  - load_adapter: lines 384-597
  - set_adapter: lines 676-715
  - disable_adapters: lines 717-734
  - enable_adapters: lines 736-752
  - active_adapters: lines 754-781
Signature
def load_adapter(
self,
peft_model_id: str | None = None,
adapter_name: str | None = None,
peft_config: dict[str, Any] | None = None,
adapter_state_dict: dict[str, "torch.Tensor"] | None = None,
low_cpu_mem_usage: bool = False,
is_trainable: bool = False,
hotswap: bool | Literal["auto"] = "auto",
local_files_only: bool = False,
adapter_kwargs: dict[str, Any] | None = None,
load_config: Optional["LoadStateDictConfig"] = None,
**kwargs,
) -> None
def set_adapter(self, adapter_name: list[str] | str) -> None
def disable_adapters(self) -> None
def enable_adapters(self) -> None
def active_adapters(self) -> list[str]
Import
# These methods are available on any PreTrainedModel via PeftAdapterMixin
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("model-name")
model.load_adapter("adapter-path")
I/O Contract
Inputs (load_adapter)
| Name | Type | Required | Description |
|---|---|---|---|
| peft_model_id | str or None | No* | Hub model ID or local path to the adapter checkpoint. Required unless adapter_state_dict and peft_config are both provided. |
| adapter_name | str or None | No | Name for the adapter. Defaults to "default". Must be unique unless hotswapping. |
| peft_config | dict or PeftConfig or None | No | Adapter configuration. If None, loaded from peft_model_id. |
| adapter_state_dict | dict[str, torch.Tensor] or None | No | Pre-loaded adapter weights. If None, loaded from peft_model_id. |
| is_trainable | bool | No | Whether the adapter should be trainable. If False, the adapter is frozen. Default: False. |
| hotswap | bool or "auto" | No | Whether to replace existing adapter weights in-place. "auto" enables hotswap if enable_peft_hotswap() was called. Default: "auto". |
| low_cpu_mem_usage | bool | No | Reduce memory usage during loading. Default: False. |
| local_files_only | bool | No | Only look for adapter files locally. Default: False. |
| adapter_kwargs | dict | No | Additional keyword arguments for adapter config loading (e.g., token, revision). |
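The "No*" rule for peft_model_id can be expressed as a small pre-check. This is a sketch of the rule as stated in the table, under the assumption that a checkpoint path takes precedence when given; check_adapter_source is a hypothetical helper, not a Transformers function, and the returned labels are illustrative only.

```python
def check_adapter_source(peft_model_id=None, peft_config=None, adapter_state_dict=None) -> str:
    """Hypothetical pre-check: report where config and weights would come from."""
    if peft_model_id is not None:
        # A Hub ID or local path is enough on its own.
        return "checkpoint"
    if peft_config is not None and adapter_state_dict is not None:
        # Without a path, both the config and the weights must be supplied.
        return "in_memory"
    raise ValueError("Pass peft_model_id, or both peft_config and adapter_state_dict.")


from_path = check_adapter_source(peft_model_id="./my-adapter-checkpoint")
from_memory = check_adapter_source(peft_config={"r": 8}, adapter_state_dict={})
```

Calling the helper with only peft_config or only adapter_state_dict raises ValueError, since neither is sufficient without the other.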
Inputs (set_adapter)
| Name | Type | Required | Description |
|---|---|---|---|
| adapter_name | str or list[str] | Yes | The name(s) of the adapter(s) to activate. Pass a list for multi-adapter inference. |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | load_adapter, set_adapter, enable_adapters, and disable_adapters all modify the model in-place and return None. |
| active_adapters | list[str] | active_adapters() returns a list of the names of currently active adapters. |
Usage Examples
Basic Usage: Load Adapter for Inference
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# Load a saved adapter
model.load_adapter("my-org/llama-2-lora-adapter", adapter_name="default")
# Generate with the adapter active
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
Multi-Adapter Switching
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# Load multiple adapters
model.load_adapter("./summarization-adapter", adapter_name="summarization")
model.load_adapter("./translation-adapter", adapter_name="translation")
# Tokenize one prompt per task (the prompts are placeholders)
summary_inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
translation_inputs = tokenizer("Translate to French:", return_tensors="pt").to(model.device)
# Switch to summarization
model.set_adapter("summarization")
summary_output = model.generate(**summary_inputs)
# Switch to translation
model.set_adapter("translation")
translation_output = model.generate(**translation_inputs)
# Compare with base model (no adapter)
model.disable_adapters()
base_output = model.generate(**summary_inputs)
# Re-enable adapters
model.enable_adapters()
Hotswap with torch.compile
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
# Enable hotswap before loading first adapter
model.enable_peft_hotswap(target_rank=64)
# Load first adapter
model.load_adapter("./adapter-v1", adapter_name="default")
# Compile the model
model = torch.compile(model)
# Call the compiled forward pass directly
output_v1 = model(**inputs)
# Hotswap to second adapter (same name as the first; no recompilation needed)
model.load_adapter("./adapter-v2", adapter_name="default")
output_v2 = model(**inputs)
Load Adapter for Continued Training
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
# Load adapter with gradients enabled
model.load_adapter("./my-adapter-checkpoint", adapter_name="default", is_trainable=True)
# The adapter parameters now have requires_grad=True
# Continue training with Trainer...