Implementation:Huggingface Transformers Load Adapter

Knowledge Sources	Transformers PEFT Docs, PEFT Hotswap Docs
Domains	Parameter_Efficient_Fine_Tuning, NLP, Model_Serving
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete tool for loading pre-trained adapter weights into a base model and managing adapter activation, provided by the PeftAdapterMixin class in Hugging Face Transformers.

Description

model.load_adapter() is the primary method for attaching pre-trained adapter weights to a model at inference or training time. It is complemented by set_adapter(), enable_adapters(), and disable_adapters() for managing which adapters are active.

The load_adapter method performs the following sequence:

Hotswap resolution: If hotswap="auto" (default), determines whether to hotswap based on whether enable_peft_hotswap() was called and whether an adapter with the same name already exists
Config loading: Loads the PeftConfig from the adapter path (either local or Hub), or uses a directly provided peft_config
Config conversion: Calls convert_peft_config_for_transformers to adapt the PEFT config for the model's architecture, especially for models with fused weight patterns (e.g., MoE models)
Weight mapping: Builds conversion mappings via _build_peft_weight_mapping for models that require weight name transformations (e.g., adding adapter name to key patterns)
MoE patching: Calls patch_moe_parameter_targeting to handle MoE models where expert layer dimensions are transposed
Adapter injection: Unless hotswapping, calls PEFT's inject_adapter_in_model to create the adapter layers
Weight loading: Resolves checkpoint files (adapter_model.safetensors or adapter_model.bin) and loads them via _load_pretrained_model with the weight conversion mappings
Validation: Logs any missing or unexpected keys in the adapter state dict
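
The stdlib-only sketch below restates four of the steps above as small pure functions: hotswap resolution, checkpoint file choice, adapter-name key rewriting, and missing/unexpected key validation. It is a hypothetical illustration, not the Transformers source. The helper names are invented. The safetensors-first preference is assumed from PEFT's loader. The `base_model.model.` prefix stripping and the `lora_A`/`lora_B` key rewrite assume a LoRA checkpoint saved by PEFT.

```python
from typing import Literal, Union

# Hypothetical helpers (NOT the Transformers implementation) that restate
# four steps of the load_adapter sequence on plain Python data.

PEFT_PREFIX = "base_model.model."  # assumed prefix on keys saved by a PeftModel


def resolve_hotswap(
    hotswap: Union[bool, Literal["auto"]],
    hotswap_enabled: bool,
    adapter_name: str,
    loaded_adapters: set[str],
) -> bool:
    """Turn hotswap="auto" into a concrete True/False decision."""
    if hotswap == "auto":
        # Swap in place only if enable_peft_hotswap() ran and the name is taken.
        return hotswap_enabled and adapter_name in loaded_adapters
    return bool(hotswap)


def pick_checkpoint_file(available_files: set[str]) -> str:
    """Choose the adapter weights file, assuming safetensors is preferred."""
    for candidate in ("adapter_model.safetensors", "adapter_model.bin"):
        if candidate in available_files:
            return candidate
    raise FileNotFoundError("no adapter_model.safetensors or adapter_model.bin found")


def to_model_key(checkpoint_key: str, adapter_name: str) -> str:
    """Map a saved LoRA key to the parameter name inside the model (assumed scheme)."""
    key = checkpoint_key
    if key.startswith(PEFT_PREFIX):
        key = key[len(PEFT_PREFIX):]
    # Saved keys omit the adapter name: "...lora_A.weight" -> "...lora_A.<name>.weight"
    if ".lora_A." in key or ".lora_B." in key:
        head, leaf = key.rsplit(".", 1)
        key = f"{head}.{adapter_name}.{leaf}"
    return key


def diff_keys(expected: set[str], loaded: set[str]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) keys: the two lists the validation step logs."""
    missing = sorted(expected - loaded)
    unexpected = sorted(loaded - expected)
    return missing, unexpected


saved = {
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight",
}
renamed = {to_model_key(k, "default") for k in saved}
expected = renamed | {"model.layers.0.self_attn.v_proj.lora_A.default.weight"}
print(diff_keys(expected, renamed))
```

For example, once `enable_peft_hotswap()` has been called and `"default"` is already loaded, `resolve_hotswap("auto", True, "default", {"default"})` returns `True`. Under this sketch's rules, a second `load_adapter(..., adapter_name="default")` would then swap weights in place rather than being rejected as a duplicate name.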

The companion methods:

set_adapter(adapter_name): Iterates over all BaseTunerLayer and ModulesToSaveWrapper modules in the model and calls module.set_adapter(adapter_name) to activate the specified adapter
disable_adapters(): Sets enabled=False on all adapter layers, so each layer skips its adapter contribution and returns only the output of its wrapped base layer
enable_adapters(): Sets enabled=True on all adapter layers to restore adapter-augmented computation
active_adapters(): Returns the list of currently active adapter names
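
The dependency-free toy below makes these on/off semantics concrete. Each "layer" is a scalar LoRA-style unit whose output is the base term `weight * x`, plus `scale * b * a * x` for every active adapter. `ToyLoraLayer` and `ToyModel` are hypothetical stand-ins, and so is the `ValueError` raised for unknown names. The model-level methods simply loop over all layers, mirroring the descriptions of `set_adapter()`, `enable_adapters()`, and `disable_adapters()` above.

```python
from typing import Union


class ToyLoraLayer:
    """Hypothetical scalar stand-in for an adapter-wrapped linear layer.

    forward(x) = weight * x + sum(scale * b * a * x for each active adapter),
    and just weight * x while adapters are disabled.
    """

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self.adapters: dict[str, tuple[float, float, float]] = {}  # name -> (a, b, scale)
        self.active: list[str] = []
        self.enabled = True

    def add_adapter(self, name: str, a: float, b: float, scale: float = 1.0) -> None:
        self.adapters[name] = (a, b, scale)

    def set_adapter(self, names: Union[str, list[str]]) -> None:
        selected = [names] if isinstance(names, str) else list(names)
        unknown = [n for n in selected if n not in self.adapters]
        if unknown:
            raise ValueError(f"unknown adapter(s): {unknown}")  # assumed behavior
        self.active = selected

    def forward(self, x: float) -> float:
        out = self.weight * x  # the base layer always runs
        if self.enabled:
            for name in self.active:
                a, b, scale = self.adapters[name]
                out += scale * b * a * x
        return out


class ToyModel:
    """Model-level helpers that loop over every adapter layer."""

    def __init__(self, layers: list[ToyLoraLayer]) -> None:
        self.layers = layers

    def set_adapter(self, names: Union[str, list[str]]) -> None:
        for layer in self.layers:
            layer.set_adapter(names)

    def disable_adapters(self) -> None:
        for layer in self.layers:
            layer.enabled = False

    def enable_adapters(self) -> None:
        for layer in self.layers:
            layer.enabled = True

    def active_adapters(self) -> list[str]:
        return list(self.layers[0].active)

    def forward(self, x: float) -> float:
        for layer in self.layers:
            x = layer.forward(x)
        return x


layer = ToyLoraLayer(weight=2.0)
layer.add_adapter("summarization", a=1.0, b=0.5)
layer.add_adapter("translation", a=1.0, b=-1.0)
model = ToyModel([layer])

trace = []
model.set_adapter("summarization")
trace.append(model.forward(1.0))  # 2.0 + 0.5
model.set_adapter("translation")
trace.append(model.forward(1.0))  # 2.0 - 1.0
model.disable_adapters()
trace.append(model.forward(1.0))  # base only
model.enable_adapters()
trace.append(model.forward(1.0))  # "translation" is still the selected adapter
model.set_adapter(["summarization", "translation"])
trace.append(model.forward(1.0))  # both deltas are summed
print(trace)  # [2.5, 1.0, 2.0, 1.0, 1.5]
```

In this toy, disabling adapters does not change which adapter is selected, so re-enabling restores the previous selection.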

Usage

Use these methods when you need to:

Load a saved adapter for inference or continued training
Switch between multiple loaded adapters at runtime
Temporarily disable adapters to get base model predictions
Hotswap adapters in compiled models without recompilation

Code Reference

Source Location

Repository: transformers
File: src/transformers/integrations/peft.py
- load_adapter: lines 384-597
- set_adapter: lines 676-715
- disable_adapters: lines 717-734
- enable_adapters: lines 736-752
- active_adapters: lines 754-781

Signature

def load_adapter(
    self,
    peft_model_id: str | None = None,
    adapter_name: str | None = None,
    peft_config: dict[str, Any] | None = None,
    adapter_state_dict: dict[str, "torch.Tensor"] | None = None,
    low_cpu_mem_usage: bool = False,
    is_trainable: bool = False,
    hotswap: bool | Literal["auto"] = "auto",
    local_files_only: bool = False,
    adapter_kwargs: dict[str, Any] | None = None,
    load_config: Optional["LoadStateDictConfig"] = None,
    **kwargs,
) -> None

def set_adapter(self, adapter_name: list[str] | str) -> None

def disable_adapters(self) -> None

def enable_adapters(self) -> None

def active_adapters(self) -> list[str]

Import

# These methods are available on any PreTrainedModel via PeftAdapterMixin
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model-name")
model.load_adapter("adapter-path")

I/O Contract

Inputs (load_adapter)

Name	Type	Required	Description
peft_model_id	`str` or `None`	No*	Hub model ID or local path to the adapter checkpoint. Required unless `adapter_state_dict` and `peft_config` are both provided.
adapter_name	`str` or `None`	No	Name for the adapter. Defaults to `"default"`. Must be unique unless hotswapping.
peft_config	`dict` or `PeftConfig` or `None`	No	Adapter configuration. If `None`, loaded from `peft_model_id`.
adapter_state_dict	`dict[str, torch.Tensor]` or `None`	No	Pre-loaded adapter weights. If `None`, loaded from `peft_model_id`.
is_trainable	`bool`	No	Whether the adapter should be trainable. If `False`, the adapter is frozen. Default: `False`.
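hotswap	`bool` or `"auto"`	No	Whether to replace an existing adapter's weights in place. With `"auto"`, hotswapping happens when `enable_peft_hotswap()` was called and an adapter with the same `adapter_name` is already loaded. Default: `"auto"`.
low_cpu_mem_usage	`bool`	No	Reduce memory usage while loading by creating empty adapter weights on the meta device that are then filled from the checkpoint; this can also speed up loading. Default: `False`.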
local_files_only	`bool`	No	Only look for adapter files locally. Default: `False`.
adapter_kwargs	`dict`	No	Additional keyword arguments for adapter config loading (e.g., `token`, `revision`).
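
The table's rules about `peft_model_id`, the `"default"` name, and name uniqueness can be summarized as a small pre-flight check. `validate_load_args` is a hypothetical helper written for this page. Transformers performs its own checks inside `load_adapter`, and their exact messages and ordering may differ. Here `hotswap` is assumed to be the already-resolved boolean, not `"auto"`.

```python
from typing import Any, Optional


def validate_load_args(
    peft_model_id: Optional[str],
    adapter_name: Optional[str],
    peft_config: Optional[dict[str, Any]],
    adapter_state_dict: Optional[dict[str, Any]],
    loaded_adapters: set[str],
    hotswap: bool = False,
) -> str:
    """Hypothetical check mirroring the input table; returns the adapter name to use."""
    # peft_model_id may be omitted only if both the config and the weights are passed in
    if peft_model_id is None and (peft_config is None or adapter_state_dict is None):
        raise ValueError("pass peft_model_id, or both peft_config and adapter_state_dict")
    name = "default" if adapter_name is None else adapter_name
    # Reusing a name is only allowed when the new weights replace the old ones in place
    if name in loaded_adapters and not hotswap:
        raise ValueError(f"adapter name {name!r} is already in use")
    return name


print(validate_load_args("my-org/llama-2-lora-adapter", None, None, None, set()))
```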

Inputs (set_adapter)

Name	Type	Required	Description
adapter_name	`str` or `list[str]`	Yes	The name(s) of the adapter(s) to activate. Pass a list for multi-adapter inference.

Outputs

Name	Type	Description
(none)	`None`	`load_adapter`, `set_adapter`, `enable_adapters`, and `disable_adapters` all modify the model in-place and return `None`.
active_adapters	`list[str]`	`active_adapters()` returns a list of the names of currently active adapters.

Usage Examples

Basic Usage: Load Adapter for Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load a saved adapter
model.load_adapter("my-org/llama-2-lora-adapter", adapter_name="default")

# Generate with the adapter active
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

Multi-Adapter Switching

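from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load multiple adapters under distinct names
model.load_adapter("./summarization-adapter", adapter_name="summarization")
model.load_adapter("./translation-adapter", adapter_name="translation")

summary_inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
translation_inputs = tokenizer("Translate to French:", return_tensors="pt").to(model.device)

# Switch to summarization
model.set_adapter("summarization")
summary_output = model.generate(**summary_inputs, max_new_tokens=100)

# Switch to translation
model.set_adapter("translation")
translation_output = model.generate(**translation_inputs, max_new_tokens=100)

# Compare with the base model (adapter contributions bypassed)
model.disable_adapters()
base_output = model.generate(**summary_inputs, max_new_tokens=100)

# Re-enable adapters
model.enable_adapters()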

Hotswap with torch.compile

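import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)

# Enable hotswap before loading the first adapter; target_rank should be at least
# the largest LoRA rank among the adapters you plan to swap in
model.enable_peft_hotswap(target_rank=64)

# Load the first adapter
model.load_adapter("./adapter-v1", adapter_name="default")

# Compile the model; calling the compiled module runs the compiled forward pass
compiled_model = torch.compile(model)
with torch.no_grad():
    logits_v1 = compiled_model(**inputs).logits

# Hotswap the second adapter into the same slot (no recompilation needed)
model.load_adapter("./adapter-v2", adapter_name="default")
with torch.no_grad():
    logits_v2 = compiled_model(**inputs).logits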
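
The `target_rank` argument exists because, as we understand PEFT's hotswapping documentation, LoRA weights are zero-padded up to a fixed rank when hotswapping is prepared. Adapters with different ranks therefore present identical tensor shapes to the compiled graph, which would otherwise recompile on a shape change. The sketch below shows only the padding arithmetic on plain lists. `pad_lora`, `matmul`, and `shape` are hypothetical helpers, and real hotswapping also handles details (such as scaling factors) that are omitted here.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, so the sketch needs no third-party packages."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def pad_lora(a: Matrix, b: Matrix, target_rank: int) -> tuple[Matrix, Matrix]:
    """Zero-pad LoRA factors A (rank x in) and B (out x rank) up to target_rank.

    New rows of A and new columns of B are zeros, so B @ A is unchanged
    while the shapes no longer depend on the adapter's own rank.
    """
    rank = len(a)
    if rank > target_rank:
        raise ValueError(f"adapter rank {rank} exceeds target_rank {target_rank}")
    extra = target_rank - rank
    a_padded = [row[:] for row in a] + [[0.0] * len(a[0]) for _ in range(extra)]
    b_padded = [row[:] + [0.0] * extra for row in b]
    return a_padded, b_padded


def shape(m: Matrix) -> tuple[int, int]:
    return len(m), len(m[0])


# Two hypothetical adapters for the same 2x2 layer, with ranks 1 and 2
a1, b1 = [[1.0, 2.0]], [[3.0], [4.0]]
a2, b2 = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]

pa1, pb1 = pad_lora(a1, b1, target_rank=4)
pa2, pb2 = pad_lora(a2, b2, target_rank=4)

print(shape(pa1) == shape(pa2), shape(pb1) == shape(pb2))  # True True
print(matmul(pb1, pa1) == matmul(b1, a1))  # True
```

The `ValueError` in `pad_lora` reflects the constraint that padding cannot shrink an adapter, which is why `target_rank` should be chosen as the largest rank you expect to load.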

Load Adapter for Continued Training

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")

# Load adapter with gradients enabled
model.load_adapter("./my-adapter-checkpoint", adapter_name="default", is_trainable=True)

# The adapter parameters now have requires_grad=True
# Continue training with Trainer...
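
The effect of `is_trainable` can be pictured as a flag flip on the adapter's own parameters. The stdlib-only toy below assumes, for illustration, that adapter parameter names contain `.<adapter_name>.` (as in `lora_A.default.weight`) and that base weights are already frozen. `ToyParam`, `apply_trainability`, and `count_trainable` are hypothetical names, not Transformers or PEFT functions.

```python
from dataclasses import dataclass


@dataclass
class ToyParam:
    """Hypothetical stand-in for a named tensor parameter."""

    name: str
    numel: int
    requires_grad: bool = False  # assumption: base weights start frozen


def apply_trainability(params: list[ToyParam], adapter_name: str, is_trainable: bool) -> None:
    """Set requires_grad only on parameters whose name marks them as part of adapter_name."""
    marker = f".{adapter_name}."
    for p in params:
        if marker in p.name:
            p.requires_grad = is_trainable


def count_trainable(params: list[ToyParam]) -> tuple[int, int]:
    """Return (trainable, total) element counts, a quick sanity check before training."""
    trainable = sum(p.numel for p in params if p.requires_grad)
    total = sum(p.numel for p in params)
    return trainable, total


params = [
    ToyParam("layers.0.q_proj.base_layer.weight", numel=4096),
    ToyParam("layers.0.q_proj.lora_A.default.weight", numel=64),
    ToyParam("layers.0.q_proj.lora_B.default.weight", numel=64),
]
apply_trainability(params, "default", is_trainable=True)
print(count_trainable(params))  # (128, 4224)
```

On a real PyTorch model, the equivalent check is `sum(p.numel() for p in model.parameters() if p.requires_grad)`, run after `load_adapter(..., is_trainable=True)`.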

Related Pages

Implements Principle

Principle:Huggingface_Transformers_Adapter_Loading_And_Switching

Requires Environment

Environment:Huggingface_Transformers_PEFT_Adapter_Env
