
Implementation:Huggingface Transformers Load Adapter

From Leeroopedia
Knowledge Sources
Domains Parameter_Efficient_Fine_Tuning, NLP, Model_Serving
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete tool for loading pre-trained adapter weights into a base model and managing adapter activation, provided by the PeftAdapterMixin class in Hugging Face Transformers.

Description

model.load_adapter() is the primary method for attaching pre-trained adapter weights to a model at inference or training time. It is complemented by set_adapter(), enable_adapters(), and disable_adapters() for managing which adapters are active.

The load_adapter method performs the following sequence:

  1. Hotswap resolution: If hotswap="auto" (default), determines whether to hotswap based on whether enable_peft_hotswap() was called and whether an adapter with the same name already exists
  2. Config loading: Loads the PeftConfig from the adapter path (either local or Hub), or uses a directly provided peft_config
  3. Config conversion: Calls convert_peft_config_for_transformers to adapt the PEFT config for the model's architecture, especially for models with fused weight patterns (e.g., MoE models)
  4. Weight mapping: Builds conversion mappings via _build_peft_weight_mapping for models that require weight name transformations (e.g., adding adapter name to key patterns)
  5. MoE patching: Calls patch_moe_parameter_targeting to handle MoE models where expert layer dimensions are transposed
  6. Adapter injection: Unless hotswapping, calls PEFT's inject_adapter_in_model to create the adapter layers
  7. Weight loading: Resolves checkpoint files (adapter_model.safetensors or adapter_model.bin) and loads them via _load_pretrained_model with the weight conversion mappings
  8. Validation: Logs any missing or unexpected keys in the adapter state dict
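
The hotswap resolution in step 1 can be sketched as a small pure function. This is a minimal illustration of the rule as described above, not the library's code: `resolve_hotswap` and its two boolean parameters are hypothetical names standing in for "enable_peft_hotswap() was called" and "an adapter with this name is already loaded".

```python
from typing import Literal, Union


def resolve_hotswap(
    hotswap: Union[bool, Literal["auto"]],
    hotswap_enabled: bool,
    adapter_already_loaded: bool,
) -> bool:
    """Illustrative only: decide whether a load replaces existing weights in place.

    hotswap_enabled        -- hypothetical flag: enable_peft_hotswap() was called
    adapter_already_loaded -- hypothetical flag: an adapter with this name exists
    """
    if hotswap == "auto":
        # "auto" hotswaps only when both conditions hold; the first load of an
        # adapter still goes through the normal injection path.
        return hotswap_enabled and adapter_already_loaded
    # An explicit True/False from the caller is taken as-is.
    return bool(hotswap)


if __name__ == "__main__":
    # First load after enabling hotswap: nothing to swap yet.
    print(resolve_hotswap("auto", hotswap_enabled=True, adapter_already_loaded=False))
    # Second load under the same adapter name: swap in place.
    print(resolve_hotswap("auto", hotswap_enabled=True, adapter_already_loaded=True))
```

Under this reading, the first load_adapter call after enable_peft_hotswap() injects adapter layers as usual, and only later loads under the same adapter name replace weights in place.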

The companion methods:

  • set_adapter(adapter_name): Iterates over all BaseTunerLayer and ModulesToSaveWrapper modules in the model and calls module.set_adapter(adapter_name) to activate the specified adapter
  • disable_adapters(): Sets enabled=False on all adapter layers, so each wrapped layer skips its adapter contribution and returns only the base layer's output
  • enable_adapters(): Sets enabled=True on all adapter layers to restore adapter-augmented computation
  • active_adapters(): Returns the list of currently active adapter names
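
The interaction between these companion methods can be illustrated with a toy model in plain Python. This is a sketch under simplifying assumptions: `ToyAdapterLayer`, `ToyModel`, and `build_toy_model` are hypothetical stand-ins, not PEFT or Transformers classes, and each "adapter" is reduced to a single additive scalar on top of a frozen base weight.

```python
class ToyAdapterLayer:
    """Toy stand-in for an adapter-wrapped layer (not a PEFT class)."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # plays the role of the frozen base layer
        self.adapter_deltas = {}        # adapter name -> additive delta
        self.active = []                # names of the currently active adapters
        self.enabled = True             # toggled by enable/disable

    def add_adapter(self, name, delta):
        self.adapter_deltas[name] = delta

    def set_adapter(self, names):
        # Accept a single name or a list of names.
        names = [names] if isinstance(names, str) else list(names)
        unknown = [n for n in names if n not in self.adapter_deltas]
        if unknown:
            raise ValueError(f"adapter(s) not loaded: {unknown}")
        self.active = names

    def forward(self, x):
        out = self.base_weight * x
        if self.enabled:
            # Only active adapters contribute, and only while enabled.
            out += sum(self.adapter_deltas[n] * x for n in self.active)
        return out


class ToyModel:
    """Fans each call out to every adapter layer, like the mixin methods do."""

    def __init__(self, layers):
        self.layers = layers

    def set_adapter(self, names):
        for layer in self.layers:
            layer.set_adapter(names)

    def enable_adapters(self):
        for layer in self.layers:
            layer.enabled = True

    def disable_adapters(self):
        for layer in self.layers:
            layer.enabled = False

    def active_adapters(self):
        return list(self.layers[0].active) if self.layers else []

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


def build_toy_model():
    layer = ToyAdapterLayer(base_weight=2)
    layer.add_adapter("summarization", 1)
    layer.add_adapter("translation", 3)
    return ToyModel([layer])
```

In this toy, switching adapters changes which delta is applied, while disabling leaves only the base computation in place until enable_adapters() is called again; the selected adapter names survive the disable/enable round trip.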

Usage

Use these methods when you need to:

  • Load a saved adapter for inference or continued training
  • Switch between multiple loaded adapters at runtime
  • Temporarily disable adapters to get base model predictions
  • Hotswap adapters in compiled models without recompilation

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/integrations/peft.py
    • load_adapter: lines 384-597
    • set_adapter: lines 676-715
    • disable_adapters: lines 717-734
    • enable_adapters: lines 736-752
    • active_adapters: lines 754-781

Signature

def load_adapter(
    self,
    peft_model_id: str | None = None,
    adapter_name: str | None = None,
    peft_config: dict[str, Any] | None = None,
    adapter_state_dict: dict[str, "torch.Tensor"] | None = None,
    low_cpu_mem_usage: bool = False,
    is_trainable: bool = False,
    hotswap: bool | Literal["auto"] = "auto",
    local_files_only: bool = False,
    adapter_kwargs: dict[str, Any] | None = None,
    load_config: Optional["LoadStateDictConfig"] = None,
    **kwargs,
) -> None

def set_adapter(self, adapter_name: list[str] | str) -> None

def disable_adapters(self) -> None

def enable_adapters(self) -> None

def active_adapters(self) -> list[str]

Import

# These methods are available on any PreTrainedModel via PeftAdapterMixin
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model-name")
model.load_adapter("adapter-path")

I/O Contract

Inputs (load_adapter)

Name | Type | Required | Description
peft_model_id | str or None | No* | Hub model ID or local path to the adapter checkpoint. *Required unless adapter_state_dict and peft_config are both provided.
adapter_name | str or None | No | Name for the adapter. Defaults to "default". Must be unique unless hotswapping.
peft_config | dict or PeftConfig or None | No | Adapter configuration. If None, loaded from peft_model_id.
adapter_state_dict | dict[str, torch.Tensor] or None | No | Pre-loaded adapter weights. If None, loaded from peft_model_id.
is_trainable | bool | No | Whether the adapter should be trainable. If False, the adapter is frozen. Default: False.
hotswap | bool or "auto" | No | Whether to replace existing adapter weights in-place. "auto" hotswaps only if enable_peft_hotswap() was called and an adapter with the same name is already loaded. Default: "auto".
low_cpu_mem_usage | bool | No | Reduce memory usage during loading. Default: False.
local_files_only | bool | No | Only look for adapter files locally. Default: False.
adapter_kwargs | dict or None | No | Additional keyword arguments for adapter config loading (e.g., token, revision).
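
The "required unless" rule for peft_model_id can be made concrete with a small sketch. It follows only the wording of the table above and is not the library's validation code: `resolve_adapter_sources` is a hypothetical helper, and the real method may check its arguments differently.

```python
from typing import Any, Dict, Optional


def resolve_adapter_sources(
    peft_model_id: Optional[str] = None,
    peft_config: Optional[Dict[str, Any]] = None,
    adapter_state_dict: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Illustrative only: report where the config and the weights would come from."""
    sources = {
        # An explicitly passed config or state dict takes the place of the checkpoint.
        "config": "peft_config" if peft_config is not None else "peft_model_id",
        "weights": "adapter_state_dict" if adapter_state_dict is not None else "peft_model_id",
    }
    # If anything still has to come from the checkpoint, peft_model_id is required.
    if peft_model_id is None and "peft_model_id" in sources.values():
        raise ValueError(
            "Pass peft_model_id, or provide both peft_config and adapter_state_dict."
        )
    return sources


if __name__ == "__main__":
    print(resolve_adapter_sources(peft_model_id="./my-adapter"))
    print(resolve_adapter_sources(peft_config={"r": 8}, adapter_state_dict={}))
```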

Inputs (set_adapter)

Name | Type | Required | Description
adapter_name | str or list[str] | Yes | The name(s) of the adapter(s) to activate. Pass a list for multi-adapter inference.

Outputs

Name | Type | Description
(none) | None | load_adapter, set_adapter, enable_adapters, and disable_adapters all modify the model in-place and return None.
active_adapters | list[str] | active_adapters() returns a list of the names of currently active adapters.

Usage Examples

Basic Usage: Load Adapter for Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load a saved adapter
model.load_adapter("my-org/llama-2-lora-adapter", adapter_name="default")

# Generate with the adapter active
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

Multi-Adapter Switching

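from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder prompts, tokenized and moved to the model's device
summary_inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)
translation_inputs = tokenizer("Translate to French:", return_tensors="pt").to(model.device)

# Load multiple adapters under distinct names
model.load_adapter("./summarization-adapter", adapter_name="summarization")
model.load_adapter("./translation-adapter", adapter_name="translation")

# Switch to summarization
model.set_adapter("summarization")
summary_output = model.generate(**summary_inputs, max_new_tokens=100)

# Switch to translation
model.set_adapter("translation")
translation_output = model.generate(**translation_inputs, max_new_tokens=100)

# Compare with base model (no adapter)
model.disable_adapters()
base_output = model.generate(**summary_inputs, max_new_tokens=100)

# Re-enable adapters
model.enable_adapters()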

Hotswap with torch.compile

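import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Summarize the following:", return_tensors="pt").to(model.device)

# Enable hotswap before loading the first adapter and before compiling.
# target_rank should be the highest LoRA rank among the adapters to be swapped in.
model.enable_peft_hotswap(target_rank=64)

# Load first adapter
model.load_adapter("./adapter-v1", adapter_name="default")

# Compile the model
model = torch.compile(model)
# Call the compiled module directly; generate() reached through the
# torch.compile wrapper may run the uncompiled module instead.
output_v1 = model(**inputs)

# Hotswap to second adapter under the same name (no recompilation needed)
model.load_adapter("./adapter-v2", adapter_name="default")
output_v2 = model(**inputs)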

Load Adapter for Continued Training

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")

# Load adapter with gradients enabled
model.load_adapter("./my-adapter-checkpoint", adapter_name="default", is_trainable=True)

# The adapter parameters now have requires_grad=True
# Continue training with Trainer...
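
One way to confirm that the adapter was loaded as trainable is to count parameters by their requires_grad flag. The helper below is a minimal sketch: `summarize_trainable` is a hypothetical name, and it relies only on named_parameters(), numel(), and requires_grad, which are standard on PyTorch modules.

```python
def summarize_trainable(model):
    """Return (trainable, total) parameter counts.

    Works with any object exposing named_parameters() whose parameters have
    numel() and requires_grad, e.g. a torch.nn.Module.
    """
    trainable = 0
    total = 0
    for _name, param in model.named_parameters():
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
    return trainable, total


def format_trainable(trainable, total):
    """Render the counts as a short human-readable summary line."""
    share = 100.0 * trainable / total if total else 0.0
    return f"trainable params: {trainable} / {total} ({share:.2f}%)"
```

Calling print(format_trainable(*summarize_trainable(model))) right after load_adapter(..., is_trainable=True) should report a non-zero trainable count; with is_trainable=False the adapter is frozen, so a count of zero would be expected if the base model's parameters are frozen as well.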

Related Pages

Implements Principle

Requires Environment
