Implementation:Huggingface Transformers Add Adapter For QLoRA

From Leeroopedia
Revision as of 13:05, 16 February 2026 by Admin (Auto-imported from implementations/Huggingface_Transformers_Add_Adapter_For_QLoRA.md)
Knowledge Sources
Domains: Model_Optimization, Quantization, Fine_Tuning, Parameter_Efficient_Fine_Tuning
Last Updated: 2026-02-13 00:00 GMT

Overview

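add_adapter() is a concrete wrapper method for injecting LoRA adapters into a pretrained model, quantized or not. On a 4-bit quantized model it is the adapter-injection step of QLoRA fine-tuning. The method is provided by Hugging Face Transformers via its PEFT integration.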

Description

The add_adapter() method is defined in the PeftAdapterMixin class (integrations/peft.py, line 636) and is inherited by all PreTrainedModel subclasses. It serves as a thin wrapper around PEFT's inject_adapter_in_model() function, providing Transformers-native integration for adding LoRA (and other PEFT) adapters.

The method performs the following steps:

  1. Validates that the PEFT library is installed and meets the minimum version requirement.
  2. Assigns a default adapter name ("default") if none is provided.
  3. Checks that the adapter name is not already registered.
  4. Validates that the config is an instance of peft.PeftConfig.
  5. Sets the base model path on the adapter config for serialization.
  6. Calls peft.inject_adapter_in_model() to create and inject the adapter layers.
  7. Activates the new adapter via self.set_adapter(adapter_name).

This method is a wrapper rather than a standalone API because the actual adapter injection logic resides in the PEFT library. The Transformers integration handles lifecycle management (registration, activation, naming) and ensures compatibility with the Transformers model ecosystem.
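The steps above can be sketched with a small toy model that needs neither Transformers nor PEFT. This is a minimal illustration, not the library's code: ToyPeftConfig and ToyModel are hypothetical stand-ins, the exception types are assumptions, step 1 (the PEFT version check) is omitted, and step 6 only records the config where the real method delegates to peft.inject_adapter_in_model().

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyPeftConfig:
    """Hypothetical stand-in for peft.PeftConfig (not the real class)."""
    r: int = 8
    base_model_name_or_path: Optional[str] = None


@dataclass
class ToyModel:
    """Hypothetical stand-in for a PreTrainedModel with PEFT support."""
    name_or_path: str
    adapters: Dict[str, ToyPeftConfig] = field(default_factory=dict)
    active_adapter: Optional[str] = None

    def add_adapter(self, adapter_config, adapter_name=None):
        # Step 2: fall back to the default adapter name.
        adapter_name = adapter_name or "default"
        # Step 3: refuse to register the same name twice.
        if adapter_name in self.adapters:
            raise ValueError(f"Adapter {adapter_name!r} already exists.")
        # Step 4: validate the config type.
        if not isinstance(adapter_config, ToyPeftConfig):
            raise TypeError("adapter_config must be a ToyPeftConfig.")
        # Step 5: record the base model path for later serialization.
        adapter_config.base_model_name_or_path = self.name_or_path
        # Step 6: "inject" the adapter (here: just register its config).
        self.adapters[adapter_name] = adapter_config
        # Step 7: activate the new adapter.
        self.set_adapter(adapter_name)

    def set_adapter(self, adapter_name):
        if adapter_name not in self.adapters:
            raise ValueError(f"Unknown adapter {adapter_name!r}.")
        self.active_adapter = adapter_name


model = ToyModel(name_or_path="some/base-model")
model.add_adapter(ToyPeftConfig(r=8))                          # registered as "default"
model.add_adapter(ToyPeftConfig(r=16), adapter_name="task_b")  # becomes active
print(model.active_adapter)
```

The toy keeps the same division of labour as the description: naming, registration, and activation live in the model-side method, while the injection itself is a single delegated step.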

Usage

Use this API after loading a 4-bit quantized model to add LoRA adapters for QLoRA fine-tuning. It can also be used on non-quantized models for standard LoRA fine-tuning.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/integrations/peft.py (lines 636-675)

Signature

class PeftAdapterMixin:
    def add_adapter(
        self,
        adapter_config: PeftConfig,
        adapter_name: str | None = None,
    ) -> None: ...

Import

# add_adapter is a method on model instances
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained(...)
model.add_adapter(LoraConfig(...))

I/O Contract

Inputs

Name | Type | Required | Description
adapter_config | peft.PeftConfig | Yes | Configuration for the adapter to add. For QLoRA, this is a LoraConfig instance specifying rank, alpha, target modules, and dropout.
adapter_name | str or None | No (default: "default") | Name for the adapter. Must be unique across all adapters on the model.

Outputs

Name | Type | Description
(return) | None | The method modifies the model in-place, injecting adapter layers and activating the new adapter.

Usage Examples

Standard QLoRA Setup

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig

# Step 1: Load model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # keep non-quantized weights in the same dtype as bnb_4bit_compute_dtype
)

# Step 2: Define LoRA configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Step 3: Add the adapter
model.add_adapter(lora_config)

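# The model now has trainable LoRA parameters on top of frozen 4-bit weights.
# Note: bitsandbytes stores 4-bit weights packed (two values per byte), so
# numel() on those tensors undercounts the logical weights and the
# percentage below overstates the trainable share.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable_params:,} / {total_params:,} "
      f"({100 * trainable_params / total_params:.2f}%)")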

Targeting All Linear Layers

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

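# Assumes `model` is a freshly loaded model with no "default" adapter yet;
# adding a second adapter under the same name is rejected.
model.add_adapter(lora_config)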
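To get a feel for how r and target_modules drive adapter size, the two configurations above can be compared with a little arithmetic. Each targeted linear layer with shape (in_features, out_features) gains two low-rank matrices, A (r x in_features) and B (out_features x r), so it adds r * (in_features + out_features) parameters when bias="none". The sketch below assumes the Llama-2-7B shapes (hidden size 4096, MLP size 11008, 32 decoder blocks) and that "all-linear" covers the four attention projections and three MLP projections per block while leaving out the output head; lora_param_count is a hypothetical helper, not a PEFT function.

```python
def lora_param_count(layer_shapes, r, num_blocks):
    """Count LoRA parameters added across num_blocks identical decoder blocks.

    layer_shapes lists (in_features, out_features) for every targeted linear
    module inside one block. Each module gains A (r x in) and B (out x r).
    """
    per_block = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_block * num_blocks


# Assumed Llama-2-7B dimensions (not read from a checkpoint).
HIDDEN, MLP, BLOCKS = 4096, 11008, 32
attn = (HIDDEN, HIDDEN)

# r=8 on q_proj and v_proj only.
qv_only = lora_param_count([attn, attn], r=8, num_blocks=BLOCKS)

# r=16 on q/k/v/o projections plus gate, up, and down MLP projections.
all_linear = lora_param_count(
    [attn] * 4 + [(HIDDEN, MLP)] * 2 + [(MLP, HIDDEN)],
    r=16,
    num_blocks=BLOCKS,
)

print(f"q_proj + v_proj, r=8:  {qv_only:,}")     # 4,194,304
print(f"all-linear, r=16:      {all_linear:,}")  # 39,976,960
```

Under these assumptions, moving from two attention projections at r=8 to all linear layers at r=16 grows the adapter roughly tenfold, which is worth weighing against the memory saved by 4-bit quantization.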

Multiple Named Adapters

from peft import LoraConfig

# Assumes `model` is already loaded (see Standard QLoRA Setup).
# Add a task-specific adapter
task_a_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model.add_adapter(task_a_config, adapter_name="task_a")

# Add another adapter for a different task
task_b_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model.add_adapter(task_b_config, adapter_name="task_b")

# Switch between adapters
model.set_adapter("task_a")
# ... inference or training for task A ...

model.set_adapter("task_b")
# ... inference or training for task B ...

Related Pages

Implements Principle

Requires Environment
