Implementation:Turboderp org Exllamav2 ExLlamaV2Lora From Directory

Knowledge Sources	ExLlamaV2
Domains	Fine_Tuning, Parameter_Efficient, Deep_Learning
Last Updated	2026-02-15 00:00 GMT

Overview

ExLlamaV2Lora.from_directory is the exllamav2 entry point for loading a LoRA adapter from a HuggingFace PEFT directory and attaching it to a base ExLlamaV2 model.

Description

The ExLlamaV2Lora.from_directory class method loads a LoRA adapter stored in HuggingFace PEFT format. It reads the adapter_config.json to determine the LoRA architecture parameters (rank, alpha, target modules) and loads the A/B weight matrices from adapter_model.safetensors. The method constructs an ExLlamaV2Lora instance with properly scaled weight tensors that can be injected into the base model's linear layers during inference.
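
As a rough illustration of the configuration step, the sketch below reads the three fields named above from the text of an adapter_config.json file. The LoraHyperparams class and parse_adapter_config function are hypothetical names introduced here, not part of exllamav2, and the example JSON is made up for demonstration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LoraHyperparams:
    """Hypothetical container for illustration; not an exllamav2 class."""
    r: int
    lora_alpha: float
    target_modules: list = field(default_factory=list)


def parse_adapter_config(text: str) -> LoraHyperparams:
    """Extract rank, alpha, and target modules from adapter_config.json text."""
    cfg = json.loads(text)
    targets = cfg.get("target_modules") or []
    # PEFT allows target_modules to be a single string; normalise to a list.
    if isinstance(targets, str):
        targets = [targets]
    return LoraHyperparams(
        r=int(cfg["r"]),
        lora_alpha=float(cfg["lora_alpha"]),
        target_modules=list(targets),
    )


# Made-up example config, reduced to the fields discussed above.
example = '{"r": 16, "lora_alpha": 32, "target_modules": ["q_proj", "v_proj"]}'
params = parse_adapter_config(example)
print(params.r, params.lora_alpha, params.target_modules)
```

In practice the text would come from reading adapter_config.json inside the adapter directory; the library's own parsing may check additional fields.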

The __init__ method handles the detailed loading logic:

Parses the adapter configuration (rank r, lora_alpha, target modules)
Loads safetensor weight files and maps LoRA tensor names to the corresponding model layers
Applies quantization-aware adjustments if the base model uses GPTQ or similar quantization
Computes the effective scaling: lora_alpha / r * lora_scaling
Stores A and B matrices for each targeted layer
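
The name-mapping step can be pictured with a small parser. This is a minimal sketch that assumes the common PEFT key layout (for example base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight); parse_lora_key is a hypothetical helper, and the actual matching logic in exllamav2/lora.py may differ.

```python
import re

# Assumed PEFT tensor-name layout:
#   <prefix>model.layers.<index>.<module path>.lora_A|lora_B.weight
_KEY_RE = re.compile(
    r"model\.layers\.(?P<layer>\d+)\."
    r"(?P<module>[\w.]+)\."
    r"(?P<half>lora_[AB])\.weight$"
)


def parse_lora_key(key: str):
    """Return (layer index, module path, 'lora_A' or 'lora_B'), or None.

    Hypothetical helper for illustration; keys outside the decoder layers
    (embeddings, output head) are simply reported as None here.
    """
    match = _KEY_RE.search(key)
    if match is None:
        return None
    return int(match.group("layer")), match.group("module"), match.group("half")


key = "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
print(parse_lora_key(key))
```

Grouping the parsed results by (layer, module) would give one A/B pair per targeted linear layer, which matches the last step in the list above.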

Usage

Use this when you have a PEFT-format LoRA adapter directory (containing adapter_config.json and adapter_model.safetensors) and want to load it for use with an ExLlamaV2 model. The returned object is then passed to the generator's set_loras() method to activate it during inference.
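
Before calling the loader, it can help to confirm that the directory actually contains the two files named above. The following is a small sketch using only the standard library; missing_adapter_files is a hypothetical helper and is not provided by exllamav2.

```python
import os

# The two files this page describes as making up a PEFT adapter directory.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(directory: str) -> list:
    """Return the names of required adapter files not found in `directory`."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list means both files are present; otherwise the returned names can be reported to the user before any model memory is touched.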

Code Reference

Source Location

Repository: exllamav2
File: exllamav2/lora.py
Lines: L33-40 (from_directory), L42-194 (__init__)

Signature

@classmethod
def from_directory(
    cls,
    model: ExLlamaV2,
    directory: str,
    lora_scaling: float = 1.0
) -> ExLlamaV2Lora:
    ...

Import

from exllamav2 import ExLlamaV2Lora

I/O Contract

Inputs

Name	Type	Required	Description
model	ExLlamaV2	Yes	The loaded base model instance to which the LoRA adapter will be attached
directory	str	Yes	Path to the PEFT LoRA adapter directory containing adapter_config.json and adapter_model.safetensors
lora_scaling	float	No	Strength multiplier applied on top of lora_alpha/r; default is 1.0

Outputs

Name	Type	Description
lora	ExLlamaV2Lora	Loaded LoRA adapter instance with A/B weight matrices and effective scaling = lora_alpha / r * lora_scaling
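
To make the scaling formula concrete, here is a pure-Python sketch of the standard LoRA update y = W x + s * B (A x), with s = lora_alpha / r * lora_scaling. The function names are hypothetical and the matrices are toy values; how exllamav2 applies the scale internally (for instance, by pre-scaling one of the matrices) is not shown here.

```python
def effective_scaling(lora_alpha: float, r: int, lora_scaling: float = 1.0) -> float:
    """Compute lora_alpha / r * lora_scaling, as described in the table above."""
    if r <= 0:
        raise ValueError("LoRA rank r must be positive")
    return lora_alpha / r * lora_scaling


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale):
    """Standard LoRA formulation: base output plus scaled low-rank update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Toy example with rank 1: W is 2x2, A is 1x2, B is 2x1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
x = [1.0, 2.0]

scale = effective_scaling(lora_alpha=2.0, r=1, lora_scaling=0.5)  # 2 / 1 * 0.5 = 1.0
print(lora_forward(W, A, B, x, scale))  # [4.0, 2.0]
```

With lora_alpha = 32 and r = 16, the default lora_scaling of 1.0 gives an effective scale of 2.0, and lora_scaling = 0.5 halves it to 1.0.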

Dependencies

torch - Tensor operations for weight loading and manipulation
json - Parsing adapter_config.json
safetensors - Loading adapter_model.safetensors weight files
math - Scaling factor computation

Usage Examples

Basic

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Lora

# Placeholder path to the base model directory
model_dir = "/path/to/base_model/"

# Load base model
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
model.load()

# Load LoRA adapter from PEFT directory (lora_scaling defaults to 1.0)
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/"
)

With Custom Scaling

# Load with reduced adapter influence (reuses `model` from the Basic example)
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/",
    lora_scaling=0.5  # Half strength
)

Related Pages

Implements Principle

Principle:Turboderp_org_Exllamav2_LoRA_Adapter_Loading
