Implementation:Turboderp org Exllamav2 ExLlamaV2Lora From Directory
| Knowledge Sources | |
|---|---|
| Domains | Fine_Tuning, Parameter_Efficient, Deep_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A concrete tool, provided by exllamav2, for loading a LoRA adapter from a Hugging Face PEFT directory and attaching it to a base ExLlamaV2 model.
Description
The ExLlamaV2Lora.from_directory class method loads a LoRA adapter stored in HuggingFace PEFT format. It reads the adapter_config.json to determine the LoRA architecture parameters (rank, alpha, target modules) and loads the A/B weight matrices from adapter_model.safetensors. The method constructs an ExLlamaV2Lora instance with properly scaled weight tensors that can be injected into the base model's linear layers during inference.
The __init__ method handles the detailed loading logic:
- Parses the adapter configuration (rank r, lora_alpha, target modules)
- Loads safetensor weight files and maps LoRA tensor names to the corresponding model layers
- Applies quantization-aware adjustments if the base model uses GPTQ or similar quantization
- Computes the effective scaling: lora_alpha / r * lora_scaling
- Stores A and B matrices for each targeted layer
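The configuration parsing and scaling steps above can be sketched without exllamav2 itself. This is a minimal illustration, assuming the standard PEFT keys `r` and `lora_alpha` in adapter_config.json; the helper name `effective_lora_scale` is hypothetical and is not part of the library.

```python
import json
import os
import tempfile


def effective_lora_scale(config_path: str, lora_scaling: float = 1.0) -> float:
    """Hypothetical helper: compute lora_alpha / r * lora_scaling from a PEFT config."""
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    r = int(cfg["r"])                  # LoRA rank
    alpha = float(cfg["lora_alpha"])   # LoRA alpha
    if r <= 0:
        raise ValueError("LoRA rank r must be positive")
    return alpha / r * lora_scaling


if __name__ == "__main__":
    # Write a throwaway config so the sketch runs without a real adapter.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "adapter_config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"r": 8, "lora_alpha": 16, "target_modules": ["q_proj"]}, f)
        print(effective_lora_scale(path))        # default strength
        print(effective_lora_scale(path, 0.5))   # half strength
```

With `r = 8` and `lora_alpha = 16`, the base factor is 2.0, and `lora_scaling` multiplies that factor rather than replacing it.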
Usage
Use this when you have a PEFT-format LoRA adapter directory (containing adapter_config.json and adapter_model.safetensors) and want to load it for use with an ExLlamaV2 model. The returned object is then passed to the generator's set_loras() method to activate it during inference.
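Before calling from_directory, it can help to confirm that the directory actually contains the two files named above. The following pre-check is a sketch only: `missing_adapter_files` is a hypothetical helper, not an exllamav2 function, and it assumes the adapter uses exactly these two file names.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the description above.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(directory: str) -> List[str]:
    """Return the expected PEFT file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Only the config is present, so the weights file is reported as missing.
        (Path(tmp) / "adapter_config.json").write_text("{}", encoding="utf-8")
        print(missing_adapter_files(tmp))
```

An empty list means both files are present and the directory can be handed to the loader.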
Code Reference
Source Location
- Repository: exllamav2
- File: exllamav2/lora.py
- Lines: L33-40 (from_directory), L42-194 (__init__)
Signature
@classmethod
def from_directory(
    cls,
    model: ExLlamaV2,
    directory: str,
    lora_scaling: float = 1.0
) -> ExLlamaV2Lora:
    ...
Import
from exllamav2 import ExLlamaV2Lora
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | ExLlamaV2 | Yes | The loaded base model instance to which the LoRA adapter will be attached |
| directory | str | Yes | Path to the PEFT LoRA adapter directory containing adapter_config.json and adapter_model.safetensors |
| lora_scaling | float | No | Strength multiplier applied on top of lora_alpha/r; default is 1.0 |
Outputs
| Name | Type | Description |
|---|---|---|
| lora | ExLlamaV2Lora | Loaded LoRA adapter instance with A/B weight matrices and effective scaling = lora_alpha / r * lora_scaling |
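To make the effective scaling concrete, the sketch below applies a LoRA update to a tiny linear layer in plain Python. It assumes the conventional LoRA formulation y = xW + s(xA)B with s = lora_alpha / r * lora_scaling; the matrix orientation and the function names are illustrative and do not describe how exllamav2 stores or applies its tensors internally.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix,
                 lora_alpha: float, r: int, lora_scaling: float = 1.0) -> Matrix:
    """Compute x @ W + (lora_alpha / r * lora_scaling) * (x @ A) @ B."""
    scale = lora_alpha / r * lora_scaling
    base = matmul(x, w)
    delta = matmul(matmul(x, a), b)
    return [[bv + scale * dv for bv, dv in zip(brow, drow)]
            for brow, drow in zip(base, delta)]


# Toy example: 2 input features, 2 output features, rank r = 1.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]   # identity base weight
a = [[1.0], [1.0]]             # shape (in_features, r)
b = [[0.5, -0.5]]              # shape (r, out_features)

full = lora_forward(x, w, a, b, lora_alpha=2.0, r=1)                    # scale 2.0
half = lora_forward(x, w, a, b, lora_alpha=2.0, r=1, lora_scaling=0.5)  # scale 1.0
```

Halving `lora_scaling` halves only the low-rank delta; the contribution of the base weight is unchanged.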
Dependencies
- torch - Tensor operations for weight loading and manipulation
- json - Parsing adapter_config.json
- safetensors - Loading adapter_model.safetensors weight files
- math - Scaling factor computation
Usage Examples
Basic
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Lora
# Load base model (model_dir is a placeholder path to the base model directory)
model_dir = "/path/to/model/"
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
model.load()
# Load LoRA adapter from PEFT directory
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/"
)
With Custom Scaling
# Load with reduced adapter influence
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/",
    lora_scaling=0.5  # Half strength
)