
Implementation:Turboderp org Exllamav2 ExLlamaV2Lora From Directory

From Leeroopedia
Knowledge Sources
Domains: Fine_Tuning, Parameter_Efficient, Deep_Learning
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete tool, provided by exllamav2, for loading a LoRA adapter from a HuggingFace PEFT directory and attaching it to a base ExLlamaV2 model.

Description

The ExLlamaV2Lora.from_directory class method loads a LoRA adapter stored in HuggingFace PEFT format. It reads adapter_config.json to determine the LoRA architecture parameters (rank, alpha, target modules) and loads the A/B weight matrices from adapter_model.safetensors. It then constructs an ExLlamaV2Lora instance whose weight tensors are already scaled, so they can be applied to the base model's linear layers during inference.
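To make the configuration step concrete, here is a minimal sketch of reading the three parameters named above from adapter_config.json using only the standard library. It is not the exllamav2 implementation: read_adapter_config is a hypothetical helper, the sample values are invented for the demo, and the sketch assumes target_modules is stored as a list (PEFT also allows a single string).

```python
import json
import tempfile
from pathlib import Path


def read_adapter_config(directory: str) -> dict:
    """Hypothetical helper: return the LoRA hyperparameters from adapter_config.json."""
    config_path = Path(directory) / "adapter_config.json"
    with open(config_path, encoding="utf-8") as f:
        raw = json.load(f)
    return {
        "r": int(raw["r"]),                             # LoRA rank
        "lora_alpha": float(raw["lora_alpha"]),         # scaling numerator
        "target_modules": list(raw["target_modules"]),  # assumed to be a list here
    }


# Demo: write a tiny stand-in config to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"r": 8, "lora_alpha": 16, "target_modules": ["q_proj", "v_proj"]}
    (Path(tmp) / "adapter_config.json").write_text(json.dumps(sample), encoding="utf-8")
    demo_config = read_adapter_config(tmp)

print(demo_config)
```

A real adapter_config.json carries many more fields; the sketch only picks out the ones this page discusses.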

The __init__ method handles the detailed loading logic:

  • Parses the adapter configuration (rank r, lora_alpha, target modules)
  • Loads safetensor weight files and maps LoRA tensor names to the corresponding model layers
  • Applies quantization-aware adjustments if the base model uses GPTQ or similar quantization
  • Computes the effective scaling: lora_alpha / r * lora_scaling
  • Stores A and B matrices for each targeted layer
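The name-mapping step in the list above can be illustrated with a short sketch. It assumes the common PEFT key layout for decoder layers (base_model.model.model.layers.<idx>.<block>.<module>.lora_A.weight); parse_lora_key is a hypothetical helper, not a function from exllamav2, and the actual loader may handle more cases (for example, keys outside the layer stack).

```python
import re

# Assumed PEFT key layout for per-layer LoRA tensors:
#   ...layers.<idx>.<block>.<module>.lora_<A|B>.weight
_KEY_RE = re.compile(
    r"layers\.(?P<layer>\d+)\.(?P<block>\w+)\.(?P<module>\w+)\.lora_(?P<ab>[AB])\.weight$"
)


def parse_lora_key(key: str):
    """Hypothetical helper: split a tensor name into (layer, block, module, 'A' or 'B').

    Returns None when the key does not follow the assumed layout.
    """
    m = _KEY_RE.search(key)
    if m is None:
        return None
    return int(m["layer"]), m["block"], m["module"], m["ab"]


demo_key = "base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight"
parsed = parse_lora_key(demo_key)
unmatched = parse_lora_key("base_model.model.lm_head.lora_A.weight")

print(parsed)
print(unmatched)
```

With the layer index and module name in hand, a loader can look up the matching linear layer in the base model and store the A or B matrix against it.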

Usage

Use this when you have a PEFT-format LoRA adapter directory (containing adapter_config.json and adapter_model.safetensors) and want to load it for use with an ExLlamaV2 model. The returned object is then passed to the generator's set_loras() method to activate it during inference.
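Before calling from_directory, it can help to confirm that the directory actually holds the two files mentioned above. The following is a small sketch using only the standard library; missing_adapter_files is a hypothetical helper and is not part of exllamav2.

```python
import tempfile
from pathlib import Path

# The two files this page says a PEFT LoRA adapter directory contains.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(directory: str) -> list:
    """Hypothetical helper: return the required file names absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo: a directory that has the config but not the weights.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text("{}", encoding="utf-8")
    demo_missing = missing_adapter_files(tmp)

print(demo_missing)
```

An empty list means both files are present and the directory is a plausible input for from_directory.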

Code Reference

Source Location

  • Repository: exllamav2
  • File: exllamav2/lora.py
  • Lines: L33-40 (from_directory), L42-194 (__init__)

Signature

@classmethod
def from_directory(
    cls,
    model: ExLlamaV2,
    directory: str,
    lora_scaling: float = 1.0
) -> ExLlamaV2Lora:
    ...

Import

from exllamav2 import ExLlamaV2Lora

I/O Contract

Inputs

Name | Type | Required | Description
model | ExLlamaV2 | Yes | The loaded base model instance to which the LoRA adapter will be attached
directory | str | Yes | Path to the PEFT LoRA adapter directory containing adapter_config.json and adapter_model.safetensors
lora_scaling | float | No | Strength multiplier applied on top of lora_alpha / r; default is 1.0

Outputs

Name | Type | Description
lora | ExLlamaV2Lora | Loaded LoRA adapter instance with A/B weight matrices and effective scaling = lora_alpha / r * lora_scaling

Dependencies

  • torch - Tensor operations for weight loading and manipulation
  • json - Parsing adapter_config.json
  • safetensors - Loading adapter_model.safetensors weight files
  • math - Scaling factor computation

Usage Examples

Basic

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Lora

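# Load base model (model_dir is a placeholder; point it at your base model directory)
model_dir = "/path/to/base_model/"
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
model.load()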

# Load LoRA adapter from PEFT directory
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/"
)

With Custom Scaling

# Load with reduced adapter influence (reuses `model` from the Basic example)
lora = ExLlamaV2Lora.from_directory(
    model,
    "/path/to/lora_adapter/",
    lora_scaling=0.5  # Half strength
)
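To show what "half strength" means numerically, the sketch below applies the effective scaling formula from this page (lora_alpha / r * lora_scaling) to the standard LoRA update, delta_W = scale * B @ A. It uses plain Python lists and made-up rank-1 matrices; effective_scale, matmul, and lora_delta are hypothetical helpers for illustration and do not reflect how exllamav2 applies adapters internally.

```python
def effective_scale(lora_alpha: float, r: int, lora_scaling: float = 1.0) -> float:
    """Effective scaling as described on this page: lora_alpha / r * lora_scaling."""
    return lora_alpha / r * lora_scaling


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_delta(lora_b, lora_a, scale):
    """Standard LoRA weight update: scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(lora_b, lora_a)]


# Made-up rank-1 adapter for a 2x2 weight matrix.
lora_a = [[1.0, 2.0]]    # shape (r, in_features)  = (1, 2)
lora_b = [[0.5], [1.0]]  # shape (out_features, r) = (2, 1)
r = len(lora_a)
lora_alpha = 2.0

full = lora_delta(lora_b, lora_a, effective_scale(lora_alpha, r, 1.0))
half = lora_delta(lora_b, lora_a, effective_scale(lora_alpha, r, 0.5))

print(full)
print(half)
```

Every entry of the update shrinks by the same factor, so lora_scaling=0.5 halves the adapter's contribution without changing its direction.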
