Implementation:Turboderp org Exllamav2 ExLlamaV2DynamicGenerator Set Loras

Knowledge Sources	ExLlamaV2
Domains	Fine_Tuning, Inference_Configuration, Deep_Learning
Last Updated	2026-02-15 00:00 GMT

Overview

A concrete tool, provided by exllamav2, for activating or deactivating LoRA adapters in the ExLlamaV2 dynamic generator's inference pipeline.

Description

The set_loras() method on ExLlamaV2DynamicGenerator registers one or more loaded LoRA adapters with the generator so that their weight modifications are applied during the forward pass. The generator keeps a reference to the given adapters and applies them in the model's linear layers on subsequent forward passes. Each call replaces the previously active set rather than adding to it. The job queue must be empty at the time of the call; the method rejects the call otherwise, so that no job is generated partly with one adapter set and partly with another.

Passing None or an empty list deactivates all LoRA adapters, reverting the generator to base model behavior.

Usage

Use this method after loading LoRA adapters with ExLlamaV2Lora.from_directory() and before enqueuing any generation jobs that should use those adapters. Also use it to switch adapters between batches of work, once the previous batch has finished, or to disable adapters entirely.
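
When jobs may still be in flight, one way to respect the empty-queue requirement is to drain the queue before switching. The helper below is a minimal sketch under stated assumptions, not part of exllamav2: `switch_loras` and `max_steps` are hypothetical names, and the helper relies only on the generator's num_remaining_jobs(), iterate(), and set_loras() methods.

```python
def switch_loras(generator, loras, max_steps=100_000):
    """Drain pending work, then replace the active LoRA set.

    `generator` is expected to be an ExLlamaV2DynamicGenerator, or any object
    exposing num_remaining_jobs(), iterate() and set_loras().
    `switch_loras` and `max_steps` are hypothetical names for this sketch.
    Returns the results collected from iterate() while draining.
    """
    results = []
    steps = 0
    # Keep stepping the generator until no jobs are pending or active.
    while generator.num_remaining_jobs() > 0:
        if steps >= max_steps:
            raise RuntimeError("queue did not drain within max_steps iterations")
        results.extend(generator.iterate())
        steps += 1
    # The queue is empty now, so the adapter set can be changed.
    generator.set_loras(loras)
    return results
```

Returning the drained results keeps output from the earlier batch available to the caller, since those jobs finish under the previous adapter set.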

Code Reference

Source Location

Repository: exllamav2
File: exllamav2/generator/dynamic.py
Lines: L522-538

Signature

def set_loras(
    self,
    loras: list[ExLlamaV2Lora] | None
) -> None:
    ...

Import

from exllamav2.generator import ExLlamaV2DynamicGenerator
# set_loras is a method on ExLlamaV2DynamicGenerator instances

I/O Contract

Inputs

Name	Type	Required	Description
loras	list[ExLlamaV2Lora] or None	Yes	List of loaded LoRA adapter instances to activate, or None to disable all adapters. set_loras() must be called while the job queue is empty.

Outputs

Name	Type	Description
(none)	None	The generator is mutated in place; subsequent forward passes apply the active adapters' weights in the model's linear layers.

Usage Examples

Basic

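from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,
    ExLlamaV2Config,
    ExLlamaV2Lora,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Load the base model, cache, and tokenizer (the model path is a placeholder)
config = ExLlamaV2Config("/path/to/model/")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model=model,
    cache=cache,
    tokenizer=tokenizer
)

# Load a LoRA adapter and activate it while the job queue is still empty
lora = ExLlamaV2Lora.from_directory(model, "/path/to/lora_adapter/")
generator.set_loras([lora])

# Generate with the LoRA active
output = generator.generate(prompt="Hello", max_new_tokens=100)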

Disabling Adapters

# Disable all LoRA adapters (revert to base model)
generator.set_loras(None)
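
Because set_loras(None) restores base-model behavior, the same generator can produce adapter and base outputs for one prompt. The following is a hedged sketch: `compare_base_and_lora` is a hypothetical helper name, and it assumes the job queue is empty when the helper is called. It uses only set_loras() and generate() with the prompt and max_new_tokens arguments shown in the basic example.

```python
def compare_base_and_lora(generator, loras, prompt, max_new_tokens=100):
    """Return (lora_output, base_output) for the same prompt.

    `compare_base_and_lora` is a hypothetical helper for illustration.
    Assumes no jobs are queued on `generator` when it is called.
    """
    # generate() runs its jobs to completion, so the queue is empty
    # again before each set_loras() call below.
    generator.set_loras(loras)
    lora_output = generator.generate(prompt=prompt, max_new_tokens=max_new_tokens)

    generator.set_loras(None)  # revert to the base model
    base_output = generator.generate(prompt=prompt, max_new_tokens=max_new_tokens)
    return lora_output, base_output
```

Note that the helper leaves the generator with all adapters disabled; call set_loras() again afterwards if an adapter should stay active.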

Multiple Adapters

# Load multiple adapters
lora_style = ExLlamaV2Lora.from_directory(model, "/path/to/style_lora/")
lora_domain = ExLlamaV2Lora.from_directory(model, "/path/to/domain_lora/")

# Activate both simultaneously (additive effect)
generator.set_loras([lora_style, lora_domain])
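
The "additive effect" can be illustrated with a toy linear layer in plain Python. This sketch follows the standard LoRA formulation, where each adapter adds a scaled low-rank term to the base layer output; it is not ExLlamaV2's kernel code, and every matrix, scale, and function name in it is made up for illustration.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_delta(a, b, scale, x):
    """Low-rank update: scale * B @ (A @ x)."""
    return [scale * y for y in matvec(b, matvec(a, x))]

def linear_with_loras(w, x, loras):
    """Base output W @ x plus the sum of every active adapter's update."""
    y = matvec(w, x)
    for a, b, scale in loras:
        delta = lora_delta(a, b, scale, x)
        y = [yi + di for yi, di in zip(y, delta)]
    return y

# Toy 2x2 base weight and two rank-1 adapters (A is 1x2, B is 2x1).
w = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
style = ([[1.0, 0.0]], [[1.0], [0.0]], 0.5)
domain = ([[0.0, 1.0]], [[0.0], [1.0]], 2.0)

base = linear_with_loras(w, x, [])               # no adapters: [2.0, 3.0]
both = linear_with_loras(w, x, [style, domain])  # both adapters: [3.0, 9.0]
print(base, both)
```

In this toy model the updates are summed, so the result does not depend on the order of the adapters in the list, and an empty list reproduces the base output.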

Related Pages

Implements Principle

Principle:Turboderp_org_Exllamav2_LoRA_Generator_Configuration

Uses Heuristic

Heuristic:Turboderp_org_Exllamav2_Dynamic_Generator_Tuning
