Implementation:Microsoft LoRA LoRA Weight Merging
| Knowledge Sources | |
|---|---|
| Domains | Inference, Parameter_Efficient_Fine_Tuning |
| Pattern Doc | Yes |
| Last Updated | 2026-02-10 05:00 GMT |
Overview
Pattern documentation for merging LoRA weights into base model weights, and unmerging them again, so that inference runs with zero overhead.
Description
This pattern doc describes how LoRA weight merging works and the interface users must follow to deploy LoRA models for inference. The merge/unmerge logic lives in the train() override of each LoRA layer class. For layers constructed with merge_weights=True (the default), calling model.eval() automatically merges the LoRA weights into the base weights, and calling model.train() automatically unmerges them.
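The automatic trigger relies on standard PyTorch behavior: nn.Module.eval() is train(False), and train() is applied recursively to every submodule. The sketch below illustrates this with a hypothetical stand-in class (ModeRecorder is not part of loralib) that only records the mode passed to each train() call.

```python
import torch.nn as nn


class ModeRecorder(nn.Linear):
    """Hypothetical stand-in for a LoRA layer: records each train() call."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def train(self, mode: bool = True):
        self.calls.append(mode)
        return super().train(mode)


model = nn.Sequential(ModeRecorder(4, 4), nn.ReLU(), ModeRecorder(4, 2))

model.eval()   # reaches every submodule as train(False)
model.train()  # reaches every submodule as train(True)

print(model[0].calls)  # [False, True]
print(model[2].calls)  # [False, True]
```

Because the override sits on the layer rather than on the top-level model, the merge happens regardless of how deeply the LoRA layers are nested.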
Usage
Follow this pattern once training is complete and the model is ready to deploy for inference, or during training, when switching to eval mode for periodic evaluation also merges the weights.
Code Reference
Source Locations
The merge/unmerge logic is implemented in the train() method of each LoRA layer class:
- Repository: microsoft/LoRA
- File: loralib/layers.py
- Linear.train(): Lines 62-76
- Embedding.train(): Lines 127-142 (Note: the Embedding merge transposes the product, (lora_B @ lora_A).transpose(0, 1), because the embedding weight is stored as (num_embeddings, embedding_dim))
- MergedLinear.train(): Lines 218-233
- ConvLoRA.train(): Lines 275-288
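The transposed arrangement in the Embedding case can be checked with shapes alone. The following is a minimal sketch in plain PyTorch, assuming lora_A has shape (r, num_embeddings) and lora_B has shape (embedding_dim, r); these shapes are an assumption about loralib's Embedding layer and are worth confirming against loralib/layers.py. It compares a lookup in the merged table with the unmerged two-path lookup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_embeddings, embedding_dim, r, scaling = 10, 6, 2, 0.5
dtype = torch.float64  # float64 so rounding does not obscure the identity

weight = torch.randn(num_embeddings, embedding_dim, dtype=dtype)  # nn.Embedding layout
lora_A = torch.randn(r, num_embeddings, dtype=dtype)              # assumed shape
lora_B = torch.randn(embedding_dim, r, dtype=dtype)               # assumed shape

# lora_B @ lora_A is (embedding_dim, num_embeddings); transpose to match weight.
delta = (lora_B @ lora_A).transpose(0, 1) * scaling
merged_weight = weight + delta

ids = torch.tensor([0, 3, 7])

# Unmerged path: base lookup plus a low-rank lookup projected through lora_B.
after_A = F.embedding(ids, lora_A.transpose(0, 1))
unmerged_out = F.embedding(ids, weight) + (after_A @ lora_B.transpose(0, 1)) * scaling

# Merged path: a single lookup in the updated table.
merged_out = F.embedding(ids, merged_weight)

print(tuple(delta.shape))
print(torch.allclose(merged_out, unmerged_out))
```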
Merge Logic (Linear Layer Example)
def train(self, mode: bool = True):
    """Override train() to handle weight merge/unmerge.

    On eval (mode=False): merge LoRA into base weight
    On train (mode=True): unmerge LoRA from base weight

    Simplified for illustration: the upstream Linear layer also transposes
    the update when fan_in_fan_out=True.
    """
    nn.Linear.train(self, mode)
    if mode:
        # Switching to train mode: unmerge if currently merged
        if self.merge_weights and self.merged:
            if self.r > 0:
                self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
    else:
        # Switching to eval mode: merge if not already merged
        if self.merge_weights and not self.merged:
            if self.r > 0:
                self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True
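Merging leaves the layer's output unchanged because (W + s·BA)x = Wx + s·BAx, where s = lora_alpha / r. The sketch below checks this identity, and the round trip back to the original weight, with plain tensors rather than a loralib layer; the variable names and sizes are illustrative only.

```python
import torch

torch.manual_seed(0)
d_in, d_out, r, lora_alpha = 16, 12, 4, 8
scaling = lora_alpha / r
dtype = torch.float64  # float64 so rounding does not obscure the identity

W = torch.randn(d_out, d_in, dtype=dtype)  # base weight, nn.Linear layout (out, in)
A = torch.randn(r, d_in, dtype=dtype)      # plays the role of lora_A
B = torch.randn(d_out, r, dtype=dtype)     # plays the role of lora_B
x = torch.randn(5, d_in, dtype=dtype)      # batch of 5 inputs

# Unmerged forward pass: base path plus the scaled low-rank path.
unmerged_out = x @ W.T + (x @ A.T @ B.T) * scaling

# Merged forward pass: one matmul against the updated weight.
W_merged = W + (B @ A) * scaling
merged_out = x @ W_merged.T

# Unmerging subtracts the same update.
W_restored = W_merged - (B @ A) * scaling

print(torch.allclose(merged_out, unmerged_out))
print(torch.allclose(W_restored, W))
```

In float32 the round trip is exact only up to rounding error, so repeated merge/unmerge cycles can accumulate small drift in the base weight.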
Pattern Interface
Prerequisites
- Model must have LoRA layers with merge_weights=True (the default)
- LoRA parameters must be trained (or loaded from a checkpoint)
- For checkpoint loading: use load_state_dict(lora_dict, strict=False)
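Since strict=False suppresses key-mismatch errors, it can also hide a checkpoint that failed to supply the LoRA weights. load_state_dict returns the missing and unexpected keys, which can be inspected after loading. The sketch below uses a hypothetical stand-in class (ToyLoRALinear) with the same parameter names as a LoRA layer, so that it runs without loralib.

```python
import torch
import torch.nn as nn


class ToyLoRALinear(nn.Linear):
    """Hypothetical stand-in with the same parameter names as a LoRA layer."""

    def __init__(self, in_features, out_features, r=4):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))


model = nn.Sequential(ToyLoRALinear(8, 8), ToyLoRALinear(8, 2))

# Simulate a LoRA-only checkpoint: just the keys containing "lora_".
lora_dict = {
    k: torch.ones_like(v) for k, v in model.state_dict().items() if "lora_" in k
}

result = model.load_state_dict(lora_dict, strict=False)

# Base weights are expected to be missing from a LoRA-only checkpoint;
# missing LoRA keys or unexpected keys indicate a mismatch.
lora_missing = [k for k in result.missing_keys if "lora_" in k]
base_missing = [k for k in result.missing_keys if "lora_" not in k]

assert not result.unexpected_keys, f"unrecognized keys: {result.unexpected_keys}"
assert not lora_missing, f"LoRA weights not loaded: {lora_missing}"
print(sorted(base_missing))
```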
Core Pattern: Inference Deployment
import torch
import loralib as lora

# 1. Load base model with LoRA layer structure
#    (create_model_with_lora_layers is a placeholder for your own model factory)
model = create_model_with_lora_layers(r=8, lora_alpha=16)

# 2. Load trained LoRA weights
lora_dict = torch.load('lora_checkpoint.pt')
model.load_state_dict(lora_dict, strict=False)

# 3. Merge LoRA weights into base weights (triggered by eval())
model.eval()
# At this point, each LoRA layer's weight holds W + BA * (alpha/r)
# The forward pass skips the separate LoRA path; inference is zero-overhead

# 4. Run inference (input_data is a placeholder for your inputs)
with torch.no_grad():
    output = model(input_data)
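Before serving, it can be useful to confirm that every LoRA layer actually merged, since layers built with merge_weights=False stay unmerged after eval(). The helper below is hypothetical (not part of loralib) and relies only on the merged flag that the LoRA layers maintain. FlagOnly is a stand-in class used to keep the sketch runnable without loralib.

```python
import torch.nn as nn


def unmerged_lora_layers(model: nn.Module) -> list:
    """Return names of submodules whose `merged` flag is False.

    Hypothetical helper: modules without a `merged` attribute are ignored.
    """
    return [
        name
        for name, module in model.named_modules()
        if getattr(module, "merged", None) is False
    ]


class FlagOnly(nn.Linear):
    """Stand-in for a LoRA layer that tracks only the `merged` flag."""

    def __init__(self, *args, merge_weights=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.merge_weights = merge_weights
        self.merged = False

    def train(self, mode: bool = True):
        super().train(mode)
        if self.merge_weights:
            self.merged = not mode
        return self


model = nn.Sequential(
    FlagOnly(4, 4),
    nn.ReLU(),
    FlagOnly(4, 2, merge_weights=False),
)

model.eval()
print(unmerged_lora_layers(model))  # ['2']
```

An empty list after eval() means no layer still carries a separate LoRA path.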
Core Pattern: Eval During Training
for epoch in range(num_epochs):
    # Training phase (unmerged weights)
    model.train()
    for batch in train_loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Evaluation phase: eval() disables dropout and merges the weights
    model.eval()  # Triggers merge: W' = W + BA * (alpha/r)
    with torch.no_grad():
        for batch in eval_loader:
            output = model(**batch)
            # ... compute metrics ...

    # Back to training (triggers unmerge: W = W' - BA * (alpha/r))
    model.train()
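Forgetting the final model.train() would leave the weights merged while the optimizer keeps updating lora_A and lora_B. One way to make the switch harder to forget is a small context manager that enters eval mode and restores the previous mode on exit. This is a sketch of a hypothetical helper (evaluating is not part of loralib or PyTorch), shown here with a plain model.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def evaluating(model: nn.Module):
    """Enter eval mode (which merges LoRA weights), then restore the prior mode."""
    was_training = model.training
    model.eval()
    try:
        with torch.no_grad():
            yield model
    finally:
        # train(True) unmerges; train(False) leaves an eval-mode model as it was.
        model.train(was_training)


model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))
model.train()

with evaluating(model) as m:
    out = m(torch.ones(2, 4))
    mode_inside = m.training

print(mode_inside, model.training)  # False True
```

The try/finally ensures the model returns to its previous mode even if metric computation raises.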
I/O Contract
Inputs (for Merge)
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Model with LoRA layers and trained lora_A, lora_B parameters |
| merge_weights | bool | No (default True) | Must be True for merging to occur; set per-layer at construction time |
Outputs (after Merge)
| Name | Type | Description |
|---|---|---|
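| model | nn.Module | Model with LoRA weights absorbed into base weights; its forward pass performs the same computation as the original pretrained architecture (lora_A and lora_B remain on the module but are unused while merged) |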
Usage Examples
Basic Inference
import torch
import loralib as lora
model = create_lora_model()
lora_dict = torch.load('adapter.pt')
model.load_state_dict(lora_dict, strict=False)
# Merge and run inference
model.eval()
with torch.no_grad():
    result = model(input_ids)
Verify Merge Behavior
import torch
import loralib as lora

# Create a simple LoRA layer (starts in train mode, unmerged)
layer = lora.Linear(768, 768, r=8, lora_alpha=16)

# lora_B is zero-initialized, so give it nonzero values; otherwise the
# LoRA contribution is zero and the checks below pass trivially
torch.nn.init.normal_(layer.lora_B, std=0.02)

# Before merge: check weight
original_weight = layer.weight.data.clone()

# Trigger merge
layer.eval()
merged_weight = layer.weight.data.clone()

# The weights should differ by the LoRA contribution
diff = merged_weight - original_weight
expected = (layer.lora_B @ layer.lora_A).detach() * layer.scaling
print(f"Merge correct: {torch.allclose(diff, expected, atol=1e-6)}")

# Trigger unmerge
layer.train()
unmerged_weight = layer.weight.data.clone()

# Should be back to original, up to float32 rounding
print(f"Unmerge correct: {torch.allclose(unmerged_weight, original_weight, atol=1e-6)}")
Export Merged Model (Permanent Merge)
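import torch
import loralib as lora

# Load model with LoRA
model = create_lora_model()
lora_dict = torch.load('adapter.pt')
model.load_state_dict(lora_dict, strict=False)

# Merge weights
model.eval()

# state_dict() still contains the lora_A / lora_B entries; drop them, since
# the base weights already include the LoRA update
merged_state = {k: v for k, v in model.state_dict().items() if 'lora_' not in k}
torch.save(merged_state, 'merged_model.pt')

# The filtered state dict holds only the base architecture's keys, so a
# standard (non-LoRA) model with matching parameter names can load it
# without loralib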
Disable Merging (Keep Separate)
import loralib as lora
# If you do NOT want automatic merge on eval(), set merge_weights=False
layer = lora.Linear(768, 768, r=8, lora_alpha=16, merge_weights=False)
# Now eval() will NOT merge; the LoRA path remains separate
layer.eval()
# Forward pass still computes: h = Wx + BAx * (alpha/r)
# Two separate matrix multiplications are performed
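To put a rough number on the cost of keeping the LoRA path separate, the sketch below counts multiply-accumulate operations per input vector for the 768x768, r=8 layer above. It is back-of-the-envelope arithmetic only: it ignores the scaling multiply, dropout, memory traffic, and kernel-launch overhead, all of which can matter more than the raw count in practice. The function names are illustrative.

```python
def base_macs(d_in: int, d_out: int) -> int:
    """Multiply-accumulates for y = W x with W of shape (d_out, d_in)."""
    return d_in * d_out


def lora_extra_macs(d_in: int, d_out: int, r: int) -> int:
    """Extra multiply-accumulates for B (A x): r*d_in for A x, then d_out*r for B."""
    return r * d_in + d_out * r


d_in, d_out, r = 768, 768, 8

merged = base_macs(d_in, d_out)                # single matmul after merging
extra = lora_extra_macs(d_in, d_out, r)        # additional work when unmerged
overhead = extra / merged

print(merged)              # 589824
print(extra)               # 12288
print(f"{overhead:.2%}")   # 2.08%
```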