Implementation:Microsoft LoRA LoRA Weight Merging

Knowledge Sources	Microsoft LoRA
Domains	Inference, Parameter_Efficient_Fine_Tuning
Pattern Doc	Yes
Last Updated	2026-02-10 05:00 GMT

Overview

Pattern documentation for merging LoRA weights into base model weights, and unmerging them again, so that inference runs without the extra low-rank computation.

Description

This pattern doc describes how LoRA weight merging works and the interface to follow when deploying LoRA models for inference. The merge/unmerge logic lives in each LoRA layer class's train() override. Because PyTorch's nn.Module.eval() is implemented as train(False), that single override handles both directions: with merge_weights=True (the default), calling model.eval() adds the LoRA update to the base weights, and calling model.train() subtracts it again.
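
The dispatch that makes this work can be sketched with stock PyTorch alone: eval() forwards to train(False), and train() recurses into child modules, so a layer that overrides train() sees every mode switch. TracingLinear is a hypothetical class used only for illustration; it is not part of loralib.

```python
import torch.nn as nn


class TracingLinear(nn.Linear):
    """Hypothetical layer that records every mode switch it receives."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def train(self, mode: bool = True):
        self.calls.append(mode)      # a LoRA layer would merge/unmerge here
        return super().train(mode)


model = nn.Sequential(TracingLinear(4, 4), nn.ReLU())
model.eval()    # reaches the child as train(False)
model.train()   # reaches the child as train(True)
print(model[0].calls)  # prints: [False, True]
```

The override is invoked on the child even though eval() and train() were called on the parent container, which is why a single model.eval() call is enough to merge every LoRA layer in a model.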

Usage

Follow this pattern once training is complete and the model is being deployed for inference, or during training whenever you evaluate the model: eval() merges the weights, so validation metrics are computed on the same merged weights that will be deployed.

Code Reference

Source Locations

The merge/unmerge logic is implemented in the train() method of each LoRA layer class:

Repository: microsoft/LoRA
File: loralib/layers.py
Linear.train(): Lines 62-76
Embedding.train(): Lines 127-142 (Note: Embedding merges the transposed product, (lora_B @ lora_A).transpose(0, 1), because nn.Embedding stores its weight as (num_embeddings, embedding_dim))
MergedLinear.train(): Lines 218-233
ConvLoRA.train(): Lines 275-288

Merge Logic (Linear Layer Example)

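def train(self, mode: bool = True):
    """Override train() to handle weight merge/unmerge.

    On eval (mode=False): merge LoRA into base weight
    On train (mode=True): unmerge LoRA from base weight

    Simplified excerpt: the source additionally transposes the LoRA
    product when fan_in_fan_out=True; that helper is omitted here.
    """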
    nn.Linear.train(self, mode)
    if mode:
        # Switching to train mode: unmerge if currently merged
        if self.merge_weights and self.merged:
            if self.r > 0:
                self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
    else:
        # Switching to eval mode: merge if not already merged
        if self.merge_weights and not self.merged:
            if self.r > 0:
                self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True
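
To see the merge logic end to end without installing loralib, the following is a minimal self-contained sketch. TinyLoRALinear is a hypothetical stand-in that mirrors the attribute names used above (lora_A, lora_B, scaling, merged, merge_weights) but omits dropout and fan_in_fan_out, and it initializes lora_B randomly (an assumption made only so the update is non-zero).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLoRALinear(nn.Linear):
    """Hypothetical minimal LoRA linear layer (no dropout, no fan_in_fan_out)."""

    def __init__(self, in_features, out_features, r=4, lora_alpha=8, merge_weights=True):
        super().__init__(in_features, out_features)
        self.r = r
        self.scaling = lora_alpha / r
        self.merge_weights = merge_weights
        self.merged = False
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.empty(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        # Assumption for the demo: random B so the update is non-zero.
        nn.init.normal_(self.lora_B, std=0.02)
        self.weight.requires_grad = False  # base weight stays frozen

    def train(self, mode: bool = True):
        super().train(mode)
        if not self.merge_weights:
            return self
        with torch.no_grad():
            delta = (self.lora_B @ self.lora_A) * self.scaling
            if mode and self.merged:
                self.weight -= delta   # unmerge before training resumes
                self.merged = False
            elif not mode and not self.merged:
                self.weight += delta   # merge for inference
                self.merged = True
        return self

    def forward(self, x):
        out = F.linear(x, self.weight, self.bias)
        if not self.merged:
            # Separate low-rank path: x A^T B^T * (alpha / r)
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out


torch.manual_seed(0)
layer = TinyLoRALinear(6, 5)
x = torch.randn(3, 6)

with torch.no_grad():
    y_unmerged = layer(x)   # train mode: base path plus low-rank path
    layer.eval()            # merge
    y_merged = layer(x)     # eval mode: single matmul
    layer.train()           # unmerge
    y_restored = layer(x)

same_after_merge = torch.allclose(y_unmerged, y_merged, atol=1e-5)
same_after_unmerge = torch.allclose(y_unmerged, y_restored, atol=1e-5)
print(same_after_merge, same_after_unmerge)
```

The outputs agree only up to floating-point rounding, which is why the comparison uses a tolerance rather than exact equality.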

Pattern Interface

Prerequisites

Model must have LoRA layers with merge_weights=True (the default)
LoRA parameters must be trained (or loaded from a checkpoint)
For checkpoint loading: use load_state_dict(lora_dict, strict=False)
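
Because strict=False silently tolerates key mismatches, it can be worth inspecting what load_state_dict reports. The sketch below uses a hypothetical stand-in module (Block) rather than a real loralib layer; the point is only the return value, which lists missing_keys and unexpected_keys.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Hypothetical stand-in for a layer that carries LoRA parameters."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(4, 4))
        self.lora_A = nn.Parameter(torch.zeros(2, 4))
        self.lora_B = nn.Parameter(torch.zeros(4, 2))


model = Block()

# A LoRA-only checkpoint holds just the lora_ entries.
lora_dict = {
    k: torch.ones_like(v) for k, v in model.state_dict().items() if "lora_" in k
}

result = model.load_state_dict(lora_dict, strict=False)

# Base weights are expected to be missing; LoRA keys are not.
missing_lora = [k for k in result.missing_keys if "lora_" in k]
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
print("missing LoRA keys:", missing_lora)
```

An empty unexpected_keys list and no lora_ entries among missing_keys suggest the adapter checkpoint matched the model's LoRA layers; anything else usually points to a naming or architecture mismatch.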

Core Pattern: Inference Deployment

import torch
import loralib as lora

# 1. Build the model with LoRA layers and its pretrained base weights
#    (create_model_with_lora_layers is a placeholder for your own factory)
model = create_model_with_lora_layers(r=8, lora_alpha=16)

# 2. Load trained LoRA weights
lora_dict = torch.load('lora_checkpoint.pt')
model.load_state_dict(lora_dict, strict=False)

# 3. Merge LoRA weights into base weights (triggered by eval())
model.eval()
# At this point each LoRA layer's weight holds W + BA * (alpha/r)
# and forward() skips the low-rank path, so inference adds no extra matmuls

# 4. Run inference
with torch.no_grad():
    output = model(input_data)

Core Pattern: Eval During Training

for epoch in range(num_epochs):
    # Training phase (unmerged weights)
    model.train()
    for batch in train_loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Evaluation phase (merged weights, matching the deployed model)
    model.eval()  # Triggers merge: W' = W + BA * (alpha/r)
    with torch.no_grad():
        for batch in eval_loader:
            output = model(**batch)
            # ... compute metrics ...

    # Back to training (triggers unmerge: W = W' - BA * (alpha/r))
    model.train()
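
One way to make sure the model always returns to its previous mode after evaluation, even if the evaluation code raises, is a small context manager. This is a sketch of a common PyTorch idiom; evaluating is a hypothetical helper name, not a loralib function.

```python
from contextlib import contextmanager

import torch.nn as nn


@contextmanager
def evaluating(model: nn.Module):
    """Hypothetical helper: run a block in eval mode, then restore the prior mode."""
    was_training = model.training
    model.eval()                     # LoRA layers would merge here
    try:
        yield model
    finally:
        model.train(was_training)    # LoRA layers would unmerge here


model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.1))
with evaluating(model):
    inside = model.training
print(inside, model.training)  # prints: False True
```

Note that train(mode) applies one mode to the whole module tree, so this helper does not preserve submodules that were deliberately left in a different mode.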

I/O Contract

Inputs (for Merge)

Name	Type	Required	Description
model	nn.Module	Yes	Model with LoRA layers and trained lora_A, lora_B parameters
merge_weights	bool	No (default True)	Must be True for merging to occur; set per-layer at construction time

Outputs (after Merge)

Name	Type	Description
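model	nn.Module	Model with LoRA weights absorbed into base weights; each layer runs a single matmul, so inference cost matches the original pretrained model. The lora_A and lora_B parameters remain in the module and its state_dict.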
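
The output contract can be checked by reading the merged flag on each layer. The sketch below assumes layers expose merged and merge_weights attributes as in the excerpt above; merge_status and FakeLoRALayer are hypothetical names, and the stand-in layer only tracks the flag without touching weights.

```python
import torch.nn as nn


def merge_status(model: nn.Module):
    """Return {module_name: merged_flag} for submodules that expose `merged`."""
    return {
        name: bool(module.merged)
        for name, module in model.named_modules()
        if hasattr(module, "merged") and hasattr(module, "merge_weights")
    }


class FakeLoRALayer(nn.Linear):
    """Hypothetical stand-in that only tracks the merged flag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.merge_weights = True
        self.merged = False

    def train(self, mode: bool = True):
        super().train(mode)
        if self.merge_weights:
            self.merged = not mode
        return self


model = nn.Sequential(FakeLoRALayer(4, 4), nn.ReLU(), FakeLoRALayer(4, 2))
model.eval()
status_eval = merge_status(model)
model.train()
status_train = merge_status(model)
print(status_eval)   # prints: {'0': True, '2': True}
print(status_train)  # prints: {'0': False, '2': False}
```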

Usage Examples

Basic Inference

import torch
import loralib as lora

model = create_lora_model()
lora_dict = torch.load('adapter.pt')
model.load_state_dict(lora_dict, strict=False)

# Merge and run inference
model.eval()
with torch.no_grad():
    result = model(input_ids)

Verify Merge Behavior

import torch
import loralib as lora

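# Create a simple LoRA layer
layer = lora.Linear(768, 768, r=8, lora_alpha=16)

# lora_B is zero-initialized, so give it non-zero values to simulate training;
# otherwise the merge is a no-op and the checks below pass trivially
torch.nn.init.normal_(layer.lora_B, std=0.02)

# Before merge: check weight
original_weight = layer.weight.data.clone()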

# Trigger merge
layer.eval()
merged_weight = layer.weight.data.clone()

# The weights should differ by the LoRA contribution
diff = merged_weight - original_weight
expected = (layer.lora_B @ layer.lora_A) * layer.scaling
print(f"Merge correct: {torch.allclose(diff, expected)}")

# Trigger unmerge
layer.train()
unmerged_weight = layer.weight.data.clone()

# Should be back to original
print(f"Unmerge correct: {torch.allclose(unmerged_weight, original_weight)}")

Export Merged Model (Permanent Merge)

import torch
import loralib as lora

# Load model with LoRA
model = create_lora_model()
lora_dict = torch.load('adapter.pt')
model.load_state_dict(lora_dict, strict=False)

# Merge weights
model.eval()

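# Save the merged weights. The state dict still contains the lora_A / lora_B
# entries, so drop them before saving; otherwise a LoRA-aware model that loads
# this file and calls eval() would apply the update a second time
merged_state = {k: v for k, v in model.state_dict().items() if 'lora_' not in k}
torch.save(merged_state, 'merged_model.pt')

# The filtered checkpoint can be loaded into the standard (non-LoRA)
# architecture without loralib, provided the parameter names match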

Disable Merging (Keep Separate)

import loralib as lora

# If you do NOT want automatic merge on eval(), set merge_weights=False
layer = lora.Linear(768, 768, r=8, lora_alpha=16, merge_weights=False)

# Now eval() will NOT merge; the LoRA path remains separate
layer.eval()
# Forward pass still computes: h = Wx + BAx * (alpha/r)
# The base path (Wx) and the low-rank path (B(Ax)) are evaluated separately and summed
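
As a rough worked example of what the separate path costs, the sketch below counts multiply-accumulate operations for one input vector through one layer. macs_per_token is a hypothetical helper; it ignores the bias add, the scaling multiply, kernel launch overhead, and memory traffic, so it is an arithmetic estimate rather than a latency measurement.

```python
def macs_per_token(d_in: int, d_out: int, r: int, merged: bool) -> int:
    """Multiply-accumulates for one input vector through one LoRA linear layer."""
    base = d_in * d_out                # W x
    if merged:
        return base
    low_rank = r * d_in + d_out * r    # A x, then B (A x)
    return base + low_rank


merged_macs = macs_per_token(768, 768, r=8, merged=True)
separate_macs = macs_per_token(768, 768, r=8, merged=False)
overhead = (separate_macs - merged_macs) / merged_macs
print(merged_macs, separate_macs, f"{overhead:.2%}")
# prints: 589824 602112 2.08%
```

For the 768 x 768, r=8 layer used in this example the unmerged path adds about 2% more arithmetic; the practical latency difference may be larger or smaller depending on hardware and batch size.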

Related Pages

Implements Principle

Principle:Microsoft_LoRA_Weight_Merging_for_Inference

Uses Heuristic

Heuristic:Microsoft_LoRA_Scaling_Factor_Alpha_Over_R
