Principle: LoRA Adapter Merging (LLMBook-zh, llmbook-zh.github.io)
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Parameter_Efficient_Finetuning, Deployment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The process of combining trained LoRA adapter weights back into the base model to produce a standalone model without adapter overhead.
Description
LoRA Adapter Merging takes the trained low-rank matrices A and B and adds their scaled product to the original frozen weight matrix W, producing W' = W + (α/r)·BA, where α is the LoRA scaling hyperparameter and r is the adapter rank. After merging, the model behaves identically to the LoRA-augmented model but without the separate adapter pathway, so it incurs no additional inference latency.
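A minimal numpy sketch (dimensions, seed, and values are illustrative) showing that the merged weight reproduces the adapter-augmented forward pass exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2          # output dim, input dim, LoRA rank (illustrative)
alpha = 4.0                # LoRA scaling hyperparameter (illustrative)
scale = alpha / r

W = rng.normal(size=(d, k))    # frozen base weight
A = rng.normal(size=(r, k))    # trained LoRA "down" projection
B = rng.normal(size=(d, r))    # trained LoRA "up" projection
x = rng.normal(size=(k,))      # an input vector

# LoRA-augmented forward pass: base path plus scaled adapter path.
y_lora = W @ x + scale * (B @ (A @ x))

# Merged forward pass: fold the adapter product into the base weight.
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_lora, y_merged)
```

Note that W_merged has the same shape as W, which is why the merged model needs no architectural changes.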
This is useful for deployment, where a single merged model file is simpler to serve than a base model plus adapter files.
Usage
Use this after LoRA training completes, when you want to deploy the fine-tuned model as a standalone model or when you need to convert LoRA checkpoints for frameworks that do not support PEFT adapters.
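In practice, Hugging Face PEFT exposes this operation as `merge_and_unload()` on a `PeftModel`. For frameworks without PEFT support, the underlying checkpoint arithmetic can be done by hand; the sketch below uses numpy dicts as stand-in state dicts, and the key names (`lora_A`, `lora_B`) are hypothetical, not the exact layout of any real checkpoint format:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, r = 8.0, 2  # illustrative scaling and rank

# Hypothetical flat checkpoint layout: one LoRA pair per target weight.
base_sd = {"layer.weight": rng.normal(size=(4, 4))}
lora_sd = {
    "layer.lora_A": rng.normal(size=(r, 4)),
    "layer.lora_B": rng.normal(size=(4, r)),
}

def merge_state_dict(base_sd, lora_sd, alpha, r):
    """Return a new state dict with W' = W + (alpha/r) * B @ A folded in."""
    merged = dict(base_sd)
    for key, W in base_sd.items():
        prefix = key.rsplit(".", 1)[0]
        a_key, b_key = f"{prefix}.lora_A", f"{prefix}.lora_B"
        if a_key in lora_sd and b_key in lora_sd:
            merged[key] = W + (alpha / r) * (lora_sd[b_key] @ lora_sd[a_key])
    return merged

merged_sd = merge_state_dict(base_sd, lora_sd, alpha, r)
```

The merged state dict can then be saved and loaded as an ordinary model checkpoint, with no adapter-aware code on the serving side.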
Theoretical Basis
Merging computes:

W' = W + (α/r) · BA

where W ∈ R^(d×k) is the frozen base weight, A ∈ R^(r×k) and B ∈ R^(d×r) are the trained low-rank factors, r is the LoRA rank, and α is the scaling hyperparameter.
After merging, the adapter layers are removed ("unloaded"), and the model reverts to a standard PreTrainedModel with updated weights.
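Because merging is a plain addition, it is also reversible (up to floating-point error) by subtracting the same term, which is what allows an adapter to be merged temporarily and later unmerged. A small sketch, with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, alpha = 5, 3, 2, 4.0  # illustrative dimensions and scaling

W = rng.normal(size=(d, k))    # frozen base weight
A = rng.normal(size=(r, k))    # trained LoRA "down" projection
B = rng.normal(size=(d, r))    # trained LoRA "up" projection
delta = (alpha / r) * (B @ A)  # the term added by merging

W_merged = W + delta           # merge: W' = W + (alpha/r) * BA
W_restored = W_merged - delta  # unmerge: recover the original W

assert np.allclose(W, W_restored)
```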