Principle: LoRA Adapter Merging (LLMBook-zh, llmbook-zh.github.io)
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Parameter_Efficient_Finetuning, Deployment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The process of combining trained LoRA adapter weights back into the base model to produce a standalone model without adapter overhead.
Description
LoRA Adapter Merging takes the trained low-rank matrices A and B and adds their scaled product to the original frozen weight matrix W, producing W' = W + (α/r)·BA, where α is the LoRA scaling hyperparameter and r is the adapter rank. After merging, the model behaves identically to the LoRA-augmented model but without the separate adapter pathway, so it incurs no additional inference latency.
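A minimal numpy sketch (dimensions, seed, and values are illustrative) showing that the merged weight reproduces the adapter-augmented forward pass exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2          # output dim, input dim, LoRA rank (illustrative)
alpha = 4.0                # LoRA scaling hyperparameter (illustrative)
scale = alpha / r

W = rng.normal(size=(d, k))    # frozen base weight
A = rng.normal(size=(r, k))    # trained LoRA "down" projection
B = rng.normal(size=(d, r))    # trained LoRA "up" projection
x = rng.normal(size=(k,))      # an input vector

# LoRA-augmented forward pass: base path plus scaled adapter path.
y_lora = W @ x + scale * (B @ (A @ x))

# Merged forward pass: fold the adapter product into the base weight.
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_lora, y_merged)
```

Note that W_merged has the same shape as W, which is why the merged model needs no architectural changes.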
This is useful for deployment, where a single merged model file is simpler to serve than a base model plus adapter files.
Usage
Use this after LoRA training completes, when you want to deploy the fine-tuned model as a standalone model or when you need to convert LoRA checkpoints for frameworks that do not support PEFT adapters.
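In practice, Hugging Face PEFT exposes this operation as `merge_and_unload()` on a `PeftModel`. For frameworks without PEFT support, the underlying checkpoint arithmetic can be done by hand; the sketch below uses numpy dicts as stand-in state dicts, and the key names (`lora_A`, `lora_B`) are hypothetical, not the exact layout of any real checkpoint format:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, r = 8.0, 2  # illustrative scaling and rank

# Hypothetical flat checkpoint layout: one LoRA pair per target weight.
base_sd = {"layer.weight": rng.normal(size=(4, 4))}
lora_sd = {
    "layer.lora_A": rng.normal(size=(r, 4)),
    "layer.lora_B": rng.normal(size=(4, r)),
}

def merge_state_dict(base_sd, lora_sd, alpha, r):
    """Return a new state dict with W' = W + (alpha/r) * B @ A folded in."""
    merged = dict(base_sd)
    for key, W in base_sd.items():
        prefix = key.rsplit(".", 1)[0]
        a_key, b_key = f"{prefix}.lora_A", f"{prefix}.lora_B"
        if a_key in lora_sd and b_key in lora_sd:
            merged[key] = W + (alpha / r) * (lora_sd[b_key] @ lora_sd[a_key])
    return merged

merged_sd = merge_state_dict(base_sd, lora_sd, alpha, r)
```

The merged state dict can then be saved and loaded as an ordinary model checkpoint, with no adapter-aware code on the serving side.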
Theoretical Basis
Merging computes:

W' = W + (α/r) · BA

where W ∈ R^(d×k) is the frozen base weight, A ∈ R^(r×k) and B ∈ R^(d×r) are the trained low-rank factors, r is the LoRA rank, and α is the scaling hyperparameter.
After merging, the adapter layers are removed ("unloaded"), and the model reverts to a standard PreTrainedModel with updated weights.
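Because merging is a plain addition, it is also reversible (up to floating-point error) by subtracting the same term, which is what allows an adapter to be merged temporarily and later unmerged. A small sketch, with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, alpha = 5, 3, 2, 4.0  # illustrative dimensions and scaling

W = rng.normal(size=(d, k))    # frozen base weight
A = rng.normal(size=(r, k))    # trained LoRA "down" projection
B = rng.normal(size=(d, r))    # trained LoRA "up" projection
delta = (alpha / r) * (B @ A)  # the term added by merging

W_merged = W + delta           # merge: W' = W + (alpha/r) * BA
W_restored = W_merged - delta  # unmerge: recover the original W

assert np.allclose(W, W_restored)
```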