Principle: Unslothai Unsloth Model Merging And Saving
| Knowledge Sources | |
|---|---|
| Domains | Model_Deployment, Serialization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A model serialization technique that merges trained LoRA adapter weights back into the base model and saves the result as a standalone model in SafeTensors format.
Description
After fine-tuning with LoRA, the model exists as a frozen base plus small adapter matrices. For deployment, these adapters must be merged into the base weights to produce a single, self-contained model. The merging process involves:
- Dequantization: If the base model was loaded in 4-bit, weights are dequantized back to float16 layer-by-layer to manage memory.
- LoRA Merge: For each adapted layer, compute W_merged = W_base + (lora_alpha / r) * (lora_B @ lora_A), folding the low-rank update into the base weight.
- Vocabulary Handling: If the vocabulary was resized during training (new tokens added), the embedding and output projection matrices are adjusted.
- Sharded Saving: The merged model is saved in SafeTensors format with configurable shard sizes.
The key challenge is memory management: a 7B-parameter model in float16 requires ~14 GB, and during merging the quantized and dequantized copies of a layer's weights must coexist temporarily. Unsloth bounds this by dequantizing layer-by-layer, with peak usage controlled by the maximum_memory_usage parameter; a sketch of this strategy follows.
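A minimal sketch of the layer-by-layer strategy, assuming a PEFT-style module layout where each adapted module exposes lora_A, lora_B, lora_alpha, and r as tensors/scalars. The helper dequantize_4bit_weight is a hypothetical stand-in for the backend-specific 4-bit dequantization routine, not Unsloth's internal API:

```python
import torch

def merge_lora_layerwise(model, dtype=torch.float16):
    """Merge LoRA adapters one layer at a time to bound peak memory."""
    for module in model.modules():
        if not hasattr(module, "lora_A"):
            continue
        # 1. Dequantize only this layer's base weight (4-bit -> float16),
        #    so quantized and dequantized copies coexist for one layer only.
        #    dequantize_4bit_weight is hypothetical, for illustration.
        w_base = dequantize_4bit_weight(module.weight).to(dtype)
        # 2. Apply the scaled low-rank update: W += (alpha / r) * B @ A.
        scaling = module.lora_alpha / module.r
        w_base += scaling * (module.lora_B @ module.lora_A).to(dtype)
        module.weight = torch.nn.Parameter(w_base, requires_grad=False)
        # 3. Free the adapter matrices before moving to the next layer.
        del module.lora_A, module.lora_B
        torch.cuda.empty_cache()
    return model
```

Sharded SafeTensors output can then reuse the standard Hugging Face mechanism, e.g. model.save_pretrained(output_dir, safe_serialization=True, max_shard_size="5GB").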
Usage
Use this as the final step in any fine-tuning workflow to produce a deployable model. Choose save_method="merged_16bit" for GGUF conversion or general deployment, save_method="merged_4bit" for quantized deployment, or save_method="lora" to save adapters only.
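In Unsloth this is exposed through save_pretrained_merged, which takes the save_method values named above. A typical call, assuming model and tokenizer come from a completed fine-tuning session and the directory names are placeholders:

```python
# Merge LoRA into the base weights and save float16 SafeTensors shards
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# Alternatives:
model.save_pretrained_merged("merged_model_4bit", tokenizer, save_method="merged_4bit")
model.save_pretrained_merged("lora_adapters", tokenizer, save_method="lora")
```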
Theoretical Basis
The merge operation for each LoRA-adapted linear layer:
```python
# Abstract LoRA merge process (helper functions are placeholders)
for layer in model.layers:
    if has_lora(layer):
        # Dequantize the frozen base weight: 4-bit -> float16
        W_base = dequantize(layer.weight)
        # Scaled low-rank update: (alpha / r) * B @ A
        W_lora = (layer.lora_alpha / layer.r) * (layer.lora_B @ layer.lora_A)
        layer.weight = W_base + W_lora
        remove_lora(layer)  # clean up adapter matrices
save_safetensors(model, output_dir, shard_size="5GB")
```
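The merge is exact because matrix multiplication distributes over addition: for any input x, (W + (alpha/r) * B @ A) @ x equals W @ x + (alpha/r) * B @ (A @ x). A self-contained PyTorch check with toy shapes (all names local to the example):

```python
import torch

torch.manual_seed(0)
out_dim, in_dim, r, alpha = 8, 16, 4, 8

W = torch.randn(out_dim, in_dim)   # frozen base weight
A = torch.randn(r, in_dim)         # LoRA down-projection
B = torch.randn(out_dim, r)        # LoRA up-projection
x = torch.randn(in_dim)

scaling = alpha / r
adapter_out = W @ x + scaling * (B @ (A @ x))  # base + adapter at inference
merged_out = (W + scaling * (B @ A)) @ x       # single merged weight

# The merged model reproduces the adapted model's outputs exactly,
# up to floating-point rounding.
assert torch.allclose(adapter_out, merged_out, atol=1e-5)
```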