Implementation:Unslothai Unsloth Save Pretrained Merged
| Knowledge Sources | |
|---|---|
| Domains | Model_Deployment, Serialization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool from the Unsloth library for merging LoRA adapters into base model weights and saving the result in SafeTensors format.
Description
model.save_pretrained_merged is a method patched onto PeftModel instances by Unsloth. It wraps unsloth_save_model (L235-1068), which handles the full merge pipeline: layer-by-layer 4-bit dequantization, LoRA weight merging, vocabulary resizing, weight untying, and sharded SafeTensors saving. The function monitors GPU memory usage and adjusts dequantization batch sizes to stay within the maximum_memory_usage threshold.
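The per-layer merge step reduces to standard LoRA arithmetic: after the 4-bit base weight is dequantized, the adapter product is scaled and added in. The sketch below illustrates that math with NumPy; the shapes and variable names are illustrative, not Unsloth's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # dequantized base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # lora_A
B = rng.standard_normal((d_out, r)).astype(np.float32)     # lora_B

scaling = alpha / r                  # PEFT's lora_alpha / r scaling factor
W_merged = W + scaling * (B @ A)     # merged weight that gets written to SafeTensors

# The merged layer computes the same output as base + adapter:
x = rng.standard_normal(d_in).astype(np.float32)
y_adapter = W @ x + scaling * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged, atol=1e-5)
```

Because the adapter is folded into the weight, the saved model needs no PEFT machinery at inference time.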
Usage
Call this on a trained PeftModel as the final step of a fine-tuning workflow. Pass the tokenizer to save it alongside the model. The output directory will contain every file needed to reload the model with standard HuggingFace from_pretrained.
Code Reference
Source Location
- Repository: unsloth
- File: unsloth/save.py
- Lines: L1337-1376 (save_pretrained_merged wrapper), L235-1068 (unsloth_save_model core)
Signature
def unsloth_save_pretrained_merged(
self,
save_directory: Union[str, os.PathLike],
tokenizer = None,
save_method: str = "merged_16bit",
push_to_hub: bool = False,
token: Optional[Union[str, bool]] = None,
is_main_process: bool = True,
state_dict: Optional[dict] = None,
save_function: Callable = torch.save,
max_shard_size: Union[int, str] = "5GB",
safe_serialization: bool = True,
variant: Optional[str] = None,
save_peft_format: bool = True,
tags: List[str] = None,
temporary_location: str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage: float = 0.75,
) -> None:
"""
Merges LoRA weights and saves the model in SafeTensors format.
save_method options:
"merged_16bit" — Dequantize 4-bit, merge LoRA, save in float16.
Best for GGUF conversion and general deployment.
"merged_4bit" — Merge LoRA into 4-bit weights (no dequantization).
Best for DPO/continued training with HF inference.
"lora" — Save LoRA adapters only, no merging.
Best for adapter sharing and switching.
"""
Import
# Called as a method on the model instance:
model.save_pretrained_merged("./output_dir", tokenizer=tokenizer, save_method="merged_16bit")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| save_directory | str or os.PathLike | Yes | Output directory path |
| tokenizer | PreTrainedTokenizer | No | Tokenizer to save alongside model |
| save_method | str | No | "merged_16bit", "merged_4bit", or "lora" (default: "merged_16bit") |
| max_shard_size | str | No | SafeTensors shard size limit (default: "5GB") |
| maximum_memory_usage | float | No | GPU memory threshold for dequantization (default: 0.75) |
| safe_serialization | bool | No | Use SafeTensors format (default: True) |
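The maximum_memory_usage parameter is a fraction of total GPU memory that the dequantization pass may occupy. A minimal sketch of that budget arithmetic, assuming a simple fraction-of-total interpretation (the helper below is illustrative, not Unsloth's internal function):

```python
def memory_budget_bytes(total_gpu_bytes: int, maximum_memory_usage: float = 0.75) -> int:
    """Bytes the merge step may use before working in smaller batches."""
    return int(total_gpu_bytes * maximum_memory_usage)

# e.g. on a 16 GiB GPU, the default 0.75 caps usage at 12 GiB,
# leaving ~4 GiB of headroom for other processes
total = 16 * 1024**3
budget = memory_budget_bytes(total)
assert budget == 12 * 1024**3
```

Lowering the value trades merge speed for safety on GPUs shared with other workloads.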
Outputs
| Name | Type | Description |
|---|---|---|
| save_directory contents | Files | model.safetensors (or shards model-0000N-of-0000M.safetensors plus model.safetensors.index.json), config.json, tokenizer.json, tokenizer_config.json, special_tokens_map.json |
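A save can be sanity-checked against this contract by listing the output directory. The sketch below simulates a completed unsharded merged_16bit save with empty placeholder files; the expected-file set mirrors the table above.

```python
import tempfile
from pathlib import Path

EXPECTED = {
    "model.safetensors", "config.json",
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
}

def missing_files(save_directory: str) -> set:
    """Return expected output files absent from save_directory."""
    present = {p.name for p in Path(save_directory).iterdir()}
    return EXPECTED - present

with tempfile.TemporaryDirectory() as d:
    for name in EXPECTED:                    # simulate a completed save
        (Path(d) / name).touch()
    assert missing_files(d) == set()
```

For models exceeding max_shard_size, check for the shard index file rather than a single model.safetensors.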
Usage Examples
Save Merged 16-bit Model
# After training
model.save_pretrained_merged(
"./merged_model",
tokenizer=tokenizer,
save_method="merged_16bit",
)
# Output: ./merged_model/model.safetensors, config.json, tokenizer files
Save LoRA Adapters Only
model.save_pretrained_merged(
"./lora_adapters",
tokenizer=tokenizer,
save_method="lora",
)
# Output: ./lora_adapters/adapter_model.safetensors, adapter_config.json