
Implementation:Unslothai Unsloth Save Pretrained Merged

From Leeroopedia


Knowledge Sources
Domains Model_Deployment, Serialization
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool from the Unsloth library for merging LoRA adapters into base model weights and saving the result in SafeTensors format.

Description

model.save_pretrained_merged is a method patched onto PeftModel instances by Unsloth. It wraps unsloth_save_model (L235-1068) which handles the full merge pipeline: layer-by-layer 4-bit dequantization, LoRA weight merging, vocabulary resizing, weight untying, and sharded SafeTensors saving. The function monitors GPU memory usage and adjusts batch sizes for dequantization to stay within the maximum_memory_usage threshold.
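The LoRA merge at the heart of this pipeline is simple linear algebra: after dequantizing a layer's base weight W, the adapter update is folded in as W + (alpha / r) * B @ A. A minimal NumPy sketch of that step, under the standard LoRA formulation (the helper name `merge_lora` is illustrative, not Unsloth's API):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Merge a LoRA update into a base weight matrix.

    W: (out, in) base weight; A: (r, in) down-projection;
    B: (out, r) up-projection. The merged weight is
    W + (alpha / r) * B @ A, the standard LoRA scaling.
    """
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4)).astype(np.float32)
A = rng.standard_normal((2, 4)).astype(np.float32)
B = np.zeros((8, 2), dtype=np.float32)  # B starts at zero in LoRA init

# With B = 0 the merge is a no-op, as for an untrained adapter.
merged = merge_lora(W, A, B, alpha=16, r=2)
assert np.allclose(merged, W)
```

After training, B is nonzero and the merged weight differs from W; Unsloth applies this per layer while keeping peak GPU memory under the configured threshold.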

Usage

Call this on a trained PeftModel as the final step of a fine-tuning workflow. Pass the tokenizer so it is saved alongside the model; the output directory then contains everything needed to reload the model with the standard HuggingFace from_pretrained.

Code Reference

Source Location

  • Repository: unsloth
  • File: unsloth/save.py
  • Lines: L1337-1376 (save_pretrained_merged wrapper), L235-1068 (unsloth_save_model core)

Signature

def unsloth_save_pretrained_merged(
    self,
    save_directory: Union[str, os.PathLike],
    tokenizer = None,
    save_method: str = "merged_16bit",
    push_to_hub: bool = False,
    token: Optional[Union[str, bool]] = None,
    is_main_process: bool = True,
    state_dict: Optional[dict] = None,
    save_function: Callable = torch.save,
    max_shard_size: Union[int, str] = "5GB",
    safe_serialization: bool = True,
    variant: Optional[str] = None,
    save_peft_format: bool = True,
    tags: List[str] = None,
    temporary_location: str = "_unsloth_temporary_saved_buffers",
    maximum_memory_usage: float = 0.75,
) -> None:
    """
    Merges LoRA weights and saves the model in SafeTensors format.

    save_method options:
        "merged_16bit" — Dequantize 4-bit, merge LoRA, save in float16.
                         Best for GGUF conversion and general deployment.
        "merged_4bit"  — Merge LoRA into 4-bit weights (no dequantization).
                         Best for DPO/continued training with HF inference.
        "lora"         — Save LoRA adapters only, no merging.
                         Best for adapter sharing and switching.
    """

Import

# Called as a method on the model instance:
model.save_pretrained_merged("./output_dir", tokenizer=tokenizer, save_method="merged_16bit")

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| save_directory | str | Yes | Output directory path |
| tokenizer | PreTrainedTokenizer | No | Tokenizer to save alongside model |
| save_method | str | No | "merged_16bit", "merged_4bit", or "lora" (default: "merged_16bit") |
| max_shard_size | str | No | SafeTensors shard size limit (default: "5GB") |
| maximum_memory_usage | float | No | GPU memory threshold for dequantization (default: 0.75) |
| safe_serialization | bool | No | Use SafeTensors format (default: True) |
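The maximum_memory_usage parameter caps dequantization buffers at a fraction of total GPU memory. A hedged sketch of the underlying budget arithmetic (the helper name `dequant_budget_bytes` is ours, not Unsloth's):

```python
def dequant_budget_bytes(total_gpu_bytes: int,
                         maximum_memory_usage: float = 0.75) -> int:
    """Byte budget for dequantization buffers.

    Mirrors the idea behind Unsloth's maximum_memory_usage:
    reserve only a fraction of total GPU memory for the merge,
    leaving headroom for the model itself and CUDA overhead.
    """
    if not 0.0 < maximum_memory_usage <= 1.0:
        raise ValueError("maximum_memory_usage must be in (0, 1]")
    return int(total_gpu_bytes * maximum_memory_usage)

# Example: a 24 GiB GPU with the default 0.75 threshold.
budget = dequant_budget_bytes(24 * 1024**3)
assert budget == int(24 * 1024**3 * 0.75)
```

Lowering the threshold trades speed (smaller dequantization batches) for safety on memory-constrained GPUs.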

Outputs

| Name | Type | Description |
|------|------|-------------|
| save_directory contents | Files | model.safetensors (or shards), config.json, tokenizer.json, tokenizer_config.json, special_tokens_map.json |
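When the merged weights exceed max_shard_size, the output is split into shards named in the HuggingFace pattern (model-00001-of-0000N.safetensors) plus a model.safetensors.index.json mapping tensors to shards. A simplified greedy sketch of shard planning, assuming per-tensor grouping as in transformers (the function is illustrative, not Unsloth's implementation):

```python
def plan_shards(tensor_sizes: dict, max_shard_bytes: int) -> list:
    """Greedily group tensors into shards no larger than max_shard_bytes.

    A single tensor larger than the limit still gets its own shard,
    matching the HuggingFace sharding convention.
    """
    shards, current, used = [], [], 0
    for name, size in tensor_sizes.items():
        if current and used + size > max_shard_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        shards.append(current)
    return shards

sizes = {"embed": 3, "layer0": 2, "layer1": 2, "lm_head": 3}
print(plan_shards(sizes, max_shard_bytes=5))
# → [['embed', 'layer0'], ['layer1', 'lm_head']]
```

With the default "5GB" limit, a 16-bit 7B-parameter model (~14 GB of weights) would typically produce three shards.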

Usage Examples

Save Merged 16-bit Model

# After training
model.save_pretrained_merged(
    "./merged_model",
    tokenizer=tokenizer,
    save_method="merged_16bit",
)
# Output: ./merged_model/model.safetensors, config.json, tokenizer files

Save LoRA Adapters Only

model.save_pretrained_merged(
    "./lora_adapters",
    tokenizer=tokenizer,
    save_method="lora",
)
# Output: ./lora_adapters/adapter_model.safetensors, adapter_config.json

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
