Implementation:Unslothai Unsloth Save Pretrained Merged
| Knowledge Sources | |
|---|---|
| Domains | Model_Deployment, Serialization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool from the Unsloth library for merging LoRA adapters into base model weights and saving the result in SafeTensors format.
Description
model.save_pretrained_merged is a method patched onto PeftModel instances by Unsloth. It wraps unsloth_save_model (L235-1068), which handles the full merge pipeline: layer-by-layer 4-bit dequantization, LoRA weight merging, vocabulary resizing, weight untying, and sharded SafeTensors saving. The function monitors GPU memory usage and adjusts dequantization batch sizes to stay within the maximum_memory_usage threshold.
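The per-layer merge step reduces to standard LoRA arithmetic: after the 4-bit base weight is dequantized, the adapter product is scaled and added in. The sketch below illustrates that math with NumPy; the shapes and variable names are illustrative, not Unsloth's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # dequantized base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # lora_A
B = rng.standard_normal((d_out, r)).astype(np.float32)     # lora_B

scaling = alpha / r                  # PEFT's lora_alpha / r scaling factor
W_merged = W + scaling * (B @ A)     # merged weight that gets written to SafeTensors

# The merged layer computes the same output as base + adapter:
x = rng.standard_normal(d_in).astype(np.float32)
y_adapter = W @ x + scaling * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged, atol=1e-5)
```

Because the adapter is folded into the weight, the saved model needs no PEFT machinery at inference time.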
Usage
Call this on a trained PeftModel as the final step of a fine-tuning workflow. Pass the tokenizer to save it alongside the model. The output directory will contain every file needed to reload the model with standard HuggingFace from_pretrained.
Code Reference
Source Location
- Repository: unsloth
- File: unsloth/save.py
- Lines: L1337-1376 (save_pretrained_merged wrapper), L235-1068 (unsloth_save_model core)
Signature
def unsloth_save_pretrained_merged(
self,
save_directory: Union[str, os.PathLike],
tokenizer = None,
save_method: str = "merged_16bit",
push_to_hub: bool = False,
token: Optional[Union[str, bool]] = None,
is_main_process: bool = True,
state_dict: Optional[dict] = None,
save_function: Callable = torch.save,
max_shard_size: Union[int, str] = "5GB",
safe_serialization: bool = True,
variant: Optional[str] = None,
save_peft_format: bool = True,
tags: List[str] = None,
temporary_location: str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage: float = 0.75,
) -> None:
"""
Merges LoRA weights and saves the model in SafeTensors format.
save_method options:
"merged_16bit" — Dequantize 4-bit, merge LoRA, save in float16.
Best for GGUF conversion and general deployment.
"merged_4bit" — Merge LoRA into 4-bit weights (no dequantization).
Best for DPO/continued training with HF inference.
"lora" — Save LoRA adapters only, no merging.
Best for adapter sharing and switching.
"""
Import
# Called as a method on the model instance:
model.save_pretrained_merged("./output_dir", tokenizer=tokenizer, save_method="merged_16bit")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| save_directory | str or os.PathLike | Yes | Output directory path |
| tokenizer | PreTrainedTokenizer | No | Tokenizer to save alongside model |
| save_method | str | No | "merged_16bit", "merged_4bit", or "lora" (default: "merged_16bit") |
| max_shard_size | str | No | SafeTensors shard size limit (default: "5GB") |
| maximum_memory_usage | float | No | GPU memory threshold for dequantization (default: 0.75) |
| safe_serialization | bool | No | Use SafeTensors format (default: True) |
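The maximum_memory_usage parameter is a fraction of total GPU memory that the dequantization pass may occupy. A minimal sketch of that budget arithmetic, assuming a simple fraction-of-total interpretation (the helper below is illustrative, not Unsloth's internal function):

```python
def memory_budget_bytes(total_gpu_bytes: int, maximum_memory_usage: float = 0.75) -> int:
    """Bytes the merge step may use before working in smaller batches."""
    return int(total_gpu_bytes * maximum_memory_usage)

# e.g. on a 16 GiB GPU, the default 0.75 caps usage at 12 GiB,
# leaving ~4 GiB of headroom for other processes
total = 16 * 1024**3
budget = memory_budget_bytes(total)
assert budget == 12 * 1024**3
```

Lowering the value trades merge speed for safety on GPUs shared with other workloads.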
Outputs
| Name | Type | Description |
|---|---|---|
| save_directory contents | Files | model.safetensors (or shards model-0000N-of-0000M.safetensors plus model.safetensors.index.json), config.json, tokenizer.json, tokenizer_config.json, special_tokens_map.json |
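A save can be sanity-checked against this contract by listing the output directory. The sketch below simulates a completed unsharded merged_16bit save with empty placeholder files; the expected-file set mirrors the table above.

```python
import tempfile
from pathlib import Path

EXPECTED = {
    "model.safetensors", "config.json",
    "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json",
}

def missing_files(save_directory: str) -> set:
    """Return expected output files absent from save_directory."""
    present = {p.name for p in Path(save_directory).iterdir()}
    return EXPECTED - present

with tempfile.TemporaryDirectory() as d:
    for name in EXPECTED:                    # simulate a completed save
        (Path(d) / name).touch()
    assert missing_files(d) == set()
```

For models exceeding max_shard_size, check for the shard index file rather than a single model.safetensors.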
Usage Examples
Save Merged 16-bit Model
# After training
model.save_pretrained_merged(
"./merged_model",
tokenizer=tokenizer,
save_method="merged_16bit",
)
# Output: ./merged_model/model.safetensors, config.json, tokenizer files
Save LoRA Adapters Only
model.save_pretrained_merged(
"./lora_adapters",
tokenizer=tokenizer,
save_method="lora",
)
# Output: ./lora_adapters/adapter_model.safetensors, adapter_config.json