Implementation:PacktPublishing LLM Engineers Handbook Save Pretrained Merged
| Field | Value |
|---|---|
| Implementation Name | Save Pretrained Merged |
| Type | Wrapper Doc (Unsloth) |
| Source File | llm_engineering/model/finetuning/finetune.py:L218-223 |
| Workflow | LLM_Finetuning |
| Repo | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Model_Merging_And_Publishing |
Method Signatures

```python
# Save merged model locally
model.save_pretrained_merged(
    output_dir: str,
    tokenizer,
    save_method: str,
) -> None

# Push merged model to HuggingFace Hub
model.push_to_hub_merged(
    repo_id: str,
    tokenizer,
    save_method: str,
) -> None
```
Import
These methods are available on the Unsloth model object (no separate import needed):

```python
# Methods are called on the model object returned by FastLanguageModel
from unsloth import FastLanguageModel

# model, tokenizer = FastLanguageModel.from_pretrained(...)
# model = FastLanguageModel.get_peft_model(...)
# ... training ...
# model.save_pretrained_merged(...)
# model.push_to_hub_merged(...)
```
Description
These two methods handle the final stage of the fine-tuning pipeline: merging the trained LoRA adapter weights into the base model and persisting the result. save_pretrained_merged() writes the merged model to a local directory, while push_to_hub_merged() uploads it directly to HuggingFace Hub. Both methods:
- Compute `W_merged = W_base + (alpha / r) * B @ A` for every LoRA-adapted layer.
- Save the resulting weights in the HuggingFace-standard safetensors format.
- Save the tokenizer alongside the model for self-contained loading.
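The merge formula above can be sketched numerically. This is a minimal NumPy illustration of the arithmetic only, not Unsloth's actual implementation (which operates on the model's torch tensors layer by layer); all dimensions here are arbitrary toy values:

```python
import numpy as np

# Toy dimensions: output dim, input dim, LoRA rank, and LoRA alpha
d_out, d_in, r, alpha = 8, 8, 2, 16

rng = np.random.default_rng(0)
W_base = rng.standard_normal((d_out, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)  # LoRA "up" matrix
A = rng.standard_normal((r, d_in)).astype(np.float32)   # LoRA "down" matrix

# The merge: fold the scaled low-rank update into the base weights
W_merged = W_base + (alpha / r) * (B @ A)

# A forward pass through the merged weights equals the base path
# plus the adapter path, so the adapter can be discarded after merging.
x = rng.standard_normal(d_in).astype(np.float32)
assert np.allclose(W_merged @ x, W_base @ x + (alpha / r) * (B @ (A @ x)), atol=1e-4)
```

This is why a merged checkpoint loads with plain `transformers`: the adapter's contribution is baked into the dense weight matrices.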
Key Code in Repository

```python
# From llm_engineering/model/finetuning/finetune.py
model.save_pretrained_merged(
    output_dir,
    tokenizer,
    save_method="merged_16bit",
)

model.push_to_hub_merged(
    f"{workspace}/{model_name}",
    tokenizer,
    save_method="merged_16bit",
)
```
Parameters
save_pretrained_merged()
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| `output_dir` | `str` | Local path | Directory where the merged model files will be saved. |
| `tokenizer` | `Tokenizer` | — | The tokenizer to save alongside the model. |
| `save_method` | `str` | `"merged_16bit"` | Merge strategy and precision. `"merged_16bit"` merges adapters into base weights and saves in 16-bit (FP16/BF16) format. |
push_to_hub_merged()
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| `repo_id` | `str` | `f"{workspace}/{model_name}"` | HuggingFace Hub repository identifier (e.g., `"my-org/my-finetuned-model"`). |
| `tokenizer` | `Tokenizer` | — | The tokenizer to upload alongside the model. |
| `save_method` | `str` | `"merged_16bit"` | Same merge strategy and precision as the local save. |
Available Save Methods
| Save Method | Description |
|---|---|
| `"merged_16bit"` | Merge LoRA into the base model, save at 16-bit precision. Used in this repository. |
| `"merged_4bit"` | Merge and quantize to 4-bit (smaller files, slight quality loss). |
| `"lora"` | Save only the LoRA adapter weights (no merging). |
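The precision choice dominates on-disk size. A back-of-envelope estimate (parameter count times bits per parameter; real 4-bit checkpoints are slightly larger because they also store quantization metadata, and `approx_checkpoint_gb` is a hypothetical helper for illustration):

```python
def approx_checkpoint_gb(n_params: float, bits_per_param: float) -> float:
    """Rough checkpoint size in GB: parameters * bits / 8, ignoring metadata."""
    return n_params * bits_per_param / 8 / 1e9

n = 8e9  # e.g. an 8B-parameter model
print(round(approx_checkpoint_gb(n, 16), 1))  # merged_16bit -> ~16.0 GB
print(round(approx_checkpoint_gb(n, 4), 1))   # merged_4bit  -> ~4.0 GB
```

By contrast, `save_method="lora"` writes only the adapter matrices, which are typically tens to hundreds of MB depending on rank and the number of adapted layers.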
Returns
Both methods return None. Their effects are:
- `save_pretrained_merged()`: Writes model files (`model.safetensors`, `config.json`, tokenizer files) to `output_dir`.
- `push_to_hub_merged()`: Uploads the same files to HuggingFace Hub under the specified `repo_id`.
Output Files
After saving, the output directory contains:
```
output_dir/
├── model.safetensors        # Merged model weights (16-bit)
├── config.json              # Model architecture configuration
├── tokenizer.json           # Tokenizer vocabulary and settings
├── tokenizer_config.json    # Tokenizer configuration
├── special_tokens_map.json  # Special token mappings
└── generation_config.json   # Default generation parameters
```
Usage After Saving
The merged model can be loaded with standard HuggingFace APIs (no PEFT or Unsloth required):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("my-org/my-finetuned-model")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-finetuned-model")
```
External Dependencies
| Package | Purpose |
|---|---|
| `unsloth` | Provides the `save_pretrained_merged()` and `push_to_hub_merged()` methods |
| `huggingface_hub` | Handles authentication and file upload to HuggingFace Hub |
| `safetensors` | Efficient tensor serialization format |