Implementation:PacktPublishing LLM Engineers Handbook Save Pretrained Merged
| Field | Value |
|---|---|
| Implementation Name | Save Pretrained Merged |
| Type | Wrapper Doc (Unsloth) |
| Source File | llm_engineering/model/finetuning/finetune.py:L218-223 |
| Workflow | LLM_Finetuning |
| Repo | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Model_Merging_And_Publishing |
Method Signatures

```python
# Save merged model locally
model.save_pretrained_merged(
    output_dir: str,
    tokenizer,
    save_method: str,
) -> None

# Push merged model to HuggingFace Hub
model.push_to_hub_merged(
    repo_id: str,
    tokenizer,
    save_method: str,
) -> None
```
Import
These methods are available on the Unsloth model object (no separate import needed):

```python
# Methods are called on the model object returned by FastLanguageModel
from unsloth import FastLanguageModel

# model, tokenizer = FastLanguageModel.from_pretrained(...)
# model = FastLanguageModel.get_peft_model(...)
# ... training ...
# model.save_pretrained_merged(...)
# model.push_to_hub_merged(...)
```
Description
These two methods handle the final stage of the fine-tuning pipeline: merging the trained LoRA adapter weights into the base model and persisting the result. save_pretrained_merged() writes the merged model to a local directory, while push_to_hub_merged() uploads it directly to HuggingFace Hub. Both methods:
- Compute `W_merged = W_base + (alpha / r) * B @ A` for every LoRA-adapted layer.
- Save the resulting weights in the HuggingFace-standard safetensors format.
- Save the tokenizer alongside the model for self-contained loading.
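The merge formula above can be sketched numerically. This is a minimal NumPy illustration of the arithmetic only, not Unsloth's actual implementation (which operates on the model's torch tensors layer by layer); all dimensions here are arbitrary toy values:

```python
import numpy as np

# Toy dimensions: output dim, input dim, LoRA rank, and LoRA alpha
d_out, d_in, r, alpha = 8, 8, 2, 16

rng = np.random.default_rng(0)
W_base = rng.standard_normal((d_out, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)  # LoRA "up" matrix
A = rng.standard_normal((r, d_in)).astype(np.float32)   # LoRA "down" matrix

# The merge: fold the scaled low-rank update into the base weights
W_merged = W_base + (alpha / r) * (B @ A)

# A forward pass through the merged weights equals the base path
# plus the adapter path, so the adapter can be discarded after merging.
x = rng.standard_normal(d_in).astype(np.float32)
assert np.allclose(W_merged @ x, W_base @ x + (alpha / r) * (B @ (A @ x)), atol=1e-4)
```

This is why a merged checkpoint loads with plain `transformers`: the adapter's contribution is baked into the dense weight matrices.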
Key Code in Repository

```python
# From llm_engineering/model/finetuning/finetune.py
model.save_pretrained_merged(
    output_dir,
    tokenizer,
    save_method="merged_16bit",
)

model.push_to_hub_merged(
    f"{workspace}/{model_name}",
    tokenizer,
    save_method="merged_16bit",
)
```
Parameters
save_pretrained_merged()
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| `output_dir` | `str` | Local path | Directory where the merged model files will be saved. |
| `tokenizer` | `Tokenizer` | — | The tokenizer to save alongside the model. |
| `save_method` | `str` | `"merged_16bit"` | Merge strategy and precision. `"merged_16bit"` merges adapters into base weights and saves in 16-bit (FP16/BF16) format. |
push_to_hub_merged()
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| `repo_id` | `str` | `f"{workspace}/{model_name}"` | HuggingFace Hub repository identifier (e.g., `"my-org/my-finetuned-model"`). |
| `tokenizer` | `Tokenizer` | — | The tokenizer to upload alongside the model. |
| `save_method` | `str` | `"merged_16bit"` | Same merge strategy and precision as the local save. |
Available Save Methods
| Save Method | Description |
|---|---|
| `"merged_16bit"` | Merge LoRA into the base model, save at 16-bit precision. Used in this repository. |
| `"merged_4bit"` | Merge and quantize to 4-bit (smaller files, slight quality loss). |
| `"lora"` | Save only the LoRA adapter weights (no merging). |
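The precision choice dominates on-disk size. A back-of-envelope estimate (parameter count times bits per parameter; real 4-bit checkpoints are slightly larger because they also store quantization metadata, and `approx_checkpoint_gb` is a hypothetical helper for illustration):

```python
def approx_checkpoint_gb(n_params: float, bits_per_param: float) -> float:
    """Rough checkpoint size in GB: parameters * bits / 8, ignoring metadata."""
    return n_params * bits_per_param / 8 / 1e9

n = 8e9  # e.g. an 8B-parameter model
print(round(approx_checkpoint_gb(n, 16), 1))  # merged_16bit -> ~16.0 GB
print(round(approx_checkpoint_gb(n, 4), 1))   # merged_4bit  -> ~4.0 GB
```

By contrast, `save_method="lora"` writes only the adapter matrices, which are typically tens to hundreds of MB depending on rank and the number of adapted layers.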
Returns
Both methods return None. Their effects are:
- `save_pretrained_merged()`: Writes model files (`model.safetensors`, `config.json`, tokenizer files) to `output_dir`.
- `push_to_hub_merged()`: Uploads the same files to HuggingFace Hub under the specified `repo_id`.
Output Files
After saving, the output directory contains:
```
output_dir/
├── model.safetensors        # Merged model weights (16-bit)
├── config.json              # Model architecture configuration
├── tokenizer.json           # Tokenizer vocabulary and settings
├── tokenizer_config.json    # Tokenizer configuration
├── special_tokens_map.json  # Special token mappings
└── generation_config.json   # Default generation parameters
```
Usage After Saving
The merged model can be loaded with standard HuggingFace APIs (no PEFT or Unsloth required):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("my-org/my-finetuned-model")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-finetuned-model")
```
External Dependencies
| Package | Purpose |
|---|---|
| `unsloth` | Provides the `save_pretrained_merged()` and `push_to_hub_merged()` methods |
| `huggingface_hub` | Handles authentication and file upload to HuggingFace Hub |
| `safetensors` | Efficient tensor serialization format |