
Implementation:PacktPublishing LLM Engineers Handbook Save Pretrained Merged

From Leeroopedia


Field Value
Implementation Name Save Pretrained Merged
Type Wrapper Doc (Unsloth)
Source File llm_engineering/model/finetuning/finetune.py:L218-223
Workflow LLM_Finetuning
Repo PacktPublishing/LLM-Engineers-Handbook
Implements Principle:PacktPublishing_LLM_Engineers_Handbook_Model_Merging_And_Publishing

Method Signatures

# Save merged model locally
model.save_pretrained_merged(
    output_dir: str,
    tokenizer,
    save_method: str,
) -> None

# Push merged model to HuggingFace Hub
model.push_to_hub_merged(
    repo_id: str,
    tokenizer,
    save_method: str,
) -> None

Import

These methods are available on the Unsloth model object (no separate import needed):

# Methods are called on the model object returned by FastLanguageModel
from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(...)
# model = FastLanguageModel.get_peft_model(...)
# ... training ...
# model.save_pretrained_merged(...)
# model.push_to_hub_merged(...)

Description

These two methods handle the final stage of the fine-tuning pipeline: merging the trained LoRA adapter weights into the base model and persisting the result. save_pretrained_merged() writes the merged model to a local directory, while push_to_hub_merged() uploads it directly to HuggingFace Hub. Both methods:

  1. Compute W_merged = W_base + (alpha/r) * B @ A for every LoRA-adapted layer.
  2. Save the resulting weights in the HuggingFace-standard safetensors format.
  3. Save the tokenizer alongside the model for self-contained loading.
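The merge arithmetic in step 1 can be sketched with plain NumPy (illustrative shapes and random values, not actual model weights):

```python
import numpy as np

# Illustrative dimensions: a 4x4 base weight with rank-2 LoRA factors
# and alpha=16, mirroring the (alpha/r) scaling in step 1 above.
d, r, alpha = 4, 2, 16

rng = np.random.default_rng(0)
W_base = rng.standard_normal((d, d))
A = rng.standard_normal((r, d))   # LoRA "down" projection
B = rng.standard_normal((d, r))   # LoRA "up" projection

# Fold the adapter into the base weight.
W_merged = W_base + (alpha / r) * (B @ A)

# After merging, a forward pass needs only W_merged; the adapter
# matrices can be discarded, which is why the saved model later loads
# with plain transformers and no PEFT.
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W_base @ x + (alpha / r) * (B @ (A @ x)))
```

This also shows why the merge is lossless at the weight level: the adapted layer computes the same function before and after folding, only the parameterization changes.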

Key Code in Repository

# From llm_engineering/model/finetuning/finetune.py

model.save_pretrained_merged(
    output_dir,
    tokenizer,
    save_method="merged_16bit",
)

model.push_to_hub_merged(
    f"{workspace}/{model_name}",
    tokenizer,
    save_method="merged_16bit",
)

Parameters

save_pretrained_merged()

Parameter Type Value in Repo Description
output_dir str Local path Directory where the merged model files will be saved.
tokenizer Tokenizer The tokenizer to save alongside the model.
save_method str "merged_16bit" Merge strategy and precision. "merged_16bit" merges adapters into base weights and saves in 16-bit (FP16/BF16) format.

push_to_hub_merged()

Parameter Type Value in Repo Description
repo_id str f"{workspace}/{model_name}" HuggingFace Hub repository identifier (e.g., "my-org/my-finetuned-model").
tokenizer Tokenizer The tokenizer to upload alongside the model.
save_method str "merged_16bit" Same merge strategy and precision as local save.
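The repo_id must follow the Hub's namespace/name convention. A small hypothetical validator (not part of the repository; the regex is a simplified version of the Hub's actual naming rules) sketches the check:

```python
import re

# Hypothetical helper: simplified repo-id pattern (alphanumerics,
# '-', '_', '.' on each side of a single slash).
_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")

def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id looks like 'workspace/model_name'."""
    return bool(_REPO_ID.match(repo_id))

assert is_valid_repo_id("my-org/my-finetuned-model")
assert not is_valid_repo_id("no-namespace")
```

Validating the f-string result before calling push_to_hub_merged() fails fast if workspace or model_name was left empty by configuration.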

Available Save Methods

Save Method Description
"merged_16bit" Merge LoRA into base model, save at 16-bit precision. Used in this repository.
"merged_4bit" Merge and quantize to 4-bit (smaller file, slight quality loss).
"lora" Save only the LoRA adapter weights (no merging).
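As a rough mental model of the disk-size trade-off between the merged methods, weight-file size is approximately bytes-per-parameter times parameter count (a hypothetical back-of-envelope helper, ignoring tokenizer files, metadata, and quantization overhead):

```python
def approx_checkpoint_gb(n_params: float, save_method: str) -> float:
    """Rough merged-weight size in GB: 16-bit stores 2 bytes per
    parameter, 4-bit stores 0.5 bytes per parameter."""
    bytes_per_param = {"merged_16bit": 2.0, "merged_4bit": 0.5}[save_method]
    return n_params * bytes_per_param / 1e9

# An 8B-parameter model: ~16 GB at 16-bit vs ~4 GB at 4-bit.
assert round(approx_checkpoint_gb(8e9, "merged_16bit")) == 16
assert round(approx_checkpoint_gb(8e9, "merged_4bit")) == 4
```

The "lora" method is smaller still, since it stores only the low-rank adapter matrices rather than any full weight matrix.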

Returns

Both methods return None. Their effects are:

  • save_pretrained_merged(): Writes model files (model.safetensors, config.json, tokenizer files) to output_dir.
  • push_to_hub_merged(): Uploads the same files to HuggingFace Hub under the specified repo_id.

Output Files

After saving, the output directory contains:

output_dir/
  model.safetensors          # Merged model weights (16-bit)
  config.json                # Model architecture configuration
  tokenizer.json             # Tokenizer vocabulary and settings
  tokenizer_config.json      # Tokenizer configuration
  special_tokens_map.json    # Special token mappings
  generation_config.json     # Default generation parameters

Usage After Saving

The merged model can be loaded with standard HuggingFace APIs (no PEFT or Unsloth required):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("my-org/my-finetuned-model")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-finetuned-model")

External Dependencies

Package Purpose
unsloth Provides save_pretrained_merged() and push_to_hub_merged() methods
huggingface_hub Handles authentication and file upload to HuggingFace Hub
safetensors Efficient tensor serialization format
