Implementation:PacktPublishing LLM Engineers Handbook FastLanguageModel Get Peft Model
| Field | Value |
|---|---|
| Implementation Name | FastLanguageModel Get Peft Model |
| Type | Wrapper Doc (Unsloth wraps PEFT) |
| Source File | llm_engineering/model/finetuning/finetune.py:L45-51 |
| Workflow | LLM_Finetuning |
| Repo | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_LoRA_Adapter_Injection |
Function Signature
```python
FastLanguageModel.get_peft_model(
    model,
    r: int,
    lora_alpha: int,
    lora_dropout: float,
    target_modules: List[str],
) -> model
```
Import
```python
from unsloth import FastLanguageModel
```
Description
FastLanguageModel.get_peft_model() injects LoRA (Low-Rank Adaptation) adapter layers into the specified modules of a pre-trained language model. This method wraps HuggingFace's PEFT library with Unsloth-specific optimizations, producing a model where only the LoRA adapter weights are trainable while the original weights remain frozen.
After this call, the model is ready for parameter-efficient fine-tuning with a dramatically reduced number of trainable parameters.
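The mechanics can be illustrated with a minimal, framework-free sketch (plain Python; the LoRALinear class here is hypothetical and illustrative, not Unsloth's or PEFT's implementation): the frozen weight W is left untouched, two small matrices A (r × d_in) and B (d_out × r) are the only trainable parameters, and the adapter's contribution is scaled by lora_alpha / r.

```python
# Minimal LoRA sketch in plain Python (illustrative only, not Unsloth's code).
# Forward pass: y = W x + (lora_alpha / r) * B (A x), W frozen, A and B trainable.

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

class LoRALinear:
    def __init__(self, W, r, lora_alpha):
        d_out, d_in = len(W), len(W[0])
        self.W = W                       # frozen pre-trained weight
        self.scale = lora_alpha / r      # effective scaling factor
        # B starts at zero so the adapter is initially a no-op; training
        # then moves A and B away from this starting point.
        self.A = [[0.01] * d_in for _ in range(r)]   # trainable, r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]   # trainable, d_out x r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

layer = LoRALinear(W=[[1.0, 0.0], [0.0, 1.0]], r=2, lora_alpha=2)
print(layer.forward([3.0, 4.0]))  # B is zero, so output equals W x: [3.0, 4.0]
```

Because B is zero-initialized, the wrapped layer behaves exactly like the original at the start of training, which is the standard LoRA initialization scheme.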
Parameters
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| model | Model object | — | The pre-trained model returned by FastLanguageModel.from_pretrained(). |
| r | int | 32 | LoRA rank. Controls the dimensionality of the low-rank decomposition matrices. Higher values increase expressiveness at the cost of more parameters. |
| lora_alpha | int | 32 | LoRA scaling factor. The effective scaling applied to adapter output is lora_alpha / r. With r=32 and lora_alpha=32, the scaling factor is 1.0. |
| lora_dropout | float | 0 | Dropout probability applied to LoRA adapter outputs during training. 0 means no dropout regularization. |
| target_modules | List[str] | See below | List of module names to inject LoRA adapters into. |
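The effective scaling noted in the table is simple arithmetic; a quick check with the repository's values (the 64 in the second line is a hypothetical alternative, not a value from the repo):

```python
# LoRA scaling applied to the adapter output: lora_alpha / r.
r, lora_alpha = 32, 32
print(lora_alpha / r)  # 1.0 — the adapter update is added at full strength

# Raising lora_alpha while keeping r fixed amplifies the adapter's effect:
print(64 / r)          # 2.0
```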
Target Modules
The repository injects LoRA adapters into all major projection layers:
| Module | Layer Type | Description |
|---|---|---|
| q_proj | Attention | Query projection |
| k_proj | Attention | Key projection |
| v_proj | Attention | Value projection |
| o_proj | Attention | Output projection |
| up_proj | MLP | Feed-forward up-projection |
| down_proj | MLP | Feed-forward down-projection |
| gate_proj | MLP | Gated feed-forward projection |
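For comparison, the same configuration expressed directly against HuggingFace PEFT (the library Unsloth wraps) would look roughly like the fragment below. This is a hedged sketch of PEFT's LoraConfig API, not code from the repository; the commented-out get_peft_model call assumes a model is already loaded.

```python
# Hypothetical plain-PEFT equivalent of the repository's Unsloth call.
# Shown as a configuration fragment only; requires the peft package.
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    task_type="CAUSAL_LM",
)
# model = get_peft_model(model, config)  # same adapter-injection step
```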
Returns
The same model object, now modified in-place with LoRA adapter layers injected. Only the adapter parameters are marked as trainable.
Key Code in Repository
```python
# From llm_engineering/model/finetuning/finetune.py
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj",
        "up_proj", "down_proj",
        "o_proj", "gate_proj",
    ],
)
```
Configuration Analysis
- r=32: A moderate rank that balances parameter count and expressiveness. For a 7B model, this typically results in ~50-100M trainable parameters out of 7B total (~1-1.5%).
- lora_alpha=32: Equal to r, giving an effective scaling of 1.0. The adapter update is added at full strength, without additional amplification.
- lora_dropout=0: No dropout, suggesting the training data is considered sufficient to avoid overfitting.
- All 7 target modules: Comprehensive injection into both attention and MLP layers for maximum adaptation capability.
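The "~50-100M trainable parameters" estimate above can be checked with back-of-the-envelope arithmetic. The dimensions below (hidden size 4096, intermediate size 11008, 32 layers, no grouped-query attention) are assumptions matching Llama-2-7B, not values taken from the repository:

```python
# Each LoRA adapter on a d_in -> d_out projection adds r * (d_in + d_out)
# trainable parameters (matrix A: r x d_in, matrix B: d_out x r).
r = 32
hidden, intermediate, layers = 4096, 11008, 32  # assumed Llama-2-7B shapes

per_layer = (
    4 * r * (hidden + hidden)           # q_proj, k_proj, v_proj, o_proj
    + 2 * r * (hidden + intermediate)   # up_proj, gate_proj
    + 1 * r * (intermediate + hidden)   # down_proj
)
total = per_layer * layers
print(f"{total:,} trainable LoRA parameters")  # 79,953,920 (~80M)
print(f"{100 * total / 7e9:.2f}% of 7B")       # ~1.14%
```

The result (~80M parameters, ~1.1% of the base model) falls inside the range quoted above; models using grouped-query attention would have smaller k_proj/v_proj adapters and a lower total.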
External Dependencies
| Package | Purpose |
|---|---|
| unsloth | Optimized LoRA injection with fused kernels |
| peft | Underlying PEFT/LoRA implementation (wrapped by Unsloth) |