Implementation:Intel Ipex llm Get Peft Model QLoRA
| Knowledge Sources | |
|---|---|
| Domains | NLP, Parameter_Efficient_Finetuning |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tools for preparing a 4-bit quantized model and injecting LoRA adapters in QLoRA mode, provided by IPEX-LLM.
Description
Three components work together. prepare_model_for_kbit_training freezes the base model parameters and readies the quantized model for adapter training, optionally enabling gradient checkpointing. LoraConfig defines the adapter configuration, with training_mode="qlora" selecting QLoRA mode. get_peft_model wraps the prepared model with LoRA adapters, which then hold the only trainable parameters. All three are imported from ipex_llm.transformers.qlora to ensure IPEX-LLM compatibility.
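The idea behind the adapter injection can be illustrated with a small, library-free sketch of the standard LoRA formulation: a frozen weight W plus a low-rank update B·A scaled by lora_alpha / r. This is not IPEX-LLM's implementation. The helper names (matvec, lora_forward) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, r, lora_alpha):
    """Frozen base projection plus the scaled low-rank update B(Ax)."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen path: W is never updated
    update = matvec(B, matvec(A, x))  # trainable path: only A and B learn
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, lora_alpha = 6, 4, 2, 4  # toy sizes, not real model dimensions
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

base_y = matvec(W, x)
y0 = lora_forward(x, W, A, B, r, lora_alpha)

# Simulate a training step that changed a single entry of B.
B[0][0] = 1.0
y1 = lora_forward(x, W, A, B, r, lora_alpha)
```

With B initialized to zero, y0 equals base_y, so the wrapped layer initially behaves like the frozen base layer. After B changes, only the affected output moves, by scaling times the corresponding component of A·x.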
Usage
Use these after loading a 4-bit base model with BitsAndBytesConfig for QLoRA training. Call them in order: prepare the model, build the LoraConfig, then wrap the model with get_peft_model.
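The loading step that precedes these calls might look like the sketch below. It assumes the IPEX-LLM AutoModelForCausalLM wrapper accepts a Hugging Face BitsAndBytesConfig through quantization_config, that NF4 with bfloat16 compute is the desired setup, and that the target device is an Intel GPU ("xpu"). The function name load_4bit_base_model and its parameters are hypothetical. Check the referenced example script for the exact arguments it uses.

```python
def load_4bit_base_model(base_model_path, device="xpu"):
    """Load a causal LM in 4-bit NF4 form (a sketch; arguments are assumptions)."""
    # Imports are local so the helper can be defined without the GPU stack installed.
    import torch
    from transformers import BitsAndBytesConfig
    from ipex_llm.transformers import AutoModelForCausalLM

    # Assumed quantization settings: 4-bit NF4 weights, bfloat16 compute.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        quantization_config=bnb_config,
    )
    # Move the quantized model to the target device before preparing it.
    return model.to(device)
```

The returned model is what prepare_model_for_kbit_training expects as its first argument.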
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/alpaca_qlora_finetuning.py
- Lines: 209-222
Signature
def prepare_model_for_kbit_training(
    model: PreTrainedModel,
    use_gradient_checkpointing: bool = False
) -> PreTrainedModel:
    """Freeze base model and prepare for k-bit adapter training."""

class LoraConfig:
    def __init__(
        self,
        r: int = 8,
        lora_alpha: int = 16,
        target_modules: List[str] = None,
        lora_dropout: float = 0.05,
        bias: str = "none",
        task_type: str = "CAUSAL_LM",
        training_mode: str = "qlora",
    ):
        """Configure LoRA adapter parameters for QLoRA training."""

def get_peft_model(
    model: PreTrainedModel,
    peft_config: LoraConfig
) -> PeftModel:
    """Wrap model with LoRA adapters according to config."""
Import
from ipex_llm.transformers.qlora import (
    get_peft_model,
    prepare_model_for_kbit_training,
    LoraConfig,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel | Yes | 4-bit quantized base model from AutoModelForCausalLM.from_pretrained |
| r | int | No | LoRA rank (default 8) |
| lora_alpha | int | No | LoRA alpha scaling factor (default 16) |
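| target_modules | List[str] | No | Linear layers to inject adapters into (signature default None; the example passes q_proj, v_proj, k_proj, o_proj, up_proj, down_proj, gate_proj) |
| lora_dropout | float | No | Dropout rate for LoRA layers (default 0.05) |
| bias | str | No | Bias handling for the adapters (default "none") |
| task_type | str | No | Task type (default "CAUSAL_LM") |
| training_mode | str | No | Training mode (default "qlora"); must be "qlora" for QLoRA training |
| use_gradient_checkpointing | bool | No | Argument to prepare_model_for_kbit_training; enables gradient checkpointing (default False) |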
Outputs
| Name | Type | Description |
|---|---|---|
| model | PeftModel | Model with LoRA adapters injected, trainable parameters isolated |
Usage Examples
from ipex_llm.transformers.qlora import (
    get_peft_model, prepare_model_for_kbit_training, LoraConfig
)

# `model` is the 4-bit quantized base model loaded beforehand (see Usage).

# 1. Prepare model for k-bit training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)

# 2. Configure LoRA adapters
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    training_mode="qlora",
)

# 3. Inject adapters
model = get_peft_model(model, config)
model.print_trainable_parameters()
# Output format: trainable params: X || all params: Y || trainable%: Z
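To get a feel for the trainable-parameter count reported above, the adapter size can be estimated by hand: with bias="none", a LoRA adapter of rank r on a linear layer of shape (d_in, d_out) adds r × (d_in + d_out) parameters. The sketch below applies this to the seven target modules from the example. The dimensions (hidden size 4096, intermediate size 11008, 32 decoder blocks) are assumptions resembling a 7B Llama-style model, not values taken from the source, and lora_trainable_params is a hypothetical helper.

```python
def lora_trainable_params(r, layer_shapes):
    """Count LoRA parameters: A is (r x d_in) and B is (d_out x r) per layer."""
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)


# Assumed Llama-7B-like dimensions; substitute those of your own base model.
hidden, intermediate, num_layers = 4096, 11008, 32

per_block = (
    [(hidden, hidden)] * 4            # q_proj, k_proj, v_proj, o_proj
    + [(hidden, intermediate)] * 2    # gate_proj, up_proj
    + [(intermediate, hidden)]        # down_proj
)

total = lora_trainable_params(8, per_block * num_layers)
print(f"trainable params: {total:,}")
# trainable params: 19,988,480
```

Under these assumptions the adapters hold roughly 20 million trainable parameters, a small fraction of the frozen base model, which is why the trainable% figure printed by print_trainable_parameters is typically well under 1%.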