Implementation:Intel Ipex llm Transformers Trainer QLoRA
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Hugging Face transformers.Trainer configured for QLoRA fine-tuning on Intel XPU with the oneCCL distributed backend.
Description
This is a wrapper doc for the standard Hugging Face transformers.Trainer as used for QLoRA fine-tuning on Intel XPU. The IPEX-LLM-specific configuration consists of ddp_backend="ccl", which selects Intel oneCCL for distributed communication; bf16=True, which trains in bfloat16 for stability; and an optional DeepSpeed ZeRO configuration for multi-GPU scaling. DataCollatorForSeq2Seq pads each batch to its longest sequence, rounded up to a multiple of 8 (pad_to_multiple_of=8), for XPU efficiency.
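The padding behaviour described above can be illustrated without loading a tokenizer. The following is a minimal pure-Python sketch, not the collator's actual implementation: pad_batch and round_up are hypothetical helper names, pad_token_id=0 is an assumed value, and right-padding is assumed (the real collator follows tokenizer.padding_side). The label padding value -100 matches the default label_pad_token_id of DataCollatorForSeq2Seq.

```python
from typing import Dict, List

# Default label_pad_token_id in DataCollatorForSeq2Seq; positions with this
# value are ignored by the cross-entropy loss.
LABEL_PAD_ID = -100


def round_up(length: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `length`."""
    return ((length + multiple - 1) // multiple) * multiple


def pad_batch(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,  # assumed value; the real one comes from the tokenizer
    multiple: int = 8,
) -> Dict[str, List[List[int]]]:
    """Right-pad every example to the batch maximum, rounded up to `multiple`.

    Assumes each feature has `input_ids` and `labels` of equal length.
    """
    target = round_up(max(len(f["input_ids"]) for f in features), multiple)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for f in features:
        n_tokens = len(f["input_ids"])
        n_pad = target - n_tokens
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * n_tokens + [0] * n_pad)
        batch["labels"].append(f["labels"] + [LABEL_PAD_ID] * n_pad)
    return batch
```

With two examples of 5 and 11 tokens, the longest sequence has 11 tokens, so both rows are padded to 16, the next multiple of 8.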
External Reference
Usage
Construct the Trainer after preparing the PeftModel and the tokenized datasets. The Trainer then manages the full training loop, including gradient accumulation, checkpointing, and evaluation.
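Gradient accumulation lets a small per-device micro batch reach a larger effective batch size. The sketch below shows the arithmetic only; accumulation_steps is a hypothetical helper, and the target global batch size of 128 is an assumption inferred from the example values on this page (micro batch 2 with 64 accumulation steps).

```python
def accumulation_steps(
    global_batch_size: int,
    micro_batch_size: int,
    world_size: int = 1,
) -> int:
    """Number of micro batches to accumulate before each optimizer update.

    Each device processes `micro_batch_size` examples per forward/backward
    pass, so one pass across `world_size` devices covers
    micro_batch_size * world_size examples.
    """
    per_pass = micro_batch_size * world_size
    if global_batch_size % per_pass != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"micro_batch_size * world_size = {per_pass}"
        )
    return global_batch_size // per_pass


# Single device: 128 / 2 = 64 accumulation steps.
single_device = accumulation_steps(128, micro_batch_size=2)

# Four devices: each pass already covers 8 examples, so 16 steps suffice.
four_devices = accumulation_steps(128, micro_batch_size=2, world_size=4)
```

The result would be passed as gradient_accumulation_steps in TrainingArguments, so that the effective batch size stays constant as devices are added.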
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/alpaca_qlora_finetuning.py
- Lines: 240-278
Signature
trainer = transformers.Trainer(
    model=model,                    # PeftModel with LoRA adapters
    train_dataset=train_data,       # Tokenized training dataset
    eval_dataset=val_data,          # Tokenized validation dataset
    args=transformers.TrainingArguments(...),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
Import
import transformers
from transformers import TrainingArguments, DataCollatorForSeq2Seq
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PeftModel | Yes | Model with LoRA adapters from get_peft_model |
| train_dataset | Dataset | Yes | Tokenized training dataset |
| eval_dataset | Dataset | No | Tokenized validation dataset |
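| per_device_train_batch_size | int | No | Micro batch size per device (2 in the example script; the TrainingArguments default is 8) |
| gradient_accumulation_steps | int | No | Number of micro batches accumulated before each optimizer update |
| num_train_epochs | int | No | Number of training epochs (default 3) |
| learning_rate | float | No | Learning rate (3e-5 in the example script; the TrainingArguments default is 5e-5) |
| ddp_backend | str | Yes | Set to "ccl" (Intel oneCCL) for distributed training on Intel XPU |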
| deepspeed | str | No | Path to DeepSpeed config JSON |
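The deepspeed input expects the path to a JSON file. As a minimal sketch, the snippet below writes such a file with a ZeRO stage 2 layout. The file name, the choice of stage 2, and the helper write_zero2_config are assumptions for illustration, not the configuration shipped with IPEX-LLM. The "auto" values rely on the Hugging Face Trainer's DeepSpeed integration, which fills them in from TrainingArguments.

```python
import json
from pathlib import Path


def write_zero2_config(path: str = "ds_zero2.json") -> str:
    """Write an illustrative DeepSpeed ZeRO stage 2 config and return its path."""
    config = {
        # Stage 2 partitions optimizer states and gradients across devices.
        "zero_optimization": {"stage": 2},
        # Match bf16=True in TrainingArguments.
        "bf16": {"enabled": True},
        # "auto" lets the Trainer copy these values from TrainingArguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return path
```

The returned path would then be passed as deepspeed=... in TrainingArguments. Whether a given stage works on a particular XPU software stack should be checked against the IPEX-LLM examples.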
Outputs
| Name | Type | Description |
|---|---|---|
| trained model | PeftModel | Model with trained LoRA adapter weights |
| checkpoints | Files | Saved to output_dir every save_steps |
| logs | Dict | Training metrics (loss, learning rate, etc.) |
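The Trainer writes each checkpoint to a checkpoint-<global_step> subdirectory of output_dir. To resume an interrupted run, the most recent of these can be passed as resume_from_checkpoint. The sketch below locates it with the standard library only; latest_checkpoint is a hypothetical helper name. The transformers library provides a comparable utility, transformers.trainer_utils.get_last_checkpoint, which is preferable when the library is available.

```python
import os
import re
from typing import Optional

# Trainer checkpoint directories are named "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint directory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        # Ignore stray files that merely look like checkpoint names.
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps.append(int(match.group(1)))
    if not steps:
        return None
    # Compare numerically: "checkpoint-1000" must beat "checkpoint-200".
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")
```

A typical call would be trainer.train(resume_from_checkpoint=latest_checkpoint("./qlora-output")); passing None starts training from scratch.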
Usage Examples
import transformers

# model, tokenizer, train_data, and val_data are assumed to be prepared
# beforehand (PeftModel from get_peft_model, tokenized datasets).
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=64,   # 2 x 64 = 128 examples per update per device
        max_grad_norm=0.3,
        num_train_epochs=3,
        learning_rate=3e-5,
        lr_scheduler_type="cosine",
        bf16=True,                        # bfloat16 training
        logging_steps=1,
        optim="adamw_torch",
        evaluation_strategy="steps",      # named eval_strategy in newer transformers releases
        save_strategy="steps",
        eval_steps=100,
        save_steps=100,
        output_dir="./qlora-output",
        ddp_backend="ccl",                # Intel oneCCL for distributed runs
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

# Run training
trainer.train()

# Save adapter
model.save_pretrained("./qlora-output")