Implementation:Intel Ipex llm Transformers Trainer LoRA
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
HuggingFace Trainer configured for standard LoRA fine-tuning on Intel XPU with optional DeepSpeed ZeRO Stage 3.
Description
This is a Wrapper Doc for the HuggingFace transformers.Trainer in the context of standard LoRA fine-tuning. The key difference from the QLoRA variant is support for DeepSpeed ZeRO Stage 3, which partitions model parameters, gradients, and optimizer states across GPUs, so a bf16 model that exceeds single-GPU memory can still be trained. To enable it, set deepspeed_zero3=True and supply the path to a ZeRO3 config JSON, which is passed to TrainingArguments through the deepspeed argument.
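The ZeRO3 config JSON mentioned above is not reproduced on this page. As a rough sketch only, a minimal Stage 3 config could be generated as shown here. The specific settings, the `write_zero3_config` helper, and the output file name are assumptions for illustration; the config shipped with the repository may differ.

```python
import json
import tempfile
from pathlib import Path

# Assumed minimal ZeRO Stage 3 settings; not copied from the repository's config.
zero3_config = {
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # Must agree with bf16=True in TrainingArguments.
    "bf16": {"enabled": True},
    # "auto" lets the HuggingFace integration fill these from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}


def write_zero3_config(path):
    """Hypothetical helper: write the config as JSON and return the path as a string."""
    path = Path(path)
    path.write_text(json.dumps(zero3_config, indent=2))
    return str(path)


config_path = write_zero3_config(
    Path(tempfile.gettempdir()) / "deepspeed_zero3_config.json"
)
```

The returned path is what would be handed to `TrainingArguments(deepspeed=config_path, ...)`. Using "auto" for the batch-related values avoids having to keep the JSON and the TrainingArguments in sync by hand.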
Usage
Use after preparing the bf16 PeftModel and tokenized datasets. Enable DeepSpeed ZeRO3 when the model exceeds single-GPU memory.
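Whether a model "exceeds single-GPU memory" can be estimated roughly before launching. The sketch below counts only the frozen bf16 base weights at 2 bytes per parameter. The `headroom` fraction and both function names are hypothetical, and the estimate deliberately ignores activations, LoRA gradients, and optimizer states, so treat it as a lower bound rather than a sizing rule.

```python
def bf16_weight_gib(num_params: int) -> float:
    """Memory for the bf16 base weights alone, in GiB (2 bytes per parameter)."""
    return num_params * 2 / 2**30


def needs_zero3(num_params: int, device_mem_gib: float, headroom: float = 0.8) -> bool:
    """Heuristic (assumed): True when bf16 weights exceed a fraction of device memory.

    `headroom` is an illustrative safety margin reserved for everything
    the weight count does not cover; it is not taken from the source.
    """
    return bf16_weight_gib(num_params) > device_mem_gib * headroom


# A 7-billion-parameter model on a 16 GiB device versus a 48 GiB device.
small_device = needs_zero3(7_000_000_000, device_mem_gib=16)
large_device = needs_zero3(7_000_000_000, device_mem_gib=48)
```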
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/LLM-Finetuning/LoRA/alpaca_lora_finetuning.py
- Lines: 227-265
Signature
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        max_grad_norm=0.3,
        num_train_epochs=num_epochs,
        learning_rate=learning_rate,
        lr_scheduler_type="cosine",
        bf16=True,
        optim="adamw_torch",
        ddp_backend="ccl",
        deepspeed=deepspeed,  # Optional ZeRO3 config path
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
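The signature sets per_device_train_batch_size and gradient_accumulation_steps separately, so the batch size that each optimizer step actually sees is their product times the number of devices. A small sketch of that arithmetic follows; the function names and the `world_size` parameter are illustrative and do not appear in the source.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         world_size: int = 1) -> int:
    """Samples contributing to one optimizer step across all devices."""
    return micro_batch_size * gradient_accumulation_steps * world_size


def accumulation_steps_for(target_batch_size: int,
                           micro_batch_size: int,
                           world_size: int = 1) -> int:
    """Accumulation steps needed to reach a target global batch size.

    Uses integer division, so the target is only hit exactly when it is
    divisible by micro_batch_size * world_size.
    """
    return max(1, target_batch_size // (micro_batch_size * world_size))


# Single device: micro-batch 2 with 64 accumulation steps.
single = effective_batch_size(2, 64)
# Four devices keeping the same global batch size.
steps_on_four = accumulation_steps_for(single, micro_batch_size=2, world_size=4)
```

When moving from one device to several, reducing the accumulation steps this way keeps the global batch size, and therefore the learning-rate behaviour, comparable between runs.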
Import
import transformers
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PeftModel | Yes | bf16 model with LoRA adapters |
| train_dataset | Dataset | Yes | Tokenized training dataset |
| eval_dataset | Dataset | No | Tokenized validation dataset |
| deepspeed | str | No | Path to DeepSpeed ZeRO3 config JSON |
| deepspeed_zero3 | bool | No | Enable DeepSpeed ZeRO Stage 3 |
| save_checkpoint | bool | No | Enable checkpoint saving (default True) |
Outputs
| Name | Type | Description |
|---|---|---|
| trained model | PeftModel | Model with trained LoRA adapter weights |
| checkpoints | Files | Saved to output_dir (if save_checkpoint=True) |
Usage Examples
import transformers

# Standard LoRA training (single GPU)
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=64,
        num_train_epochs=3,
        learning_rate=3e-5,
        bf16=True,
        ddp_backend="ccl",
        output_dir="./lora-output",
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

# Alternative: with DeepSpeed ZeRO3 for multi-GPU
trainer_ds = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        bf16=True,
        ddp_backend="ccl",
        deepspeed="./deepspeed_zero3_config.json",
        output_dir="./lora-output-ds",
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

# Train with the single-GPU trainer and save the LoRA adapter.
# For the DeepSpeed run, call trainer_ds.train() instead and save to "./lora-output-ds".
trainer.train()
model.save_pretrained("./lora-output")