Implementation: Intel IPEX-LLM Transformers Trainer QLoRA



Knowledge Sources
Domains: NLP, Training
Last Updated: 2026-02-09 00:00 GMT

Overview

HuggingFace Trainer configured for QLoRA fine-tuning on Intel XPU, using the oneCCL ("ccl") distributed backend.

Description

This is a Wrapper Doc for the standard HuggingFace transformers.Trainer as used for QLoRA fine-tuning on Intel XPU. The key IPEX-LLM-specific settings are ddp_backend="ccl", which routes distributed communication through Intel oneCCL; bf16=True, which trains in bfloat16 for numerical stability; and optional DeepSpeed ZeRO integration for scaling across multiple GPUs. DataCollatorForSeq2Seq pads each batch dynamically to its longest sequence, rounded up to a multiple of 8 (pad_to_multiple_of=8) for XPU efficiency.
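The padding behaviour described above can be sketched in pure Python. This is a hypothetical stand-in (pad_batch and round_up are illustrative names, not transformers APIs) that assumes right padding, a pad token id of 0, and labels of the same length as input_ids; the real collator reads the pad token and padding side from the tokenizer. It pads labels with -100, DataCollatorForSeq2Seq's default label_pad_token_id, so padded positions do not contribute to the loss.

```python
from typing import Dict, List

# Default label_pad_token_id of DataCollatorForSeq2Seq: positions labelled -100 are ignored by the loss.
LABEL_PAD_TOKEN_ID = -100


def round_up(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple, e.g. 13 -> 16 when multiple is 8."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_batch(features: List[Dict[str, List[int]]], pad_token_id: int = 0,
              pad_to_multiple_of: int = 8) -> Dict[str, List[List[int]]]:
    """Hypothetical pure-Python stand-in for the collator's padding step.

    Assumes right padding, pad_token_id=0 and labels as long as input_ids;
    the real collator takes the pad token and padding side from the tokenizer.
    """
    longest = max(len(f["input_ids"]) for f in features)
    target = round_up(longest, pad_to_multiple_of)  # dynamic padding, rounded up
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = target - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding that attention should skip.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["labels"] + [LABEL_PAD_TOKEN_ID] * (target - len(f["labels"])))
    return batch


# Two sequences of 3 and 10 tokens: the longest is 10, so both rows become 16 tokens long.
batch = pad_batch([
    {"input_ids": [5, 6, 7], "labels": [5, 6, 7]},
    {"input_ids": list(range(1, 11)), "labels": list(range(1, 11))},
])
print([len(row) for row in batch["input_ids"]])  # [16, 16]
```

Rounding to a multiple of 8 trades a few extra pad tokens for tensor shapes that tend to map onto hardware kernels more efficiently.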


Usage

Use after preparing the PeftModel and tokenized datasets. The Trainer manages the full training loop including gradient accumulation, checkpointing, and evaluation.
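Gradient accumulation determines how many samples feed each optimizer update. The arithmetic can be sketched as below; the helper names are hypothetical, the 50,000-example dataset is made up, and the step count is an approximation that floors any incomplete accumulation window each epoch (the exact remainder handling inside Trainer can differ between transformers releases).

```python
def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, world_size: int = 1) -> int:
    """Samples contributing to one optimizer update, summed over all ranks."""
    return micro_batch_size * grad_accum_steps * world_size


def accumulation_steps_for(target_batch: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Hypothetical helper: accumulation steps that keep the global batch near target_batch."""
    return max(1, target_batch // (micro_batch_size * world_size))


def approx_optimizer_steps(num_examples: int, micro_batch_size: int, grad_accum_steps: int,
                           world_size: int = 1, num_epochs: int = 1) -> int:
    """Rough count of optimizer updates; floors any partial accumulation window per epoch."""
    # Ceil division: each rank gets about num_examples / world_size samples per epoch.
    micro_batches_per_rank = -(-num_examples // (world_size * micro_batch_size))
    return max(micro_batches_per_rank // grad_accum_steps, 1) * num_epochs


print(effective_batch_size(2, 64))                          # 128 on a single XPU
print(accumulation_steps_for(128, 2, world_size=2))         # 32: halve accumulation when doubling ranks
print(approx_optimizer_steps(50_000, 2, 64, num_epochs=3))  # 1170 for the hypothetical dataset
```

Keeping the global batch fixed while changing the number of ranks is why accumulation steps are usually derived from a target batch size rather than hard-coded.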

Code Reference

Source Location

  • Repository: IPEX-LLM
  • File: python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/alpaca_qlora_finetuning.py
  • Lines: 240-278

Signature

trainer = transformers.Trainer(
    model=model,                    # PeftModel with LoRA adapters
    train_dataset=train_data,       # Tokenized training dataset
    eval_dataset=val_data,          # Tokenized validation dataset
    args=transformers.TrainingArguments(...),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)  # None, True, or a checkpoint directory path
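resume_from_checkpoint accepts None (start fresh), True (let Trainer pick the last checkpoint in output_dir), or a path to a specific checkpoint directory. Trainer names these directories checkpoint-<global_step>, and transformers ships transformers.trainer_utils.get_last_checkpoint for this lookup. The stdlib-only sketch below (latest_checkpoint is a hypothetical name) illustrates the same idea; it compares step numbers numerically, so checkpoint-1200 wins over checkpoint-200 even though it sorts earlier as a string.

```python
import os
import re
import tempfile
from typing import Optional

# Trainer writes checkpoints as <output_dir>/checkpoint-<global_step>.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint directory with the highest step, or None if there is none."""
    if not os.path.isdir(output_dir):
        return None
    candidates = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            candidates.append((int(match.group(1)), path))
    return max(candidates)[1] if candidates else None


# Demo on a throwaway directory laid out like a Trainer output_dir.
with tempfile.TemporaryDirectory() as demo_dir:
    for step in (100, 200, 1200):
        os.makedirs(os.path.join(demo_dir, f"checkpoint-{step}"))
    os.makedirs(os.path.join(demo_dir, "runs"))  # non-checkpoint folders are ignored
    found = os.path.basename(latest_checkpoint(demo_dir))
print(found)  # checkpoint-1200

# In a real run: trainer.train(resume_from_checkpoint=latest_checkpoint("./qlora-output"))
```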

Import

import transformers
from transformers import TrainingArguments, DataCollatorForSeq2Seq

I/O Contract

Inputs

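Name                         Type       Required         Description
model                        PeftModel  Yes              Model with LoRA adapters from get_peft_model
train_dataset                Dataset    Yes              Tokenized training dataset
eval_dataset                 Dataset    No               Tokenized validation dataset
per_device_train_batch_size  int        No               Micro batch size per device (default 2)
gradient_accumulation_steps  int        No               Micro-batches accumulated before each optimizer update
num_train_epochs             float      No               Number of training epochs (default 3)
learning_rate                float      No               Peak learning rate (default 3e-5)
ddp_backend                  str        Yes (multi-XPU)  "ccl" to use Intel oneCCL for distributed communication on XPU
deepspeed                    str        No               Path to a DeepSpeed config JSON

The last six rows are TrainingArguments fields passed via args=, not direct Trainer parameters. Defaults in parentheses are those of alpaca_qlora_finetuning.py, not of TrainingArguments.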
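The deepspeed argument takes the path to a DeepSpeed config JSON (recent transformers releases also accept an already-loaded dict). The sketch below writes a hypothetical minimal ZeRO stage 2 config; it is not the config shipped in the IPEX-LLM repository, and whether a given ZeRO stage works with QLoRA on XPU depends on the installed DeepSpeed and IPEX-LLM versions. The "auto" values ask the HuggingFace DeepSpeed integration to fill in the matching TrainingArguments (micro batch size, accumulation steps, max_grad_norm, bf16), which keeps the two configurations from drifting apart.

```python
import json
import os
import tempfile

# Hypothetical minimal ZeRO stage 2 config, not the file shipped with IPEX-LLM.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",  # filled from per_device_train_batch_size
    "gradient_accumulation_steps": "auto",     # filled from gradient_accumulation_steps
    "gradient_clipping": "auto",               # filled from max_grad_norm
    "bf16": {"enabled": "auto"},               # filled from bf16=True
    "zero_optimization": {"stage": 2},         # shard optimizer states and gradients
}


def write_ds_config(config: dict, path: str) -> str:
    """Serialize the config to JSON and return the path for TrainingArguments(deepspeed=...)."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


config_path = write_ds_config(ds_config, os.path.join(tempfile.gettempdir(), "ds_zero2_example.json"))
# Then, for example: transformers.TrainingArguments(..., ddp_backend="ccl", deepspeed=config_path)
```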

Outputs

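Name           Type           Description
trained model  PeftModel      The model passed in, with LoRA adapter weights updated in place (trainer.train() itself returns a TrainOutput summary)
checkpoints    Files          Written to output_dir/checkpoint-<step> every save_steps optimizer steps
logs           list of dicts  Training metrics (loss, learning rate, etc.) in trainer.state.log_history, recorded every logging_steps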
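trainer.state.log_history holds one dict per logging or evaluation event: training entries carry loss, learning_rate, epoch and step, evaluation entries carry eval_loss and related eval_* keys, and the final summary uses train_loss. The hypothetical helper below separates the two loss curves and relies only on those keys.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]


def split_log_history(log_history: List[Dict]) -> Tuple[List[Point], List[Point]]:
    """Split Trainer.state.log_history into (step, loss) pairs for training and evaluation."""
    train_curve: List[Point] = []
    eval_curve: List[Point] = []
    for entry in log_history:
        if "loss" in entry:
            # Training entries, written every logging_steps optimizer steps.
            train_curve.append((entry["step"], entry["loss"]))
        elif "eval_loss" in entry:
            # Evaluation entries, written every eval_steps when evaluation is enabled.
            eval_curve.append((entry["step"], entry["eval_loss"]))
        # Anything else (e.g. the final summary with train_loss) is skipped.
    return train_curve, eval_curve


# After training:
# train_curve, eval_curve = split_log_history(trainer.state.log_history)
```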

Usage Examples

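import transformers

# model (a PeftModel), tokenizer, train_data and val_data come from the preceding preparation steps.
trainer = transformers.Trainer(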
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,   # micro batch size per XPU
        gradient_accumulation_steps=64,  # 2 x 64 = 128 samples per optimizer update per XPU
        max_grad_norm=0.3,
        num_train_epochs=3,
        learning_rate=3e-5,
        lr_scheduler_type="cosine",
        bf16=True,
        logging_steps=1,
        optim="adamw_torch",
        evaluation_strategy="steps",  # named eval_strategy in newer transformers releases
        save_strategy="steps",
        eval_steps=100,
        save_steps=100,
        output_dir="./qlora-output",
        ddp_backend="ccl",  # Intel oneCCL for multi-XPU communication
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

# Run training
trainer.train()

# Save the LoRA adapter; for a PeftModel, save_pretrained writes only the adapter weights and config
model.save_pretrained("./qlora-output")
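With lr_scheduler_type="cosine" and no warmup configured, the learning rate starts at learning_rate and decays to zero over the run, advancing once per optimizer update rather than once per micro-batch. The sketch below mirrors the cosine-with-warmup formula used by transformers' get_cosine_schedule_with_warmup with its default half cycle; cosine_lr is a hypothetical name and the 1000-step run length is made up for illustration.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 3e-5, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates under cosine decay with linear warmup."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: base_lr at progress 0, base_lr / 2 halfway, 0 at the end.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical 1000-step run with no warmup, matching the example's learning_rate=3e-5.
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, total_steps=1000):.2e}")
```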
