

Implementation:Intel Ipex llm Transformers Trainer LoRA

From Leeroopedia


Knowledge Sources
Domains NLP, Training
Last Updated 2026-02-09 00:00 GMT

Overview

HuggingFace Trainer configured for standard LoRA fine-tuning on Intel XPU with optional DeepSpeed ZeRO Stage 3.

Description

This is a Wrapper Doc for the HuggingFace transformers.Trainer in the context of standard LoRA fine-tuning. The key difference from the QLoRA variant is support for DeepSpeed ZeRO Stage 3, which partitions model parameters, gradients, and optimizer states across GPUs so that no single device holds a full copy of any of them. This makes it possible to train bf16 models that do not fit in one GPU's memory. ZeRO3 is enabled by setting the script flag deepspeed_zero3=True and passing the path to a ZeRO3 config JSON through the deepspeed argument.
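For orientation, a minimal ZeRO Stage 3 config can be generated as below. This is a sketch under stated assumptions, not the JSON shipped with IPEX-LLM: the function name `write_zero3_config` and every value in the dict are illustrative, and the `"auto"` entries rely on the HuggingFace Trainer's DeepSpeed integration filling them in from `TrainingArguments`.

```python
import json
from pathlib import Path

# Hypothetical minimal ZeRO-3 config; the repository's own JSON may differ.
ZERO3_CONFIG = {
    # "auto" lets the HF Trainer copy these values from TrainingArguments
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",  # taken from max_grad_norm
    "bf16": {"enabled": "auto"},  # taken from bf16=True
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
        # gather full 16-bit weights when the Trainer saves the model
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}


def write_zero3_config(path: str = "deepspeed_zero3_config.json") -> Path:
    """Write the hypothetical config to disk and return its path."""
    out = Path(path)
    out.write_text(json.dumps(ZERO3_CONFIG, indent=2))
    return out
```

The returned path can then be passed as `deepspeed=str(write_zero3_config())` when building `TrainingArguments`.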

External Reference

Usage

Use after preparing the bf16 PeftModel and tokenized datasets. Enable DeepSpeed ZeRO3 when the model exceeds single-GPU memory.
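A rough way to decide whether ZeRO3 is needed is to compare the bf16 weight footprint (2 bytes per parameter) with the memory of one device. The helper `bf16_weight_gib` below is a hypothetical back-of-the-envelope sketch: it counts only the frozen base weights, ignoring activations, LoRA optimizer states and runtime overhead, and it assumes ZeRO-3 shards the weights evenly across `world_size` devices.

```python
BYTES_PER_BF16 = 2
GIB = 1024 ** 3


def bf16_weight_gib(num_params: int, world_size: int = 1) -> float:
    """Approximate per-device GiB for bf16 weights.

    world_size > 1 models an even ZeRO-3 split of the weights across devices.
    """
    if num_params <= 0 or world_size <= 0:
        raise ValueError("num_params and world_size must be positive")
    return num_params * BYTES_PER_BF16 / world_size / GIB
```

For a 7B-parameter model this gives roughly 13 GiB of weights on a single device and roughly 3.3 GiB per device with four-way ZeRO-3 sharding.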

Code Reference

Source Location

  • Repository: IPEX-LLM
  • File: python/llm/example/GPU/LLM-Finetuning/LoRA/alpaca_lora_finetuning.py
  • Lines: 227-265

Signature

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        output_dir=output_dir,  # checkpoints are written here
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        max_grad_norm=0.3,
        num_train_epochs=num_epochs,
        learning_rate=learning_rate,
        lr_scheduler_type="cosine",
        bf16=True,
        optim="adamw_torch",
        ddp_backend="ccl",  # oneCCL backend for multi-process training on Intel XPU
        deepspeed=deepspeed,  # Optional ZeRO3 config path (None disables DeepSpeed)
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
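Each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps × world_size samples. The sketch below assumes the Alpaca-LoRA convention, not confirmed by the excerpt above, in which gradient_accumulation_steps is derived from a target global batch_size and divided by the number of processes under multi-process training; both helper names are hypothetical.

```python
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Hypothetical helper: accumulation steps so the global batch is about batch_size."""
    steps = batch_size // micro_batch_size
    if world_size > 1:
        steps //= world_size  # every rank contributes micro_batch_size per step
    return max(steps, 1)  # never accumulate over zero steps


def effective_batch(micro_batch_size: int, accum_steps: int, world_size: int = 1) -> int:
    """Samples consumed per optimizer step across all ranks."""
    return micro_batch_size * accum_steps * world_size
```

With batch_size=128 and micro_batch_size=2 this reproduces the single-device usage example (gradient_accumulation_steps=64, 128 samples per step); on four ranks the accumulation drops to 16 while the effective batch stays at 128.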

Import

import transformers

I/O Contract

Inputs

Name             Type       Required  Description
model            PeftModel  Yes       bf16 model with LoRA adapters
train_dataset    Dataset    Yes       Tokenized training dataset
deepspeed        str        No        Path to DeepSpeed ZeRO3 config JSON (passed to TrainingArguments)
deepspeed_zero3  bool       No        Script flag: enable DeepSpeed ZeRO Stage 3
save_checkpoint  bool       No        Script flag: enable checkpoint saving (default True)
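deepspeed_zero3 and save_checkpoint are not TrainingArguments fields; they are script-level switches that decide what reaches the Trainer. The function below is a hypothetical sketch of that mapping: its name, the fixed bf16/ccl values, and the assumption that save_checkpoint=False maps to save_strategy="no" are illustrative, not taken from the repository. It only builds a keyword dict, so it runs without transformers installed.

```python
from typing import Any, Dict, Optional


def trainer_kwargs(
    output_dir: str,
    deepspeed: Optional[str] = None,
    deepspeed_zero3: bool = False,
    save_checkpoint: bool = True,
) -> Dict[str, Any]:
    """Hypothetical mapping from script flags to TrainingArguments keyword arguments."""
    kwargs: Dict[str, Any] = {"output_dir": output_dir, "bf16": True, "ddp_backend": "ccl"}
    if deepspeed_zero3:
        if deepspeed is None:
            raise ValueError("deepspeed_zero3=True needs a ZeRO3 config path in `deepspeed`")
        kwargs["deepspeed"] = deepspeed  # TrainingArguments accepts a JSON file path
    # Assumption: disabling checkpoints means saving nothing during training
    kwargs["save_strategy"] = "steps" if save_checkpoint else "no"
    return kwargs
```

Usage: `transformers.TrainingArguments(**trainer_kwargs("./lora-output", save_checkpoint=False))`. In this sketch a config path without deepspeed_zero3=True is ignored, which keeps DeepSpeed strictly opt-in.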

Outputs

Name           Type       Description
trained model  PeftModel  Model with trained LoRA adapter weights
checkpoints    Files      Saved to output_dir (if save_checkpoint=True)

Usage Examples

import transformers

# Option 1: standard LoRA training (single GPU, no DeepSpeed)
trainer = transformers.Trainer(
    model=model,  # bf16 PeftModel with LoRA adapters
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=64,
        num_train_epochs=3,
        learning_rate=3e-5,
        bf16=True,
        ddp_backend="ccl",  # only used when launched with multiple processes
        output_dir="./lora-output",
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer.train()
model.save_pretrained("./lora-output")  # PeftModel saves only the adapter weights

# Option 2: DeepSpeed ZeRO3 for multi-GPU. Build this instead of Option 1 and
# start the script with a multi-process launcher (e.g. mpirun).
trainer_ds = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        bf16=True,
        ddp_backend="ccl",
        deepspeed="./deepspeed_zero3_config.json",  # path to a ZeRO3 config JSON
        output_dir="./lora-output-ds",
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
trainer_ds.train()
# Parameters are partitioned under ZeRO3, so save through the Trainer
trainer_ds.save_model("./lora-output-ds")
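With padding=True and pad_to_multiple_of=8, DataCollatorForSeq2Seq pads each batch to its longest sequence rounded up to a multiple of 8, and pads labels with -100 so padded positions are ignored by the loss. The pure-Python sketch below mimics that behavior for illustration only; pad_batch and padded_length are hypothetical, and right padding, a pad id of 0, and labels of the same length as input_ids are assumptions (the real collator takes padding side and pad id from the tokenizer and pads labels separately).

```python
from typing import Dict, List

LABEL_PAD_ID = -100  # ignored by PyTorch cross-entropy loss


def padded_length(max_len: int, multiple: int = 8) -> int:
    """Round max_len up to the next multiple (13 -> 16, 16 -> 16)."""
    return -(-max_len // multiple) * multiple


def pad_batch(
    features: List[Dict[str, List[int]]], pad_id: int = 0, multiple: int = 8
) -> Dict[str, List[List[int]]]:
    """Right-pad input_ids, attention_mask and labels to a shared, rounded-up length."""
    target = padded_length(max(len(f["input_ids"]) for f in features), multiple)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = target - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Assumes labels has the same length as input_ids (causal-LM fine-tuning)
        batch["labels"].append(f["labels"] + [LABEL_PAD_ID] * n_pad)
    return batch
```

Rounding to a multiple of 8 keeps tensor shapes aligned to sizes that hardware kernels typically handle efficiently, at the cost of a few extra padded positions per batch.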

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
