Principle:Intel Ipex llm Training With HF Trainer LoRA

Knowledge Sources	HuggingFace Trainer, IPEX-LLM
Domains	NLP, Training
Last Updated	2026-02-09 00:00 GMT

Overview

Methodology for executing standard LoRA fine-tuning using the HuggingFace Trainer with Intel XPU optimizations and optional DeepSpeed ZeRO Stage 3.

Description

Training execution for standard LoRA workflows uses the HuggingFace Trainer with the same Intel XPU adaptations as the QLoRA workflow (CCL communication backend, bf16 precision). It additionally supports DeepSpeed ZeRO Stage 3, which partitions model parameters across multiple GPUs. This is particularly useful for bf16 models whose weights do not fit in the memory of a single GPU. The workflow also exposes a save_checkpoint toggle and a deepspeed_zero3 flag to simplify multi-GPU configuration.

Usage

Use this after loading the model in bf16 and injecting the LoRA adapters. Enable DeepSpeed ZeRO Stage 3 when the bf16 model exceeds single-GPU memory; use standard DDP when the model fits on one GPU.
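The choice between DDP and ZeRO Stage 3 can be approximated with a quick memory estimate. The sketch below assumes that frozen bf16 base weights dominate memory (2 bytes per parameter) and reserves a share of GPU memory for activations, LoRA gradients, and optimizer states. The function names and the 0.7 headroom factor are hypothetical choices for illustration, not values taken from IPEX-LLM.

```python
def bf16_weight_gib(num_params: int) -> float:
    """Memory taken by the model weights in bf16 (2 bytes per parameter), in GiB."""
    return num_params * 2 / 2**30


def choose_strategy(num_params: int, gpu_mem_gib: float, headroom: float = 0.7) -> str:
    """Pick a distribution strategy from a rough weight-size estimate.

    headroom is the assumed fraction of GPU memory available for weights;
    the remainder is left for activations and LoRA training state.
    """
    if bf16_weight_gib(num_params) <= gpu_mem_gib * headroom:
        return "ddp"
    return "deepspeed_zero3"
```

For example, a 7-billion-parameter model needs about 13 GiB for bf16 weights alone, so under these assumptions choose_strategy(7_000_000_000, 16) selects ZeRO Stage 3 while choose_strategy(7_000_000_000, 48) selects DDP.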

Theoretical Basis

The training loop is the same as in the QLoRA Trainer workflow, with DeepSpeed ZeRO Stage 3 available as an additional option:

# Abstract ZeRO-3 distribution (NOT real implementation)
# ZeRO Stage 3 partitions: parameters + gradients + optimizer states
# Each GPU holds 1/N of all three, gathering on-demand for computation
# Enables training models larger than single-GPU memory
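The partition-and-gather idea in the comments above can be illustrated with a toy model in plain Python. This is a conceptual sketch only: real ZeRO Stage 3 shards tensors per layer and uses collective communication between devices, whereas here the "ranks" are list entries and all function names are hypothetical.

```python
def partition(flat_params, world_size):
    """Split a flat parameter list into world_size contiguous shards (1/N each)."""
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    return [flat_params[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards):
    """Reassemble the full parameter list from every rank's shard."""
    return [p for shard in shards for p in shard]


def forward_on_rank(shards, local_inputs):
    """Gather the full parameters on demand, compute, then release them."""
    full_params = all_gather(shards)  # materialized only for this computation
    output = sum(w * x for w, x in zip(full_params, local_inputs))
    del full_params                   # between steps, only the local shard persists
    return output
```

With partition([1.0, 2.0, 3.0, 4.0, 5.0], 2), each simulated rank stores at most three of the five values between steps, yet the forward computation still sees all five.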

Related Pages

Implemented By

Implementation:Intel_Ipex_llm_Transformers_Trainer_LoRA
