Principle:Intel Ipex llm Training With HF Trainer LoRA

From Leeroopedia


Knowledge Sources
Domains NLP, Training
Last Updated 2026-02-09 00:00 GMT

Overview

Methodology for executing standard LoRA fine-tuning using the HuggingFace Trainer with Intel XPU optimizations and optional DeepSpeed ZeRO Stage 3.

Description

Training execution for standard LoRA workflows uses the HuggingFace Trainer with the same Intel XPU adaptations as QLoRA (CCL backend, bf16), and additionally supports DeepSpeed ZeRO Stage 3 for partitioning model parameters across multiple GPUs. This is particularly useful for bf16 models that do not fit in the memory of a single GPU. The workflow also exposes a save_checkpoint toggle and a deepspeed_zero3 flag for easy multi-GPU configuration.
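As a rough illustration of how the two flags could map onto Trainer settings, the sketch below builds a minimal DeepSpeed ZeRO Stage 3 config as a plain dict and derives keyword arguments in the shape that `transformers.TrainingArguments` accepts (`bf16`, `ddp_backend`, `save_strategy`, `deepspeed`). The helper names `build_zero3_config` and `trainer_kwargs`, the particular ZeRO options chosen, and the mapping of save_checkpoint to `save_strategy` are assumptions made for illustration, not the project's actual code.

```python
import json


def build_zero3_config(bf16: bool = True) -> dict:
    """Return a minimal DeepSpeed ZeRO Stage 3 config (hypothetical helper)."""
    return {
        "zero_optimization": {
            "stage": 3,  # partition parameters, gradients, optimizer states
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights when saving, so checkpoints are usable.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "bf16": {"enabled": bf16},
        # "auto" lets the HuggingFace integration fill these from TrainingArguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


def trainer_kwargs(deepspeed_zero3: bool, save_checkpoint: bool) -> dict:
    """Map the two workflow flags to TrainingArguments-style kwargs (assumed mapping)."""
    return {
        "bf16": True,                # bf16 training, as on the QLoRA path
        "ddp_backend": "ccl",        # CCL backend for Intel XPU
        "save_strategy": "steps" if save_checkpoint else "no",
        "deepspeed": build_zero3_config() if deepspeed_zero3 else None,
    }


if __name__ == "__main__":
    print(json.dumps(trainer_kwargs(deepspeed_zero3=True, save_checkpoint=False), indent=2))
```

With `transformers` installed, the resulting dict could be passed along with an `output_dir` as `TrainingArguments(output_dir=..., **kwargs)`; the `deepspeed` argument accepts either a dict like this one or a path to a JSON file.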

Usage

Use this after bf16 model loading and LoRA adapter injection. Configure DeepSpeed ZeRO Stage 3 when the bf16 model exceeds single-GPU memory; use standard DDP when the model fits on one GPU.
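The choice between the two modes comes down to simple arithmetic: bf16 stores 2 bytes per parameter, so the weights alone need roughly twice the parameter count in bytes. The sketch below encodes that check. The function names and the `headroom` fraction (the share of device memory assumed to be available for weights, with the rest left for activations, LoRA gradients, and optimizer states) are hypothetical and should be tuned per workload.

```python
GIB = 1024 ** 3


def bf16_weight_bytes(num_params: int) -> int:
    """bf16 uses 2 bytes per parameter."""
    return 2 * num_params


def choose_strategy(num_params: int, device_mem_bytes: int, headroom: float = 0.6) -> str:
    """Return "ddp" if the bf16 weights fit within the budget of one device, else "zero3".

    headroom is an assumed fraction of device memory reserved for weights.
    """
    budget = headroom * device_mem_bytes
    return "ddp" if bf16_weight_bytes(num_params) <= budget else "zero3"


if __name__ == "__main__":
    for params, mem_gib in [(7_000_000_000, 16), (7_000_000_000, 48), (70_000_000_000, 48)]:
        print(params, mem_gib, choose_strategy(params, mem_gib * GIB))
```

This is only a first-order estimate: sequence length, batch size, and activation checkpointing all change how much memory remains after the weights are loaded.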

Theoretical Basis

The training loop is the same as for the QLoRA Trainer, with DeepSpeed ZeRO Stage 3 available as an option:

# Abstract ZeRO-3 distribution (NOT real implementation)
# ZeRO Stage 3 partitions: parameters + gradients + optimizer states
# Each GPU holds 1/N of all three, gathering on-demand for computation
# Enables training models larger than single-GPU memory
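The partition-then-gather idea described in the comments above can be mimicked with a toy model in plain Python. This is not how DeepSpeed is implemented: there is no communication and there are no real tensors here, and the function names `partition` and `all_gather` are illustrative only. It shows each of N ranks holding one contiguous shard of a flat parameter list, and the full list being reassembled on demand.

```python
from typing import List


def partition(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into world_size equal shards, zero-padding the tail."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], original_len: int) -> List[float]:
    """Reassemble the full parameter list from all shards and drop the padding."""
    full = [x for shard in shards for x in shard]
    return full[:original_len]


if __name__ == "__main__":
    params = [float(i) for i in range(10)]
    shards = partition(params, world_size=4)
    # Each rank persistently stores only its own shard (about 1/N of the values).
    for rank, shard in enumerate(shards):
        print(f"rank {rank} holds {shard}")
    # Before a layer's computation, the full values are gathered temporarily.
    assert all_gather(shards, len(params)) == params
```

In ZeRO Stage 3 the same treatment applies to gradients and optimizer states, so persistent per-device memory for all three shrinks roughly in proportion to the number of devices, at the cost of extra communication for each gather.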

Related Pages

Implemented By

Uses Heuristic
