Environment: Intel IPEX-LLM XPU Finetuning Environment

Knowledge Sources	IPEX-LLM, Intel OneAPI
Domains	Infrastructure, LLM_Finetuning
Last Updated	2026-02-09 12:00 GMT

Overview

An Ubuntu 22.04 environment with an Intel XPU (Arc, Flex, or Max GPU), PyTorch 2.1+, IPEX-LLM, and the HuggingFace ecosystem for QLoRA, LoRA, and DPO finetuning workflows.

Description

This environment provides Intel XPU acceleration for LLM finetuning. It is built on the Intel OneAPI Base Toolkit and requires Intel GPU drivers for Arc, Flex, or Data Center Max series GPUs. IPEX-LLM is the core acceleration library: `ipex_llm.transformers` provides drop-in replacements for HuggingFace classes such as `AutoModelForCausalLM`. Three training modes are supported: 4-bit NF4 quantization (QLoRA), unquantized bf16 LoRA, and DPO. Distributed multi-GPU training uses Intel OneCCL as the communication backend instead of NVIDIA's NCCL.

Usage

Use this environment for any QLoRA, LoRA, or DPO finetuning workflow that requires Intel XPU acceleration. It is a required prerequisite for the IPEX-LLM-compatible Trainer implementations: QLoRA with `BitsAndBytesConfig`, bf16 LoRA with DeepSpeed ZeRO3, and DPO with TRL's `DPOTrainer`.

System Requirements

Category	Requirement	Notes
OS	Ubuntu 22.04 LTS	Intel OneAPI base toolkit required
Hardware	Intel GPU (Arc/Flex/Max)	XPU device; iGPU also supported for smaller models
GPU Driver	Intel GPU drivers	Level Zero runtime required
Distributed	Intel OneCCL	Required for multi-GPU DDP training (replaces NCCL)

Dependencies

System Packages

Intel OneAPI Base Toolkit 2024.0.1+
`intel-opencl-icd`
`intel-level-zero-gpu`
`level-zero`, `level-zero-dev`

Python Packages

`ipex-llm[xpu]` (pre-release)
`torch` == 2.1.0a0 (XPU 2.1) or == 2.6.0+xpu (XPU 2.6)
`intel_extension_for_pytorch` == 2.1.10+xpu or == 2.6.10+xpu
`transformers` == 4.36.0 (finetuning) or == 4.53.2 (serving)
`peft` == 0.10.0
`bitsandbytes`
`accelerate` == 0.23.0
`datasets`
`scipy`
`fire`
`trl` >= 0.7.9, <= 0.9.6 (for DPO)
`deepspeed` >= 0.13.1 (for distributed LoRA)
`oneccl_bind_pt` (for multi-GPU DDP)
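To confirm that an installed environment actually matches these pins, a stdlib-only check can compare the versions reported by `importlib.metadata` against a pin table. This is a sketch under stated assumptions: the `PINS` table (a hypothetical name) transcribes the XPU 2.6 entries from the list above plus the finetuning `transformers` pin; `==` pins are matched as literal strings, so a wheel whose version carries a different local tag is reported as a mismatch; and range checks compare only the numeric release segment, ignoring pre-release tags such as `a0`.

```python
import re
from importlib import metadata

# Hypothetical pin table for the PyTorch 2.6 XPU variant; edit for XPU 2.1.
# "==" pins are compared as exact strings; ">=" / "<=" use the numeric release.
PINS = {
    "torch": [("==", "2.6.0+xpu")],
    "intel_extension_for_pytorch": [("==", "2.6.10+xpu")],
    "transformers": [("==", "4.36.0")],
    "peft": [("==", "0.10.0")],
    "accelerate": [("==", "0.23.0")],
    "trl": [(">=", "0.7.9"), ("<=", "0.9.6")],
    "deepspeed": [(">=", "0.13.1")],
}


def release_tuple(version: str) -> tuple:
    """Leading numeric release segment, e.g. '2.1.10+xpu' -> (2, 1, 10)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed: str, constraints) -> bool:
    """Return True if `installed` meets every (operator, version) constraint."""
    for op, wanted in constraints:
        if op == "==":
            ok = installed == wanted
        elif op == ">=":
            ok = release_tuple(installed) >= release_tuple(wanted)
        elif op == "<=":
            ok = release_tuple(installed) <= release_tuple(wanted)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def check_environment(pins=PINS) -> dict:
    """Map each package to 'ok', 'mismatch (<installed>)', or 'missing'."""
    report = {}
    for package, constraints in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if satisfies(installed, constraints) else f"mismatch ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running it inside the activated environment prints one status line per package, which makes a stray `transformers` upgrade or a mixed 2.1/2.6 install easy to spot.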

Environment Variables

The finetuning scripts read the following environment variables:

`ACCELERATE_USE_XPU`: Must be set to `"true"` before importing accelerate. Enables Intel XPU device detection in HuggingFace Accelerate.
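`LOCAL_RANK`: Local rank of the process, which selects the XPU device (`xpu:<LOCAL_RANK>`). If unset, the scripts fall back to `MPI_LOCALRANKID` (set by Intel MPI), then to `0`.
`WORLD_SIZE`: Total number of training processes (one per GPU). If unset, the scripts fall back to `PMI_SIZE` (set by Intel MPI), then to `1`.
`RANK`: Global process rank for DDP. The QLoRA script sets it equal to `LOCAL_RANK`, which assumes a single node.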
`MASTER_PORT`: Communication port (default: 29500).
`WANDB_PROJECT`: (Optional) Weights & Biases project name for logging.
`WANDB_WATCH`: (Optional) W&B gradient watching mode (`false`, `gradients`, `all`).
`WANDB_LOG_MODEL`: (Optional) W&B model logging (`false`, `true`).
`SYCL_CACHE_PERSISTENT`: (Optional) Set to `1` to persist the SYCL kernel compilation cache across runs, which shortens startup.
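Since `ACCELERATE_USE_XPU` has to be in place before accelerate is imported, one option is to set the variables at the very top of the training script. A minimal sketch, assuming you want values already exported in the shell to win for everything except `ACCELERATE_USE_XPU`, which is forced to `"true"`; the helper name `configure_xpu_env` and the `XPU_DEFAULTS` table are hypothetical, and only the variable names and default values come from the list above.

```python
import os

# Defaults for variables from the list above (WANDB_* are optional and omitted).
XPU_DEFAULTS = {
    "SYCL_CACHE_PERSISTENT": "1",  # persist compiled SYCL kernels across runs
    "MASTER_PORT": "29500",        # default DDP communication port
}


def configure_xpu_env(env=None) -> dict:
    """Hypothetical helper: call before any HuggingFace import."""
    env = os.environ if env is None else env
    env["ACCELERATE_USE_XPU"] = "true"  # required; overrides any other value
    for name, value in XPU_DEFAULTS.items():
        env.setdefault(name, value)     # keep values already exported in the shell
    return {name: env[name] for name in ["ACCELERATE_USE_XPU", *XPU_DEFAULTS]}


if __name__ == "__main__":
    print(configure_xpu_env())
    # Import the HuggingFace stack only after the variables are set, e.g.:
    # import accelerate
```

Passing a plain dict instead of `os.environ` makes the helper easy to test without touching the real process environment.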

Quick Install

```bash
# Source the Intel OneAPI environment
source /opt/intel/oneapi/setvars.sh

# Enable XPU in Accelerate (must be set before accelerate is imported)
export ACCELERATE_USE_XPU=true

# Install IPEX-LLM with XPU support (quoted so shells such as zsh do not glob the brackets)
pip install --pre --upgrade 'ipex-llm[xpu]' --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Install finetuning dependencies
pip install transformers==4.36.0 peft==0.10.0 datasets bitsandbytes scipy fire accelerate==0.23.0

# For DPO training (quoted so the shell does not treat ">" as a redirect)
pip install 'trl>=0.7.9,<=0.9.6'

# For distributed multi-GPU training
pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable
pip install 'deepspeed>=0.13.1'
```

Code Evidence

Environment variable setup from `alpaca_qlora_finetuning.py:35`:

```python
os.environ["ACCELERATE_USE_XPU"] = "true"
```

Distributed environment detection from `alpaca_qlora_finetuning.py:61-67`:

```python
local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
port = get_int_from_env(["MASTER_PORT"], 29500)
os.environ["LOCAL_RANK"] = str(local_rank)
os.environ["WORLD_SIZE"] = str(world_size)
os.environ["RANK"] = str(local_rank)
os.environ["MASTER_PORT"] = str(port)
```
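The excerpt above calls `get_int_from_env` without showing it. Its behavior can be inferred from the call sites: try each variable name in order and fall back to a default. The version below is a hypothetical reconstruction, not the helper from the IPEX-LLM sources; it resembles the same-named helper in HuggingFace Accelerate, except that it converts the default to `int` so all three results share one type. `resolve_distributed_env` is likewise a hypothetical wrapper around the excerpt.

```python
import os


def get_int_from_env(env_keys, default):
    """Hypothetical reconstruction: return the first non-negative integer
    found among `env_keys`, else `default` converted to int."""
    for key in env_keys:
        value = int(os.environ.get(key, -1))  # -1 marks "not set"
        if value >= 0:
            return value
    return int(default)


def resolve_distributed_env():
    """Mirror the excerpt: resolve rank, size, and port, then export them for DDP."""
    local_rank = get_int_from_env(["LOCAL_RANK", "MPI_LOCALRANKID"], "0")
    world_size = get_int_from_env(["WORLD_SIZE", "PMI_SIZE"], "1")
    port = get_int_from_env(["MASTER_PORT"], 29500)
    os.environ["LOCAL_RANK"] = str(local_rank)
    os.environ["WORLD_SIZE"] = str(world_size)
    os.environ["RANK"] = str(local_rank)  # single-node assumption, as in the excerpt
    os.environ["MASTER_PORT"] = str(port)
    return local_rank, world_size, port
```

Under Intel MPI, where only `MPI_LOCALRANKID` and `PMI_SIZE` are set, this resolves the rank and world size from the MPI variables and then re-exports them under the names PyTorch DDP expects.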

XPU device placement from `alpaca_qlora_finetuning.py:199`:

```python
model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')
```

CCL DDP backend from `alpaca_qlora_finetuning.py:268`:

```python
ddp_backend="ccl",
```

W&B environment check from `common/utils/util.py:63-75`:

```python
def wandb_check(wandb_project, wandb_watch, wandb_log_model):
    use_wandb = len(wandb_project) > 0 or (
        "WANDB_PROJECT" in os.environ and len(os.environ["WANDB_PROJECT"]) > 0
    )
    if len(wandb_project) > 0:
        os.environ["WANDB_PROJECT"] = wandb_project
    if len(wandb_watch) > 0:
        os.environ["WANDB_WATCH"] = wandb_watch
    if len(wandb_log_model) > 0:
        os.environ["WANDB_LOG_MODEL"] = wandb_log_model
    return use_wandb
```

Common Errors

Error Message	Cause	Solution
`ACCELERATE_USE_XPU not set`	XPU environment variable not configured	Set `export ACCELERATE_USE_XPU=true` before importing accelerate
`RuntimeError: No XPU device found`	Intel GPU drivers not installed	Install Intel GPU drivers and Level Zero runtime
`ModuleNotFoundError: No module named 'oneccl_bindings_for_pytorch'`	OneCCL not installed	`pip install oneccl_bind_pt` from Intel index
`DDP backend 'ccl' not available`	OneCCL environment not sourced	`source /opt/intel/oneapi/ccl/latest/env/vars.sh --force`
`paged_adamw_8bit is not supported yet`	Paged AdamW not available on Intel platform	Use `optim="adamw_torch"` or `optim="adamw_hf"` instead
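Several of these errors come down to Trainer settings. The sketch below collects the XPU-specific `TrainingArguments` values mentioned on this page (`ddp_backend="ccl"`, `save_safetensors=False`) and swaps `paged_adamw_8bit` for `adamw_torch`, as the table suggests. It builds a plain dict so it runs without transformers installed; the helper name `xpu_training_kwargs` is hypothetical, and the result is meant to be unpacked into `TrainingArguments(**kwargs)`.

```python
# Optimizers the table above reports as unsupported on Intel XPU.
UNSUPPORTED_OPTIMIZERS = {"paged_adamw_8bit"}


def xpu_training_kwargs(**overrides) -> dict:
    """Hypothetical helper: merge user settings with XPU-specific defaults."""
    kwargs = {
        "ddp_backend": "ccl",       # OneCCL instead of NVIDIA NCCL
        "save_safetensors": False,  # per the compatibility notes on this page
        "optim": "adamw_torch",
    }
    kwargs.update(overrides)
    if kwargs["optim"] in UNSUPPORTED_OPTIMIZERS:
        # Fall back to plain PyTorch AdamW instead of failing at startup.
        kwargs["optim"] = "adamw_torch"
    return kwargs


# Usage (requires transformers):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **xpu_training_kwargs(optim="paged_adamw_8bit", bf16=True))
```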

Compatibility Notes

Intel XPU Only: This environment targets Intel Arc, Flex, and Data Center Max GPUs. NVIDIA CUDA GPUs are not supported.
CCL vs NCCL: Multi-GPU training uses Intel OneCCL (`ddp_backend="ccl"`) instead of NVIDIA NCCL. The OneAPI CCL environment must be sourced before training.
DeepSpeed ZeRO3: Requires an IPEX-LLM compatibility patch for DeepSpeed's `_constant_buffered_norm2`. The patch is applied automatically when the `deepspeed` config argument contains `"zero3"`.
SafeTensors: Checkpoint saving uses `save_safetensors=False` for compatibility.
PyTorch Version: Two XPU variants exist, PyTorch 2.1 (legacy) and PyTorch 2.6 (recommended). The `torch` and `intel_extension_for_pytorch` versions must both come from the same variant, exactly as listed under Python Packages.
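Because `torch` and `intel_extension_for_pytorch` must come from the same variant, a mixed install can be caught by mapping each version string to its variant. A minimal sketch, assuming the versions listed under Python Packages; installed wheels may carry extra local tags, so only the numeric release prefix is compared. The names `VARIANTS` and `detect_xpu_variant` are hypothetical.

```python
import re

# Release prefixes per variant, taken from the Python Packages list on this page.
VARIANTS = {
    "XPU 2.1": {"torch": (2, 1, 0), "ipex": (2, 1, 10)},
    "XPU 2.6": {"torch": (2, 6, 0), "ipex": (2, 6, 10)},
}


def release(version: str) -> tuple:
    """Numeric release prefix, e.g. '2.1.0a0' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def detect_xpu_variant(torch_version: str, ipex_version: str) -> str:
    """Return the variant name, or raise if the two versions do not belong together."""
    got = {"torch": release(torch_version), "ipex": release(ipex_version)}
    for name, expected in VARIANTS.items():
        if got == expected:
            return name
    raise RuntimeError(
        f"torch {torch_version} and intel_extension_for_pytorch {ipex_version} "
        "do not match a supported XPU variant"
    )


# Usage inside the environment (requires both packages):
# import torch, intel_extension_for_pytorch as ipex
# print(detect_xpu_variant(torch.__version__, ipex.__version__))
```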
