Environment:Vllm project Vllm Intel XPU
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Intel_XPU |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Intel XPU (GPU) runtime environment for vLLM. It uses Intel Extension for PyTorch (IPEX) to enable LLM inference on Intel discrete GPUs: the Data Center GPU Max series (Ponte Vecchio) and the Intel Arc series.
Description
This environment defines the software stack required to run vLLM on Intel discrete GPUs using the XPU device abstraction. Intel's XPU platform uses the oneAPI programming model and the Level Zero runtime for GPU compute. The Intel Extension for PyTorch (IPEX) provides the bridge between PyTorch's device abstraction and Intel GPU hardware, implementing custom operators for attention, GEMM, and element-wise operations optimized for Intel's Xe GPU architecture. vLLM's IPEX operations module (ipex_ops) wraps IPEX-specific kernel calls for paged attention, rotary positional embeddings, layer normalization, and activation functions. The XPU backend registers as a vLLM platform via the platform detection system, enabling automatic selection when Intel GPUs are detected.
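The detect-then-register flow described above can be pictured with a small, self-contained sketch. This is not vLLM's actual platform code: `register_platform`, `resolve_platform`, and `_xpu_present` are hypothetical names made up for illustration, and the only real API assumed is `torch.xpu.is_available()`, which recent XPU-enabled PyTorch builds expose.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a platform name to a detector callable.
# vLLM's real platform detection is organised differently; this only
# illustrates the idea of "pick the first backend whose probe succeeds".
_DETECTORS: Dict[str, Callable[[], bool]] = {}


def register_platform(name: str, detector: Callable[[], bool]) -> None:
    """Record a detector that reports whether the platform is usable."""
    _DETECTORS[name] = detector


def resolve_platform(default: str = "cpu") -> str:
    """Return the first registered platform whose detector returns True."""
    for name, detector in _DETECTORS.items():
        try:
            if detector():
                return name
        except Exception:
            # A failing probe is treated as "platform not present".
            continue
    return default


def _xpu_present() -> bool:
    """Probe for an Intel GPU through PyTorch's XPU device namespace."""
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())


register_platform("xpu", _xpu_present)
```

On a host without an Intel GPU or without PyTorch, `resolve_platform()` falls back to the default name; the fallback value `"cpu"` is likewise an assumption of this sketch.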
Usage
To use vLLM with Intel XPU, install the Intel GPU driver, the oneAPI Base Toolkit, and Intel Extension for PyTorch. Set VLLM_TARGET_DEVICE=xpu during vLLM installation to compile XPU-specific extensions. At runtime, vLLM auto-detects Intel GPUs through IPEX's device enumeration and routes tensor operations through the XPU backend. Multi-GPU inference on Intel hardware uses the oneCCL (oneAPI Collective Communications Library) backend for distributed communication.
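Before launching vLLM it can help to confirm that the Intel GPUs are visible to PyTorch at all. The following is a minimal sketch, assuming a PyTorch build that exposes the `torch.xpu` namespace (on older stacks this namespace appears only after importing `intel_extension_for_pytorch`). The helper name `list_xpu_devices` is hypothetical and not part of vLLM or IPEX.

```python
import importlib.util
from typing import List


def list_xpu_devices() -> List[str]:
    """Return the names of visible XPU devices, or [] if none are usable."""
    try:
        import torch
    except ImportError:
        return []

    # On older stacks, importing IPEX is what attaches torch.xpu.
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        try:
            import intel_extension_for_pytorch  # noqa: F401
        except Exception:
            # A broken IPEX install should not crash the probe itself.
            pass

    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []
    return [xpu.get_device_name(i) for i in range(xpu.device_count())]


if __name__ == "__main__":
    devices = list_xpu_devices()
    if devices:
        for index, name in enumerate(devices):
            print(f"xpu:{index} -> {name}")
    else:
        print("No Intel XPU device visible to PyTorch")
```

An empty result usually points at a missing driver, a missing oneAPI environment, or a PyTorch build without XPU support, rather than at vLLM itself.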
Requirements
| Requirement | Value |
|---|---|
| GPU Hardware | Intel Data Center GPU Max series (Ponte Vecchio) or Intel Arc series |
| Intel GPU Driver | Latest stable Intel GPU driver for Linux |
| oneAPI Base Toolkit | 2024.0+ (includes Level Zero runtime, oneMKL, oneCCL) |
| Intel Extension for PyTorch (IPEX) | Version that matches the installed PyTorch release |
| PyTorch | XPU-enabled build of PyTorch |
| Python | >= 3.10 |
| Operating System | Linux (Ubuntu 22.04+ or SUSE Linux Enterprise) |
| Level Zero Runtime | Included with oneAPI toolkit |
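
The two requirements in the table that can be checked without touching the GPU stack are the Python version and the operating system. The sketch below tests only those two rows; `check_host` and `MIN_PYTHON` are hypothetical names, and driver, oneAPI, and IPEX versions are deliberately out of scope because the table does not pin them precisely.

```python
import platform
import sys
from typing import List, Optional, Tuple

# Minimum Python version taken from the requirements table above.
MIN_PYTHON: Tuple[int, int] = (3, 10)


def check_host(
    python_version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; empty means both checks pass.

    Arguments default to the running interpreter and OS, and can be
    overridden to evaluate a different host description.
    """
    version = tuple(python_version or sys.version_info[:2])
    os_name = system or platform.system()

    problems: List[str] = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if os_name != "Linux":
        problems.append(f"Operating system is {os_name}, expected Linux")
    return problems


if __name__ == "__main__":
    issues = check_host()
    print("OK" if not issues else "\n".join(issues))
```

Returning a list of problems instead of raising on the first failure lets a setup script report every unmet requirement in one pass.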