
Environment:Vllm project Vllm Intel XPU

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, Intel_XPU
Last Updated 2026-02-08 00:00 GMT

Overview

This page describes the Intel XPU (GPU) runtime environment for vLLM, which uses Intel Extension for PyTorch (IPEX) to run LLM inference on Intel discrete GPUs: the Data Center GPU Max series (Ponte Vecchio) and future Arc-based accelerators.

Description

This environment defines the software stack required to run vLLM on Intel discrete GPUs using the XPU device abstraction. Intel's XPU platform uses the oneAPI programming model and the Level Zero runtime for GPU compute. The Intel Extension for PyTorch (IPEX) provides the bridge between PyTorch's device abstraction and Intel GPU hardware, implementing custom operators for attention, GEMM, and element-wise operations optimized for Intel's Xe GPU architecture. vLLM's IPEX operations module (ipex_ops) wraps IPEX-specific kernel calls for paged attention, rotary positional embeddings, layer normalization, and activation functions. The XPU backend registers as a vLLM platform via the platform detection system, enabling automatic selection when Intel GPUs are detected.
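
As a rough illustration of the device enumeration that platform detection depends on, the sketch below probes for an XPU device through PyTorch's `torch.xpu` namespace. It is not vLLM's actual detection code: `probe_xpu` is a hypothetical helper, and it assumes a PyTorch build (or an imported IPEX) that exposes `torch.xpu`. On a machine without PyTorch or without an Intel GPU it simply reports that no device was found.

```python
def probe_xpu():
    """Return (available, device_count) for Intel XPU devices.

    Hypothetical helper for illustration; not part of vLLM.
    """
    try:
        import torch  # an XPU-enabled build is assumed for a positive result
    except ImportError:
        return False, 0

    # torch.xpu may be absent on older builds that have not loaded IPEX.
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return False, 0
    return True, xpu.device_count()


if __name__ == "__main__":
    available, count = probe_xpu()
    print(f"XPU available: {available}, devices: {count}")
```

A check like this is mainly useful as a quick sanity test after installing the driver and IPEX, before starting vLLM itself.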

Usage

To use vLLM with Intel XPU, install the Intel GPU driver, the oneAPI Base Toolkit, and Intel Extension for PyTorch. Set VLLM_TARGET_DEVICE=xpu during vLLM installation to compile XPU-specific extensions. At runtime, vLLM auto-detects Intel GPUs through IPEX's device enumeration and routes tensor operations through the XPU backend. Multi-GPU inference on Intel hardware uses the oneCCL (oneAPI Collective Communications Library) backend for distributed communication.
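
The installation step can be sketched as follows. Only the `VLLM_TARGET_DEVICE=xpu` setting comes from the text above; the `pip install` invocation from a local source checkout and all helper names are assumptions made for illustration, so consult the vLLM installation guide for the exact command for your version.

```python
import os
import subprocess
import sys


def xpu_build_env(base_env=None):
    """Copy the environment and select the XPU build target."""
    env = dict(os.environ if base_env is None else base_env)
    env["VLLM_TARGET_DEVICE"] = "xpu"
    return env


def install_command(source_dir="."):
    """Assumed install command: pip install from a local vLLM checkout."""
    return [sys.executable, "-m", "pip", "install", "-v", source_dir]


def install_vllm_for_xpu(source_dir=".", dry_run=True):
    """Run (or, by default, only print) the install with the XPU target set."""
    env = xpu_build_env()
    cmd = install_command(source_dir)
    if dry_run:
        print("VLLM_TARGET_DEVICE=" + env["VLLM_TARGET_DEVICE"], " ".join(cmd))
        return 0
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    install_vllm_for_xpu(dry_run=True)
```

The default `dry_run=True` only prints the command, which makes it easy to confirm the environment variable is set before starting a long build.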

Requirements

Requirement                          Value
GPU Hardware                         Intel Data Center GPU Max series (Ponte Vecchio) or Intel Arc series
Intel GPU Driver                     Latest stable Intel GPU driver for Linux
oneAPI Base Toolkit                  2024.0+ (includes Level Zero runtime, oneMKL, oneCCL)
Intel Extension for PyTorch (IPEX)   Compatible version matching PyTorch release
PyTorch                              XPU-enabled build of PyTorch
Python                               >= 3.10
Operating System                     Linux (Ubuntu 22.04+ or SUSE Linux Enterprise)
Level Zero Runtime                   Included with oneAPI toolkit
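
Two of the requirements above (Python version and operating system) can be verified from Python alone. The sketch below is a minimal preflight check under that assumption; `check_host` is a hypothetical helper, and it does not attempt to verify the GPU driver, the oneAPI toolkit, or the IPEX version, which need vendor tooling.

```python
import platform
import sys


def check_host(python_version=None, system=None):
    """Return a list of unmet host requirements (empty if none were found).

    Hypothetical helper: checks only Python >= 3.10 and a Linux host.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if system is None:
        system = platform.system()

    problems = []
    if tuple(python_version[:2]) < (3, 10):
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python >= 3.10 required, found {found}")
    if system != "Linux":
        problems.append(f"Linux required, found {system}")
    return problems


if __name__ == "__main__":
    issues = check_host()
    print("Host OK" if not issues else "\n".join(issues))
```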
