Environment: Bitsandbytes XPU SYCL Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, XPU_Backend, SYCL |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Intel XPU SYCL runtime environment for running bitsandbytes quantization operations on Intel discrete GPUs using oneAPI/SYCL.
Description
This environment provides the Intel XPU GPU-accelerated context for running bitsandbytes operations on Intel discrete and integrated GPUs. It uses a three-tier dispatch strategy: (1) SYCL native kernels compiled via CMake with -DCOMPUTE_BACKEND=xpu for dequantize_4bit, dequantize_blockwise, and gemv_4bit; (2) Triton kernels for quantize_blockwise, quantize_4bit, and optimizer operations; (3) PyTorch fallback when neither SYCL nor Triton is available. The XPU backend requires Intel oneAPI toolkit 2025.1.3 and is detected via torch._C._has_xpu.
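The three-tier fallback described above can be sketched as a simple selection chain. This is an illustrative stand-in, not the bitsandbytes internals; the real selection lives in `bitsandbytes/backends/xpu/ops.py` and probes the compiled native library and Triton availability at import time (the function name here is hypothetical):

```python
def select_backend(sycl_lib_loaded: bool, triton_available: bool) -> str:
    """Pick the best available implementation tier for an XPU op."""
    if sycl_lib_loaded:
        return "sycl"      # Tier 1: native SYCL kernels (best performance)
    if triton_available:
        return "triton"    # Tier 2: Triton-generated kernels
    return "pytorch"       # Tier 3: pure-PyTorch fallback (slowest)
```

The ordering matters: a build compiled with `-DCOMPUTE_BACKEND=xpu` always wins, and the PyTorch path is only reached when both accelerated tiers are missing.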
Usage
Use this environment for quantization and dequantization on Intel GPUs including 4-bit inference, blockwise dequantization, and 4-bit GEMV. The backend is automatically detected when an Intel XPU device is available in PyTorch.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Intel discrete GPU (Arc, Data Center Max, Flex) | Intel XPU-capable device |
| OS | Linux x86-64 (glibc >= 2.34), Windows x86-64 | Ubuntu 22.04+ recommended |
| oneAPI Toolkit | 2025.1.3 | Intel Deep Learning Essentials |
| Python | >= 3.10 | From pyproject.toml |
| PyTorch | >= 2.3, < 3 (>= 2.9 for int8_linear_matmul) | torch._C._has_xpu must be True |
Dependencies
System Packages
- Intel oneAPI Base Toolkit 2025.1.3
- SYCL runtime (libsycl.so)
- Intel GPU drivers
Python Packages
- `torch` >= 2.3, < 3
- `intel_extension_for_pytorch` (optional, for extended XPU support)
- `numpy` >= 1.17
- `packaging` >= 20.9
Build Requirements (for SYCL kernels)
- CMake with -DCOMPUTE_BACKEND=xpu
- Docker image: intel/deep-learning-essentials:2025.1.3-0-devel-ubuntu22.04 (Linux)
- Windows: Intel Deep Learning Essentials 2025.1.3 + Intel oneAPI setvars.bat
Credentials
No secrets or credentials required. The XPU backend is detected via PyTorch's built-in XPU device support.
Quick Install
```shell
# Install with Intel XPU support
pip install bitsandbytes

# Or build from source with SYCL kernels:
cmake -DCOMPUTE_BACKEND=xpu -S . -B build
cmake --build build
pip install -e .

# Verify XPU detection
python -c "import torch; print(torch.xpu.is_available())"
python -m bitsandbytes
```
Code Evidence
XPU backend detection from `bitsandbytes/cextension.py`:
```python
elif torch._C._has_xpu:
    BNB_BACKEND = "XPU"
```
Three-tier dispatch strategy from `bitsandbytes/backends/xpu/ops.py`:
```python
# Tier 1: SYCL native library (preferred)
@register_kernel("bitsandbytes::dequantize_4bit", "xpu")
def _(A, absmax, blocksize, quant_type, shape, dtype):
    # Uses compiled SYCL kernels via the native library
    ...

# Tier 2: Triton kernels (fallback)
#   Used for quantize_blockwise, quantize_4bit, optimizer ops

# Tier 3: PyTorch default (final fallback)
#   A warning is logged when neither SYCL nor Triton is available
```
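The `register_kernel` decorator shown above binds an op name and device string to an implementation. A minimal stand-in for that pattern (names and behavior here are illustrative, not the bitsandbytes implementation):

```python
# Hypothetical (op, device) -> implementation registry mirroring the
# register_kernel pattern in the code evidence above.
_KERNELS = {}

def register_kernel(op: str, device: str):
    """Decorator: register fn as the implementation of op on device."""
    def decorator(fn):
        _KERNELS[(op, device)] = fn
        return fn
    return decorator

@register_kernel("bitsandbytes::dequantize_4bit", "xpu")
def _dequantize_4bit_xpu(*args):
    return "xpu-impl"  # placeholder for the real kernel call

def dispatch(op: str, device: str, *args):
    """Look up and invoke the registered kernel, or fail loudly."""
    try:
        return _KERNELS[(op, device)](*args)
    except KeyError:
        raise NotImplementedError(f"{op} has no kernel for device {device!r}")
```

Registering per-device implementations this way keeps the dispatch table extensible: adding a Triton or CPU variant is just another decorated function.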
SYCL kernel headers from `csrc/xpu_kernels.h`:
#include <sycl/sycl.hpp>
// SYCL_EXTERNAL template kernel definitions
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `torch._C._has_xpu` is False | Intel XPU support not available in PyTorch | Install PyTorch with XPU support or intel_extension_for_pytorch |
| SYCL native library not found | Built without -DCOMPUTE_BACKEND=xpu | Rebuild with CMake XPU backend flag or use Triton fallback |
| int8_linear_matmul not available | PyTorch < 2.9 | Upgrade PyTorch to 2.9+ for torch._int_mm support on XPU |
Compatibility Notes
- Quantization types: Both NF4 and FP4 are supported.
- Data types: Supports float16, bfloat16, and float32.
- SYCL kernels: Provide best performance but require building from source with CMake.
- Triton fallback: Available for most operations when SYCL native library is not present.
- PyTorch >= 2.9: Required for INT8 linear matmul via torch._int_mm.
- glibc >= 2.34: Required on Linux (Ubuntu 22.04+).
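The notes above define a small support matrix; a minimal sketch of validating a requested configuration against it (the constants mirror the documented matrix, and the function name is hypothetical, not a bitsandbytes API):

```python
# Supported combinations per the compatibility notes above.
SUPPORTED_QUANT_TYPES = {"nf4", "fp4"}
SUPPORTED_DTYPES = {"float16", "bfloat16", "float32"}

def check_xpu_config(quant_type: str, dtype: str) -> None:
    """Raise ValueError for combinations outside the documented matrix."""
    if quant_type not in SUPPORTED_QUANT_TYPES:
        raise ValueError(f"unsupported quant_type {quant_type!r}; "
                         f"expected one of {sorted(SUPPORTED_QUANT_TYPES)}")
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype {dtype!r}; "
                         f"expected one of {sorted(SUPPORTED_DTYPES)}")
```

Failing early with a named-set error message is cheaper to debug than a kernel-level failure deep inside a dequantize call.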