Environment: Hugging Face PEFT GPU Hardware Detection
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Hardware |
| Last Updated | 2026-02-07 06:44 GMT |
Overview
Hardware acceleration detection layer supporting NVIDIA CUDA, Intel XPU, Google TPU, Huawei NPU, Cambricon MLU, and Apple MPS backends.
Description
PEFT uses a multi-backend device detection system to automatically select the appropriate hardware accelerator. The `infer_device()` function in `src/peft/utils/other.py` probes available backends in a fixed priority order: CUDA > MPS > MLU > XPU > NPU > CPU. Individual backend checks are provided by dedicated functions in `import_utils.py` (TPU, XPU) and `accelerate.utils` (NPU, MLU). XPU detection explicitly excludes macOS (Darwin). TPU detection optionally verifies actual device availability via `torch_xla`.
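The probing chain can be sketched without any hardware libraries. In the sketch below, the checker callables are hypothetical stand-ins for the real availability functions (`torch.cuda.is_available()`, `is_xpu_available()`, etc.); only the fixed priority order is taken from the source.

```python
from typing import Callable, Dict

# Same fixed priority order as PEFT's infer_device(); CPU is the fallback.
PRIORITY = ("cuda", "mps", "mlu", "xpu", "npu")

def infer_device_sketch(checks: Dict[str, Callable[[], bool]]) -> str:
    """Return the first backend whose availability probe passes."""
    for backend in PRIORITY:
        probe = checks.get(backend)
        if probe is not None and probe():
            return backend
    return "cpu"  # nothing detected: fall back to CPU
```

For example, `infer_device_sketch({"mps": lambda: True, "npu": lambda: True})` returns `"mps"`, because MPS outranks NPU in the priority order.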
Usage
This environment is automatically activated whenever PEFT needs to place tensors on a device. It is critical for:
- Model training: Selecting GPU for adapter training
- LoftQ initialization: Uses `"xpu" if is_xpu_available() else "cuda"` for compute device
- RandLoRA initialization: Selects `bfloat16` if BF16 hardware support is available, otherwise `float16`
- LoRA variant dispatch: XPU-specific code paths in `lora/variants.py`
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| NVIDIA GPU | CUDA-capable GPU | Primary and most-tested backend |
| Intel XPU | Intel Arc / Data Center GPU | Not supported on macOS |
| Google TPU | Cloud TPU v2/v3/v4 | Requires `torch_xla` package |
| Huawei NPU | Ascend NPU | Requires `accelerate >= 0.21.0` |
| Cambricon MLU | MLU hardware | Requires `accelerate >= 0.29.0` |
| Apple MPS | Apple Silicon (M1/M2/M3) | Via `torch.backends.mps` |
Dependencies
For CUDA
- `torch` with CUDA support (standard PyTorch build)
For XPU
- `torch` with XPU support
- NOT macOS (explicitly excluded)
For TPU
- `torch_xla`
For MLU
- `accelerate` >= 0.29.0 (for `is_mlu_available`)
For BF16 Detection
- `accelerate` (for `is_bf16_available`)
Credentials
No credentials required for hardware detection.
Quick Install
```bash
# Standard CUDA setup (most common)
pip install torch  # with CUDA support

# For TPU
pip install torch torch_xla

# For Intel XPU
pip install torch  # Intel XPU build
pip install "accelerate>=0.21.0"

# MLU support needs newer accelerate (quote the spec so the shell
# does not treat ">=" as a redirect)
pip install "accelerate>=0.29.0"
```
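After installing, a quick sanity check reports which backend will be selected. This assumes `infer_device` is importable from `peft.utils` in your PEFT version, and degrades gracefully when `peft` or `torch` is not installed yet:

```python
# Post-install sanity check: report the backend PEFT would pick.
try:
    from peft.utils import infer_device  # assumption: exported here in your version
    device = infer_device()
except ImportError:
    device = "unavailable (peft or torch not installed)"

print(device)  # e.g. "cuda", "mps", or "cpu"
```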
Code Evidence
Device inference priority from `src/peft/utils/other.py:116-127`:
```python
def infer_device() -> str:
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    elif is_mlu_available():
        return "mlu"
    elif is_xpu_available():
        return "xpu"
    elif is_npu_available():
        return "npu"
    return "cpu"
```
XPU detection with Darwin exclusion from `src/peft/import_utils.py:132-148`:
```python
@lru_cache
def is_xpu_available(check_device=False):
    system = platform.system()
    if system == "Darwin":
        return False
    else:
        if check_device:
            try:
                _ = torch.xpu.device_count()
                return torch.xpu.is_available()
            except RuntimeError:
                return False
        return hasattr(torch, "xpu") and torch.xpu.is_available()
```
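The `@lru_cache` decorator means each probe runs at most once per argument combination for the life of the process. A minimal stdlib sketch of that behavior (the probe body is a hypothetical stand-in for an expensive driver query):

```python
from functools import lru_cache

probe_runs = 0

@lru_cache
def cached_probe(check_device: bool = False) -> bool:
    # Pretend this is an expensive driver query; lru_cache memoizes
    # the result per argument combination for the process lifetime.
    global probe_runs
    probe_runs += 1
    return False  # sketch: no device present

cached_probe()
cached_probe()                   # cache hit: probe body does not run again
cached_probe(check_device=True)  # different arguments: new cache entry
```

A practical consequence: if drivers become available after the first call (e.g. a device is initialized later in the process), the cached result does not update.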
TPU detection from `src/peft/import_utils.py:71-84`:
```python
@lru_cache
def is_torch_tpu_available(check_device=True):
    if importlib.util.find_spec("torch_xla") is not None:
        if check_device:
            try:
                import torch_xla.core.xla_model as xm

                _ = xm.xla_device()
                return True
            except RuntimeError:
                return False
        return True
    return False
```
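The first gate of the TPU check, `importlib.util.find_spec`, tests whether a package is importable without actually importing it (and without triggering its import-time side effects). The same pattern in isolation, with an illustrative helper name:

```python
import importlib.util

def package_available(name: str) -> bool:
    # Is the top-level package importable at all, without paying
    # the cost (or side effects) of importing it?
    return importlib.util.find_spec(name) is not None
```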
BF16 hardware detection from `src/peft/tuners/randlora/model.py:57`:
```python
dtype = torch.bfloat16 if is_bf16_available() else torch.float16
```
XPU-specific code path from `src/peft/tuners/lora/variants.py:149`:
```python
if is_xpu_available():
    # XPU-specific handling
    ...
```
LoftQ compute device from `src/peft/utils/loftq_utils.py:213`:
```python
compute_device = "xpu" if is_xpu_available() else "cuda"
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError` from `xm.xla_device()` | TPU not available in environment | Ensure running on TPU-enabled instance with `torch_xla` installed |
| `RuntimeError` from `torch.xpu.device_count()` | XPU driver not configured | Install Intel GPU drivers and XPU-enabled PyTorch build |
| Falls back to `"cpu"` | No accelerator detected | Install CUDA drivers or run on GPU-enabled hardware |
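The errors above share one recovery pattern: a `RuntimeError` raised by a backend probe is treated as "accelerator not available" rather than propagated. A minimal sketch (helper names are illustrative, not PEFT APIs):

```python
def probe_or_false(probe) -> bool:
    # A RuntimeError from a backend probe means "not available",
    # so detection can fall through to the next backend.
    try:
        return bool(probe())
    except RuntimeError:
        return False

def broken_backend() -> bool:
    raise RuntimeError("XPU driver not configured")
```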
Compatibility Notes
- macOS: XPU is explicitly excluded on Darwin. Apple Silicon uses MPS backend instead.
- MLU support: Only available with `accelerate >= 0.29.0`. Older accelerate versions silently default to other backends.
- DTensor/Distributed: Requires `torch >= 2.5.0` for distributed tensor support (checked in `tuners_utils.py:61-62`).
- BF16: Some operations default to float16 if BF16 hardware support is not detected. This affects RandLoRA and LoRA-FA initialization.
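The `accelerate` and `torch` version gates above come down to a tuple comparison over the version string. A hypothetical helper (not part of PEFT) sketching the idea:

```python
def version_at_least(installed: str, minimum: tuple) -> bool:
    # Naive "X.Y.Z" comparison; does not handle pre-release tags
    # like "0.29.0rc1" (int() would raise there). Real code should
    # use packaging.version.parse instead.
    parts = tuple(int(p) for p in installed.split(".")[: len(minimum)])
    return parts >= minimum
```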