Environment: Hugging Face PEFT GPU Hardware Detection
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Hardware |
| Last Updated | 2026-02-07 06:44 GMT |
Overview
Hardware acceleration detection layer supporting NVIDIA CUDA, Intel XPU, Google TPU, Huawei NPU, Cambricon MLU, and Apple MPS backends.
Description
PEFT uses a multi-backend device detection system to automatically select the appropriate hardware accelerator. The `infer_device()` function in `src/peft/utils/other.py` probes available backends in a fixed priority order: CUDA > MPS > MLU > XPU > NPU > CPU. Individual backend checks are provided by dedicated functions in `import_utils.py` (TPU, XPU) and `accelerate.utils` (NPU, MLU). XPU detection explicitly excludes macOS (Darwin). TPU detection optionally verifies actual device availability via `torch_xla`.
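The probing chain can be sketched without any hardware libraries. In the sketch below, the checker callables are hypothetical stand-ins for the real availability functions (`torch.cuda.is_available()`, `is_xpu_available()`, etc.); only the fixed priority order is taken from the source.

```python
from typing import Callable, Dict

# Same fixed priority order as PEFT's infer_device(); CPU is the fallback.
PRIORITY = ("cuda", "mps", "mlu", "xpu", "npu")

def infer_device_sketch(checks: Dict[str, Callable[[], bool]]) -> str:
    """Return the first backend whose availability probe passes."""
    for backend in PRIORITY:
        probe = checks.get(backend)
        if probe is not None and probe():
            return backend
    return "cpu"  # nothing detected: fall back to CPU
```

For example, `infer_device_sketch({"mps": lambda: True, "npu": lambda: True})` returns `"mps"`, because MPS outranks NPU in the priority order.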
Usage
This environment is automatically activated whenever PEFT needs to place tensors on a device. It is critical for:
- Model training: Selecting GPU for adapter training
- LoftQ initialization: Uses `"xpu" if is_xpu_available() else "cuda"` for compute device
- RandLoRA initialization: Selects `bfloat16` if BF16 hardware support is available, otherwise `float16`
- LoRA variant dispatch: XPU-specific code paths in `lora/variants.py`
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| NVIDIA GPU | CUDA-capable GPU | Primary and most-tested backend |
| Intel XPU | Intel Arc / Data Center GPU | Not supported on macOS |
| Google TPU | Cloud TPU v2/v3/v4 | Requires `torch_xla` package |
| Huawei NPU | Ascend NPU | Requires `accelerate >= 0.21.0` |
| Cambricon MLU | MLU hardware | Requires `accelerate >= 0.29.0` |
| Apple MPS | Apple Silicon (M1/M2/M3) | Via `torch.backends.mps` |
Dependencies
For CUDA
- `torch` with CUDA support (standard PyTorch build)
For XPU
- `torch` with XPU support
- NOT macOS (explicitly excluded)
For TPU
- `torch_xla`
For MLU
- `accelerate` >= 0.29.0 (for `is_mlu_available`)
For BF16 Detection
- `accelerate` (for `is_bf16_available`)
Credentials
No credentials required for hardware detection.
Quick Install
```bash
# Standard CUDA setup (most common)
pip install torch  # with CUDA support

# For TPU
pip install torch torch_xla

# For Intel XPU
pip install torch  # Intel XPU build
pip install "accelerate>=0.21.0"

# MLU support needs newer accelerate (quote the spec so the shell
# does not treat ">=" as a redirect)
pip install "accelerate>=0.29.0"
```
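After installing, a quick sanity check reports which backend will be selected. This assumes `infer_device` is importable from `peft.utils` in your PEFT version, and degrades gracefully when `peft` or `torch` is not installed yet:

```python
# Post-install sanity check: report the backend PEFT would pick.
try:
    from peft.utils import infer_device  # assumption: exported here in your version
    device = infer_device()
except ImportError:
    device = "unavailable (peft or torch not installed)"

print(device)  # e.g. "cuda", "mps", or "cpu"
```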
Code Evidence
Device inference priority from `src/peft/utils/other.py:116-127`:
```python
def infer_device() -> str:
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    elif is_mlu_available():
        return "mlu"
    elif is_xpu_available():
        return "xpu"
    elif is_npu_available():
        return "npu"
    return "cpu"
```
XPU detection with Darwin exclusion from `src/peft/import_utils.py:132-148`:
```python
@lru_cache
def is_xpu_available(check_device=False):
    system = platform.system()
    if system == "Darwin":
        return False
    else:
        if check_device:
            try:
                _ = torch.xpu.device_count()
                return torch.xpu.is_available()
            except RuntimeError:
                return False
        return hasattr(torch, "xpu") and torch.xpu.is_available()
```
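The `@lru_cache` decorator means each probe runs at most once per argument combination for the life of the process. A minimal stdlib sketch of that behavior (the probe body is a hypothetical stand-in for an expensive driver query):

```python
from functools import lru_cache

probe_runs = 0

@lru_cache
def cached_probe(check_device: bool = False) -> bool:
    # Pretend this is an expensive driver query; lru_cache memoizes
    # the result per argument combination for the process lifetime.
    global probe_runs
    probe_runs += 1
    return False  # sketch: no device present

cached_probe()
cached_probe()                   # cache hit: probe body does not run again
cached_probe(check_device=True)  # different arguments: new cache entry
```

A practical consequence: if drivers become available after the first call (e.g. a device is initialized later in the process), the cached result does not update.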
TPU detection from `src/peft/import_utils.py:71-84`:
```python
@lru_cache
def is_torch_tpu_available(check_device=True):
    if importlib.util.find_spec("torch_xla") is not None:
        if check_device:
            try:
                import torch_xla.core.xla_model as xm

                _ = xm.xla_device()
                return True
            except RuntimeError:
                return False
        return True
    return False
```
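The first gate of the TPU check, `importlib.util.find_spec`, tests whether a package is importable without actually importing it (and without triggering its import-time side effects). The same pattern in isolation, with an illustrative helper name:

```python
import importlib.util

def package_available(name: str) -> bool:
    # Is the top-level package importable at all, without paying
    # the cost (or side effects) of importing it?
    return importlib.util.find_spec(name) is not None
```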
BF16 hardware detection from `src/peft/tuners/randlora/model.py:57`:
```python
dtype = torch.bfloat16 if is_bf16_available() else torch.float16
```
XPU-specific code path from `src/peft/tuners/lora/variants.py:149`:
```python
if is_xpu_available():
    # XPU-specific handling
    ...
```
LoftQ compute device from `src/peft/utils/loftq_utils.py:213`:
```python
compute_device = "xpu" if is_xpu_available() else "cuda"
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError` from `xm.xla_device()` | TPU not available in environment | Ensure running on TPU-enabled instance with `torch_xla` installed |
| `RuntimeError` from `torch.xpu.device_count()` | XPU driver not configured | Install Intel GPU drivers and XPU-enabled PyTorch build |
| Falls back to `"cpu"` | No accelerator detected | Install CUDA drivers or run on GPU-enabled hardware |
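The errors above share one recovery pattern: a `RuntimeError` raised by a backend probe is treated as "accelerator not available" rather than propagated. A minimal sketch (helper names are illustrative, not PEFT APIs):

```python
def probe_or_false(probe) -> bool:
    # A RuntimeError from a backend probe means "not available",
    # so detection can fall through to the next backend.
    try:
        return bool(probe())
    except RuntimeError:
        return False

def broken_backend() -> bool:
    raise RuntimeError("XPU driver not configured")
```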
Compatibility Notes
- macOS: XPU is explicitly excluded on Darwin. Apple Silicon uses MPS backend instead.
- MLU support: Only available with `accelerate >= 0.29.0`. Older accelerate versions silently default to other backends.
- DTensor/Distributed: Requires `torch >= 2.5.0` for distributed tensor support (checked in `tuners_utils.py:61-62`).
- BF16: Some operations default to float16 if BF16 hardware support is not detected. This affects RandLoRA and LoRA-FA initialization.
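The `accelerate` and `torch` version gates above come down to a tuple comparison over the version string. A hypothetical helper (not part of PEFT) sketching the idea:

```python
def version_at_least(installed: str, minimum: tuple) -> bool:
    # Naive "X.Y.Z" comparison; does not handle pre-release tags
    # like "0.29.0rc1" (int() would raise there). Real code should
    # use packaging.version.parse instead.
    parts = tuple(int(p) for p in installed.split(".")[: len(minimum)])
    return parts >= minimum
```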