
Environment:Huggingface Peft GPU Hardware Detection

From Leeroopedia


Domains Infrastructure, Hardware
Last Updated 2026-02-07 06:44 GMT

Overview

Hardware acceleration detection layer supporting NVIDIA CUDA, Intel XPU, Google TPU, Huawei NPU, Cambricon MLU, and Apple MPS backends.

Description

PEFT uses a multi-backend device detection system to automatically select the appropriate hardware accelerator. The `infer_device()` function in `src/peft/utils/other.py` probes available backends in a fixed priority order: CUDA > MPS > MLU > XPU > NPU > CPU. Individual backend checks are provided by dedicated functions in `import_utils.py` (TPU, XPU) and `accelerate.utils` (NPU, MLU). XPU detection explicitly excludes macOS (Darwin). TPU detection optionally verifies actual device availability via `torch_xla`.
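The fixed-priority probe can be sketched as a small resolver, independent of any real hardware. The probe callables below are hypothetical stand-ins for the actual `torch`/`accelerate` availability checks:

```python
# Sketch of PEFT's fixed-priority device resolution; the probes are
# stand-in callables, not the real torch/accelerate checks.
def resolve_device(probes):
    """Return the first backend whose probe reports availability.

    `probes` is an ordered list of (name, callable) pairs mirroring
    PEFT's CUDA > MPS > MLU > XPU > NPU priority; falls back to "cpu".
    """
    for name, is_available in probes:
        if is_available():
            return name
    return "cpu"

# Example: only XPU reports available, so it wins over NPU and CPU.
probes = [
    ("cuda", lambda: False),
    ("mps", lambda: False),
    ("mlu", lambda: False),
    ("xpu", lambda: True),
    ("npu", lambda: False),
]
print(resolve_device(probes))  # -> xpu
print(resolve_device([(name, lambda: False) for name, _ in probes]))  # -> cpu
```

Because the chain short-circuits, a machine with both CUDA and MPS (impossible in practice, but illustrative) would always resolve to CUDA.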

Usage

This environment is automatically activated whenever PEFT needs to place tensors on a device. It is critical for:

  • Model training: Selecting GPU for adapter training
  • LoftQ initialization: Uses `"xpu" if is_xpu_available() else "cuda"` for compute device
  • RandLoRA initialization: Selects `bfloat16` if BF16 hardware support available, otherwise `float16`
  • LoRA variant dispatch: XPU-specific code paths in `lora/variants.py`

System Requirements

Category        Requirement                    Notes
NVIDIA GPU      CUDA-capable GPU               Primary and most-tested backend
Intel XPU       Intel Arc / Data Center GPU    Not supported on macOS
Google TPU      Cloud TPU v2/v3/v4             Requires `torch_xla` package
Huawei NPU      Ascend NPU                     Requires `accelerate >= 0.21.0`
Cambricon MLU   MLU hardware                   Requires `accelerate >= 0.29.0`
Apple MPS       Apple Silicon (M1/M2/M3)       Via `torch.backends.mps`

Dependencies

For CUDA

  • `torch` with CUDA support (standard PyTorch build)

For XPU

  • `torch` with XPU support
  • NOT macOS (explicitly excluded)

For TPU

  • `torch_xla`

For MLU

  • `accelerate` >= 0.29.0 (for `is_mlu_available`)

For BF16 Detection

  • `accelerate` (for `is_bf16_available`)
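The minimum-version requirements above can be verified at runtime with a version-tuple comparison. This helper is a made-up illustration, not a PEFT or accelerate API:

```python
# Hypothetical helper: compare an installed version string against a
# minimum, as PEFT does before enabling MLU or NPU availability checks.
def meets_minimum(installed: str, minimum: str) -> bool:
    # Split "0.29.0" into (0, 29, 0) and compare tuples lexicographically.
    # Real code should prefer packaging.version, which also handles
    # pre-release suffixes like "0.29.0.dev0".
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(meets_minimum("0.29.1", "0.29.0"))  # MLU check available -> True
print(meets_minimum("0.21.0", "0.29.0"))  # too old for MLU -> False
```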

Credentials

No credentials required for hardware detection.

Quick Install

# Standard CUDA setup (most common)
pip install torch  # with CUDA support

# For TPU
pip install torch torch_xla

# For Intel XPU
pip install torch  # Intel XPU build
pip install "accelerate>=0.21.0"

# MLU support needs newer accelerate
pip install "accelerate>=0.29.0"

Code Evidence

Device inference priority from `src/peft/utils/other.py:116-127`:

def infer_device() -> str:
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    elif mlu_available:
        return "mlu"
    elif is_xpu_available():
        return "xpu"
    elif is_npu_available():
        return "npu"
    return "cpu"

XPU detection with Darwin exclusion from `src/peft/import_utils.py:132-148`:

@lru_cache
def is_xpu_available(check_device=False):
    system = platform.system()
    if system == "Darwin":
        return False
    else:
        if check_device:
            try:
                _ = torch.xpu.device_count()
                return torch.xpu.is_available()
            except RuntimeError:
                return False
        return hasattr(torch, "xpu") and torch.xpu.is_available()
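Note that `@lru_cache` memoizes the result per argument value, so the first answer sticks for the life of the process; drivers loaded after the first call will not be seen. A self-contained illustration of that caching behavior, using a stand-in probe rather than the real XPU check:

```python
from functools import lru_cache

probe_calls = 0  # counts how often the expensive probe actually runs

@lru_cache
def is_backend_available(check_device=False):
    # Stand-in for a device probe like is_xpu_available(): lru_cache
    # ensures the body only executes once per distinct argument value.
    global probe_calls
    probe_calls += 1
    return False

is_backend_available()                    # probe runs (call 1)
is_backend_available()                    # served from cache, no probe
is_backend_available(check_device=True)   # new cache key -> probe runs (call 2)
print(probe_calls)  # -> 2
```

This is why flipping environment state (e.g. loading a driver) mid-process does not change the reported availability: a restart, or `is_backend_available.cache_clear()`, is needed.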

TPU detection from `src/peft/import_utils.py:71-84`:

@lru_cache
def is_torch_tpu_available(check_device=True):
    if importlib.util.find_spec("torch_xla") is not None:
        if check_device:
            try:
                import torch_xla.core.xla_model as xm
                _ = xm.xla_device()
                return True
            except RuntimeError:
                return False
        return True
    return False
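Both the XPU and TPU checks above share the same probe-then-fallback idiom: attempt to obtain a device handle and translate a `RuntimeError` into `False`. A generic sketch of that pattern, with hypothetical probe functions:

```python
def safe_probe(probe) -> bool:
    """Run a device probe, treating RuntimeError as 'not available'.

    Mirrors the pattern in is_torch_tpu_available / is_xpu_available,
    where touching a device on a machine without one raises
    RuntimeError rather than returning a status code.
    """
    try:
        probe()
        return True
    except RuntimeError:
        return False

def missing_device():
    # Stand-in for xm.xla_device() / torch.xpu.device_count() on a
    # machine without the corresponding accelerator.
    raise RuntimeError("no accelerator found")

print(safe_probe(lambda: None))    # handle obtained -> True
print(safe_probe(missing_device))  # probe raised -> False
```

Only `RuntimeError` is caught; an `ImportError` (package missing entirely) still propagates, which is why the TPU check tests `find_spec("torch_xla")` first.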

BF16 hardware detection from `src/peft/tuners/randlora/model.py:57`:

dtype = torch.bfloat16 if is_bf16_available() else torch.float16

XPU-specific code path from `src/peft/tuners/lora/variants.py:149`:

if is_xpu_available():
    # XPU-specific handling
    ...

LoftQ compute device from `src/peft/utils/loftq_utils.py:213`:

compute_device = "xpu" if is_xpu_available() else "cuda"

Common Errors

Error Message                                    Cause                            Solution
`RuntimeError` from `xm.xla_device()`            TPU not available in environment Ensure running on a TPU-enabled instance with `torch_xla` installed
`RuntimeError` from `torch.xpu.device_count()`   XPU driver not configured        Install Intel GPU drivers and an XPU-enabled PyTorch build
Falls back to `"cpu"`                            No accelerator detected          Install CUDA drivers or run on GPU-enabled hardware
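Since a silent fall-through to `"cpu"` is the most common surprise, callers can wrap device inference with a warning. A minimal sketch; `infer_device` is stubbed here so the snippet stays self-contained (the real function lives in `peft.utils`):

```python
import warnings

def infer_device() -> str:
    # Stub standing in for peft.utils.infer_device(); on a machine
    # with no accelerator the real function also returns "cpu".
    return "cpu"

def infer_device_or_warn() -> str:
    # Surface the CPU fallback loudly instead of letting training
    # silently run orders of magnitude slower than expected.
    device = infer_device()
    if device == "cpu":
        warnings.warn(
            "No accelerator detected; training will run on CPU. "
            "Check your CUDA/XPU/TPU installation if this is unexpected."
        )
    return device

print(infer_device_or_warn())  # -> cpu (and emits a UserWarning)
```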

Compatibility Notes

  • macOS: XPU is explicitly excluded on Darwin. Apple Silicon uses MPS backend instead.
  • MLU support: Only available with `accelerate >= 0.29.0`. Older accelerate versions silently default to other backends.
  • DTensor/Distributed: Requires `torch >= 2.5.0` for distributed tensor support (checked in `tuners_utils.py:61-62`).
  • BF16: Some operations default to float16 if BF16 hardware support is not detected. This affects RandLoRA and LoRA-FA initialization.
