Environment: deepspeedai/DeepSpeed Multi-Accelerator Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning, Hardware_Abstraction |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Multi-accelerator support environment covering Intel XPU, Huawei NPU, Habana HPU, Cambricon MLU, Apple MPS, Tecorigin SDAA, and CPU backends.
Description
DeepSpeed provides a hardware abstraction layer through its accelerator framework (`accelerator/`), enabling the same training and inference code to run across diverse hardware backends. Each accelerator backend implements the `DeepSpeedAccelerator` abstract interface. The system auto-detects the available accelerator in a priority-ordered cascade, or the user can force a specific backend via the `DS_ACCELERATOR` environment variable. Supported backends: `cuda`, `cpu`, `xpu`, `xpu.external`, `npu`, `mps`, `hpu`, `mlu`, `sdaa`.
Usage
Use this environment when running DeepSpeed on non-NVIDIA hardware. Each backend has specific Python package requirements that must be installed before DeepSpeed can detect and use the accelerator. The detection order is: xpu.external > xpu (IPEX) > xpu (PyTorch native) > npu > sdaa > mps > hpu > mlu > cuda > cpu (fallback).
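The priority cascade above can be sketched in plain Python. This is a simplified illustration, not DeepSpeed's actual code: `available` is a hypothetical probe function standing in for the real per-backend import checks in `accelerator/real_accelerator.py`.

```python
# Simplified sketch of DeepSpeed's priority-ordered accelerator detection.
# `available` is a hypothetical probe standing in for the real import checks.
DETECTION_ORDER = [
    "xpu.external",  # intel_extension_for_deepspeed
    "xpu",           # intel_extension_for_pytorch (IPEX) or native torch.xpu
    "npu",           # torch_npu
    "sdaa",          # torch_sdaa
    "mps",           # torch.mps
    "hpu",           # habana_frameworks.torch.hpu
    "mlu",           # torch_mlu
    "cuda",          # torch.cuda
]

def detect_accelerator(available):
    """Return the first backend whose probe succeeds, else fall back to CPU."""
    for name in DETECTION_ORDER:
        if available(name):
            return name
    return "cpu"  # fallback when nothing is detected
```

Note that earlier entries win even when later ones are also present, which is why an NPU machine with CUDA libraries installed still resolves to `npu`.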
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Intel XPU | `intel_extension_for_pytorch` (IPEX) with XPU support | Or `intel_extension_for_deepspeed` for external XPU path; PyTorch >= 2.8 supports native `torch.xpu` |
| Huawei NPU | `torch_npu` package | Ascend CANN toolkit required; `ASCEND_HOME_PATH` env var |
| Habana HPU | `habana_frameworks.torch.hpu` | Habana SynapseAI software stack required |
| Cambricon MLU | `torch_mlu` package | Cambricon Neuware SDK required |
| Apple MPS | PyTorch with MPS support | macOS with Apple Silicon; limited functionality |
| Tecorigin SDAA | `torch_sdaa` package | Tecorigin SDAA hardware and drivers required |
| CPU | No special hardware | Fallback when no accelerator detected |
Dependencies
Intel XPU
- `intel_extension_for_pytorch` (IPEX) with XPU support, OR
- `intel_extension_for_deepspeed` (external XPU path)
Huawei NPU
- `torch_npu`
- Ascend CANN toolkit (version detected from `ascend_*_install.info`)
Habana HPU
- `habana_frameworks` (SynapseAI)
Cambricon MLU
- `torch_mlu`
Tecorigin SDAA
- `torch_sdaa`
Environment Variables
- `DS_ACCELERATOR`: Override accelerator auto-detection. Values: `cuda`, `cpu`, `xpu`, `xpu.external`, `npu`, `mps`, `hpu`, `mlu`, `sdaa`.
- `ASCEND_HOME_PATH`: Path to Ascend CANN installation (NPU only).
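The override path can be mirrored with a few lines of stdlib Python. This is a sketch for illustration: `read_override` is a hypothetical helper, not part of the DeepSpeed API, though the supported-backend list matches the source.

```python
import os

# Mirrors DeepSpeed's DS_ACCELERATOR validation (simplified sketch;
# read_override is a hypothetical helper, not DeepSpeed API).
SUPPORTED_ACCELERATOR_LIST = ['cuda', 'cpu', 'xpu', 'xpu.external', 'npu',
                              'mps', 'hpu', 'mlu', 'sdaa']

def read_override(environ=None):
    """Return the forced backend name, or None to fall through to auto-detection."""
    environ = os.environ if environ is None else environ
    name = environ.get("DS_ACCELERATOR")
    if name is not None and name not in SUPPORTED_ACCELERATOR_LIST:
        raise ValueError(f"DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}.")
    return name
```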
Quick Install
```shell
# For Intel XPU (via IPEX)
pip install intel_extension_for_pytorch
pip install deepspeed

# For Huawei NPU
pip install torch_npu
pip install deepspeed

# Force a specific accelerator at runtime (overrides auto-detection)
DS_ACCELERATOR=xpu deepspeed your_script.py

# Verify the detected accelerator
python -c "from deepspeed.accelerator import get_accelerator; print(get_accelerator().device_name())"
```
Code Evidence
Supported accelerator list from `accelerator/real_accelerator.py:23`:

```python
SUPPORTED_ACCELERATOR_LIST = ['cuda', 'cpu', 'xpu', 'xpu.external', 'npu', 'mps', 'hpu', 'mlu', 'sdaa']
```
DS_ACCELERATOR override from `accelerator/real_accelerator.py:59-111`:

```python
if "DS_ACCELERATOR" in os.environ.keys():
    accelerator_name = os.environ["DS_ACCELERATOR"]
    if accelerator_name == "xpu":
        import intel_extension_for_pytorch as ipex
        assert ipex._C._has_xpu()
    elif accelerator_name == "npu":
        import torch_npu
    elif accelerator_name not in SUPPORTED_ACCELERATOR_LIST:
        raise ValueError(f'DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}.')
```
Auto-detection cascade from `accelerator/real_accelerator.py:114-213`:

```python
# Detection order:
# 1. intel_extension_for_deepspeed (xpu.external)
# 2. intel_extension_for_pytorch (xpu via IPEX)
# 3. torch.xpu (native PyTorch >= 2.8, when no CUDA devices)
# 4. torch_npu (npu)
# 5. torch_sdaa (sdaa)
# 6. torch.mps (mps)
# 7. habana_frameworks.torch.hpu (hpu)
# 8. torch_mlu (mlu)
# 9. torch.cuda (cuda)
# 10. cpu (fallback)
```
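Each package-backed step in the cascade reduces to "is the backend's Python package importable?". A stdlib sketch of that probe follows; the backend-to-package mapping is drawn from the Dependencies section above, and `probe`/`first_available_backend` are illustrative helpers, not DeepSpeed code.

```python
import importlib.util

# Backend -> Python package whose presence signals that backend
# (per the Dependencies section; illustrative, not DeepSpeed code).
BACKEND_PACKAGES = {
    "xpu.external": "intel_extension_for_deepspeed",
    "xpu": "intel_extension_for_pytorch",
    "npu": "torch_npu",
    "sdaa": "torch_sdaa",
    "hpu": "habana_frameworks",
    "mlu": "torch_mlu",
}

def probe(package_name):
    """True if the package is importable, without actually importing it."""
    return importlib.util.find_spec(package_name) is not None

def first_available_backend():
    """Walk the mapping in priority order; return the first importable backend, if any."""
    for backend, package in BACKEND_PACKAGES.items():
        if probe(package):
            return backend
    return None
```

Using `find_spec` keeps the check side-effect free; the real detector goes further and imports the package to verify devices are actually usable.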
XPU with native PyTorch >= 2.8 from `accelerator/real_accelerator.py:139-153`:

```python
# torch.xpu will be supported in upstream pytorch-2.8.
# Currently we can run on xpu device only using pytorch,
# also reserve the old path using ipex when the torch version is old.
if hasattr(torch, 'xpu'):
    if torch.cuda.device_count() == 0:  #ignore-cuda
        if torch.xpu.device_count() > 0 and torch.xpu.is_available():
            accelerator_name = "xpu"
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `XPU_Accelerator requires intel_extension_for_pytorch` | IPEX not installed | `pip install intel_extension_for_pytorch` |
| `NPU_Accelerator requires torch_npu` | torch_npu not installed | `pip install torch_npu` |
| `HPU_Accelerator requires habana_frameworks.torch.hpu` | SynapseAI not installed | Install Habana SynapseAI software stack |
| `MLU_Accelerator requires torch_mlu` | torch_mlu not installed | `pip install torch_mlu` |
| `SDAA_Accelerator requires torch_sdaa` | torch_sdaa not installed | `pip install torch_sdaa` |
| `MPS_Accelerator requires torch.mps` | MPS not available | Requires macOS with Apple Silicon and compatible PyTorch |
| `Setting accelerator to CPU` (warning) | No accelerator detected | Install appropriate hardware extension package |
Compatibility Notes
- Intel XPU: Three paths exist: external (`intel_extension_for_deepspeed`), IPEX (`intel_extension_for_pytorch`), and native PyTorch >= 2.8. XPU is only auto-detected via native torch when no CUDA devices are present.
- Triton on ROCm: the Triton import is explicitly skipped on AMD ROCm because the `pytorch-triton-rocm` module breaks DeepSpeed's device API.
- Apple MPS: Detection uses `torch.mps.current_allocated_memory()` as a proxy since `torch.mps.is_available()` may not exist.
- CPU Fallback: When no accelerator is detected, DeepSpeed falls back to CPU mode with a warning. This is suitable for testing but not for production training.
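The MPS note above reflects a general pattern in the detector: a probe call whose failure simply means "not available". A minimal stdlib sketch of that defensive style (`safe_probe` is a hypothetical helper, not DeepSpeed code):

```python
def safe_probe(fn):
    """Run a detection probe; treat any exception as 'backend not available'.

    This mirrors the style of calling e.g. torch.mps.current_allocated_memory()
    as an availability proxy when torch.mps.is_available() may not exist.
    """
    try:
        fn()
        return True
    except Exception:
        return False
```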