Environment: deepspeedai/DeepSpeed Multi-Accelerator Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning, Hardware_Abstraction |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Multi-accelerator support environment covering Intel XPU, Huawei NPU, Habana HPU, Cambricon MLU, Apple MPS, Tecorigin SDAA, and CPU backends.
Description
DeepSpeed provides a hardware abstraction layer through its accelerator framework (`accelerator/`), enabling the same training and inference code to run across diverse hardware backends. Each accelerator backend implements the `DeepSpeedAccelerator` abstract interface. The system auto-detects the available accelerator in a priority-ordered cascade, or the user can force a specific backend via the `DS_ACCELERATOR` environment variable. Supported backends: `cuda`, `cpu`, `xpu`, `xpu.external`, `npu`, `mps`, `hpu`, `mlu`, `sdaa`.
Usage
Use this environment when running DeepSpeed on non-NVIDIA hardware. Each backend has specific Python package requirements that must be installed before DeepSpeed can detect and use the accelerator. The detection order is: xpu.external > xpu (IPEX) > xpu (PyTorch native) > npu > sdaa > mps > hpu > mlu > cuda > cpu (fallback).
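The priority cascade above can be sketched in plain Python. This is a simplified illustration, not DeepSpeed's actual code: `available` is a hypothetical probe function standing in for the real per-backend import checks in `accelerator/real_accelerator.py`.

```python
# Simplified sketch of DeepSpeed's priority-ordered accelerator detection.
# `available` is a hypothetical probe standing in for the real import checks.
DETECTION_ORDER = [
    "xpu.external",  # intel_extension_for_deepspeed
    "xpu",           # intel_extension_for_pytorch (IPEX) or native torch.xpu
    "npu",           # torch_npu
    "sdaa",          # torch_sdaa
    "mps",           # torch.mps
    "hpu",           # habana_frameworks.torch.hpu
    "mlu",           # torch_mlu
    "cuda",          # torch.cuda
]

def detect_accelerator(available):
    """Return the first backend whose probe succeeds, else fall back to CPU."""
    for name in DETECTION_ORDER:
        if available(name):
            return name
    return "cpu"  # fallback when nothing is detected
```

Note that earlier entries win even when later ones are also present, which is why an NPU machine with CUDA libraries installed still resolves to `npu`.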
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Intel XPU | `intel_extension_for_pytorch` (IPEX) with XPU support | Or `intel_extension_for_deepspeed` for external XPU path; PyTorch >= 2.8 supports native `torch.xpu` |
| Huawei NPU | `torch_npu` package | Ascend CANN toolkit required; `ASCEND_HOME_PATH` env var |
| Habana HPU | `habana_frameworks.torch.hpu` | Habana SynapseAI software stack required |
| Cambricon MLU | `torch_mlu` package | Cambricon Neuware SDK required |
| Apple MPS | PyTorch with MPS support | macOS with Apple Silicon; limited functionality |
| Tecorigin SDAA | `torch_sdaa` package | Tecorigin SDAA hardware and drivers required |
| CPU | No special hardware | Fallback when no accelerator detected |
Dependencies
Intel XPU
- `intel_extension_for_pytorch` (IPEX) with XPU support, OR
- `intel_extension_for_deepspeed` (external XPU path)
Huawei NPU
- `torch_npu`
- Ascend CANN toolkit (version detected from `ascend_*_install.info`)
Habana HPU
- `habana_frameworks` (SynapseAI)
Cambricon MLU
- `torch_mlu`
Tecorigin SDAA
- `torch_sdaa`
Environment Variables
- `DS_ACCELERATOR`: Override accelerator auto-detection. Values: `cuda`, `cpu`, `xpu`, `xpu.external`, `npu`, `mps`, `hpu`, `mlu`, `sdaa`.
- `ASCEND_HOME_PATH`: Path to Ascend CANN installation (NPU only).
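The override path can be mirrored with a few lines of stdlib Python. This is a sketch for illustration: `read_override` is a hypothetical helper, not part of the DeepSpeed API, though the supported-backend list matches the source.

```python
import os

# Mirrors DeepSpeed's DS_ACCELERATOR validation (simplified sketch;
# read_override is a hypothetical helper, not DeepSpeed API).
SUPPORTED_ACCELERATOR_LIST = ['cuda', 'cpu', 'xpu', 'xpu.external', 'npu',
                              'mps', 'hpu', 'mlu', 'sdaa']

def read_override(environ=None):
    """Return the forced backend name, or None to fall through to auto-detection."""
    environ = os.environ if environ is None else environ
    name = environ.get("DS_ACCELERATOR")
    if name is not None and name not in SUPPORTED_ACCELERATOR_LIST:
        raise ValueError(f"DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}.")
    return name
```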
Quick Install
```shell
# For Intel XPU (via IPEX)
pip install intel_extension_for_pytorch
pip install deepspeed

# For Huawei NPU
pip install torch_npu
pip install deepspeed

# Force a specific accelerator at runtime (overrides auto-detection)
DS_ACCELERATOR=xpu deepspeed your_script.py

# Verify the detected accelerator
python -c "from deepspeed.accelerator import get_accelerator; print(get_accelerator().device_name())"
```
Code Evidence
Supported accelerator list from `accelerator/real_accelerator.py:23`:

```python
SUPPORTED_ACCELERATOR_LIST = ['cuda', 'cpu', 'xpu', 'xpu.external', 'npu', 'mps', 'hpu', 'mlu', 'sdaa']
```
DS_ACCELERATOR override from `accelerator/real_accelerator.py:59-111`:

```python
if "DS_ACCELERATOR" in os.environ.keys():
    accelerator_name = os.environ["DS_ACCELERATOR"]
    if accelerator_name == "xpu":
        import intel_extension_for_pytorch as ipex
        assert ipex._C._has_xpu()
    elif accelerator_name == "npu":
        import torch_npu
    elif accelerator_name not in SUPPORTED_ACCELERATOR_LIST:
        raise ValueError(f'DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}.')
```
Auto-detection cascade from `accelerator/real_accelerator.py:114-213`:

```python
# Detection order:
# 1. intel_extension_for_deepspeed (xpu.external)
# 2. intel_extension_for_pytorch (xpu via IPEX)
# 3. torch.xpu (native PyTorch >= 2.8, when no CUDA devices)
# 4. torch_npu (npu)
# 5. torch_sdaa (sdaa)
# 6. torch.mps (mps)
# 7. habana_frameworks.torch.hpu (hpu)
# 8. torch_mlu (mlu)
# 9. torch.cuda (cuda)
# 10. cpu (fallback)
```
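Each package-backed step in the cascade reduces to "is the backend's Python package importable?". A stdlib sketch of that probe follows; the backend-to-package mapping is drawn from the Dependencies section above, and `probe`/`first_available_backend` are illustrative helpers, not DeepSpeed code.

```python
import importlib.util

# Backend -> Python package whose presence signals that backend
# (per the Dependencies section; illustrative, not DeepSpeed code).
BACKEND_PACKAGES = {
    "xpu.external": "intel_extension_for_deepspeed",
    "xpu": "intel_extension_for_pytorch",
    "npu": "torch_npu",
    "sdaa": "torch_sdaa",
    "hpu": "habana_frameworks",
    "mlu": "torch_mlu",
}

def probe(package_name):
    """True if the package is importable, without actually importing it."""
    return importlib.util.find_spec(package_name) is not None

def first_available_backend():
    """Walk the mapping in priority order; return the first importable backend, if any."""
    for backend, package in BACKEND_PACKAGES.items():
        if probe(package):
            return backend
    return None
```

Using `find_spec` keeps the check side-effect free; the real detector goes further and imports the package to verify devices are actually usable.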
XPU with native PyTorch >= 2.8 from `accelerator/real_accelerator.py:139-153`:

```python
# torch.xpu will be supported in upstream pytorch-2.8.
# Currently we can run on xpu device only using pytorch,
# also reserve the old path using ipex when the torch version is old.
if hasattr(torch, 'xpu'):
    if torch.cuda.device_count() == 0:  #ignore-cuda
        if torch.xpu.device_count() > 0 and torch.xpu.is_available():
            accelerator_name = "xpu"
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `XPU_Accelerator requires intel_extension_for_pytorch` | IPEX not installed | `pip install intel_extension_for_pytorch` |
| `NPU_Accelerator requires torch_npu` | torch_npu not installed | `pip install torch_npu` |
| `HPU_Accelerator requires habana_frameworks.torch.hpu` | SynapseAI not installed | Install Habana SynapseAI software stack |
| `MLU_Accelerator requires torch_mlu` | torch_mlu not installed | `pip install torch_mlu` |
| `SDAA_Accelerator requires torch_sdaa` | torch_sdaa not installed | `pip install torch_sdaa` |
| `MPS_Accelerator requires torch.mps` | MPS not available | Requires macOS with Apple Silicon and compatible PyTorch |
| `Setting accelerator to CPU` (warning) | No accelerator detected | Install appropriate hardware extension package |
Compatibility Notes
- Intel XPU: Three paths exist: external (`intel_extension_for_deepspeed`), IPEX (`intel_extension_for_pytorch`), and native PyTorch >= 2.8. XPU is only auto-detected via native torch when no CUDA devices are present.
- Triton on ROCm: the Triton import is explicitly skipped on AMD ROCm because the `pytorch-triton-rocm` module breaks DeepSpeed's device API.
- Apple MPS: Detection uses `torch.mps.current_allocated_memory()` as a proxy since `torch.mps.is_available()` may not exist.
- CPU Fallback: When no accelerator is detected, DeepSpeed falls back to CPU mode with a warning. This is suitable for testing but not for production training.
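The MPS note above reflects a general pattern in the detector: a probe call whose failure simply means "not available". A minimal stdlib sketch of that defensive style (`safe_probe` is a hypothetical helper, not DeepSpeed code):

```python
def safe_probe(fn):
    """Run a detection probe; treat any exception as 'backend not available'.

    This mirrors the style of calling e.g. torch.mps.current_allocated_memory()
    as an availability proxy when torch.mps.is_available() may not exist.
    """
    try:
        fn()
        return True
    except Exception:
        return False
```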