Environment: SGLang Multi-Platform Accelerators
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Multi_Platform |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
SGLang supports multiple hardware accelerator platforms beyond NVIDIA CUDA: AMD ROCm (HIP), Intel XPU, Huawei Ascend NPU, Moore Threads MUSA, Habana HPU, and Intel CPU (AMX). Each platform requires its own driver stack, PyTorch variant, and platform-specific kernel library.
Description
SGLang abstracts hardware differences through a unified device detection layer in `python/sglang/srt/utils/common.py`. At startup, the runtime probes for available accelerators using platform-specific APIs (`torch.cuda`, `torch.xpu`, `torch.npu`, `torch.hpu`). Each platform has its own build configuration (`pyproject_*.toml`), attention backends, and kernel implementations. AMD ROCm uses the HIP interface through PyTorch's CUDA compatibility layer. Intel XPU requires PVC/LNL/BMG GPUs with XMX (matrix extension) support. Ascend NPU requires the CANN toolkit and `torch_npu`. Moore Threads MUSA requires `torchada`. CPU inference requires either an x86 host with Intel AMX tile support or an ARM64 host.
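The detection pattern can be sketched as a priority-ordered probe over the accelerator namespaces. In this sketch the torch module is passed as a parameter purely so it can be exercised with a stub and without accelerator hardware; the function name and exact priority order are illustrative, not SGLang's actual API:

```python
from types import SimpleNamespace


def detect_platform(torch_mod) -> str:
    """Return the first accelerator namespace that reports a device.

    `torch_mod` is a parameter only so this sketch is testable with a
    stub; SGLang's real helpers probe the global `torch` module.
    """
    for name in ("cuda", "xpu", "npu", "hpu", "musa"):
        ns = getattr(torch_mod, name, None)
        if ns is not None and ns.is_available():
            return name
    return "cpu"  # no accelerator namespace found


# Stub standing in for a CUDA (or ROCm, via the HIP compat layer) build:
fake_torch = SimpleNamespace(cuda=SimpleNamespace(is_available=lambda: True))
print(detect_platform(fake_torch))         # cuda
print(detect_platform(SimpleNamespace()))  # cpu
```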
Usage
Use the appropriate platform environment when deploying SGLang on non-NVIDIA hardware. Each platform has restricted feature availability compared to CUDA; consult platform documentation for supported models and attention backends.
System Requirements
| Platform | Hardware | Driver/Toolkit | PyTorch Package | Detection Tool |
|---|---|---|---|---|
| AMD ROCm | MI250X/MI300X | ROCm 6.0+ | torch (ROCm build) | `rocm-smi` |
| Intel XPU | PVC/LNL/BMG | oneAPI | torch (XPU build) | Intel GPU tools |
| Ascend NPU | Atlas 800/910 | CANN 8.0+ | `torch_npu` | `npu-smi` |
| Moore Threads | MTT S4000 | MUSA SDK | `torchada` | `mthreads-gmi` |
| Habana HPU | Gaudi2/3 | Habana SynapseAI | `habana_frameworks` | `hl-smi` |
| Intel CPU | Xeon (AMX) | — | torch (CPU) | `SGLANG_USE_CPU_ENGINE=1` |
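The Detection Tool column can be checked from Python with a quick PATH probe. This is purely illustrative for troubleshooting; SGLang itself does not shell out to these binaries for platform detection:

```python
import shutil

# Vendor CLIs from the table above, mapped to their platforms.
TOOLS = {
    "rocm-smi": "AMD ROCm",
    "npu-smi": "Ascend NPU",
    "mthreads-gmi": "Moore Threads",
    "hl-smi": "Habana HPU",
}

# Keep only the tools actually present on PATH.
found = {tool: plat for tool, plat in TOOLS.items() if shutil.which(tool)}
print(found or "no vendor CLI on PATH")
```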
Dependencies
AMD ROCm
- `torch` (ROCm build from pytorch.org)
- `sgl-kernel` (ROCm build)
- `aiter` (AMD-specific kernel library, optional)
- ROCm driver and `rocm-smi`
Ascend NPU
- `torch_npu` (Ascend PyTorch adapter)
- `sgl-kernel-npu`
- CANN toolkit (`ASCEND_TOOLKIT_HOME` or `ASCEND_INSTALL_PATH`)
- `torchair` (for torch.compile on NPU)
Intel XPU
- `torch` (XPU build via Intel Extension for PyTorch)
- `sgl-kernel` (XPU build)
- F64 support required (PVC/LNL/BMG only)
Intel CPU
- `sgl-kernel` with CPU backend (`convert_weight_packed` op)
- Intel AMX tile support (`torch._C._cpu._is_amx_tile_supported()`)
- Set `SGLANG_USE_CPU_ENGINE=1`
Moore Threads MUSA
- `torchada` package
- MUSA SDK and `mthreads-gmi`
Habana HPU
- `habana_frameworks.torch.hpu`
- `hl-smi`
Credentials
- `ASCEND_TOOLKIT_HOME` or `ASCEND_INSTALL_PATH`: Path to CANN toolkit (NPU only)
- `ASCEND_NPU_PHY_ID`: Physical NPU device ID (default: -1, auto-detect)
- `SGLANG_USE_CPU_ENGINE`: Set to `1` to enable CPU backend
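These variables resolve with plain `os.getenv`; the helper names below are illustrative, not SGLang functions:

```python
import os


def resolve_cann_toolkit():
    # Either variable is accepted; the first one set wins.
    return os.getenv("ASCEND_TOOLKIT_HOME") or os.getenv("ASCEND_INSTALL_PATH")


def npu_phy_id() -> int:
    # -1 means auto-detect the physical NPU device.
    return int(os.getenv("ASCEND_NPU_PHY_ID", "-1"))


def cpu_engine_enabled() -> bool:
    return os.getenv("SGLANG_USE_CPU_ENGINE", "0") == "1"


os.environ["SGLANG_USE_CPU_ENGINE"] = "1"
print(cpu_engine_enabled())  # True
```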
Quick Install
```shell
# AMD ROCm
pip install sglang --find-links https://flashinfer.ai/whl/rocm/

# Ascend NPU
pip install sglang-npu

# Intel XPU
pip install sglang-xpu

# CPU only
pip install sglang-cpu
SGLANG_USE_CPU_ENGINE=1 python -m sglang.launch_server --model meta-llama/Meta-Llama-3-8B
```
Code Evidence
Platform detection from `python/sglang/srt/utils/common.py:108-195`:
```python
@lru_cache(maxsize=1)
def is_hip() -> bool:
    return torch.version.hip is not None


@lru_cache(maxsize=1)
def is_hpu() -> bool:
    return hasattr(torch, "hpu") and torch.hpu.is_available()


@lru_cache(maxsize=1)
def is_xpu() -> bool:
    return hasattr(torch, "xpu") and torch.xpu.is_available()


@lru_cache(maxsize=1)
def is_npu() -> bool:
    if not hasattr(torch, "npu"):
        return False
    if not torch.npu.is_available():
        raise RuntimeError("torch_npu detected, but NPU device is not available or visible.")
    return True


@lru_cache(maxsize=1)
def is_cpu() -> bool:
    is_host_cpu_supported = is_host_cpu_x86() or is_host_cpu_arm64()
    return os.getenv("SGLANG_USE_CPU_ENGINE", "0") == "1" and is_host_cpu_supported


@lru_cache(maxsize=1)
def is_musa() -> bool:
    try:
        import torchada
    except ImportError:
        return False
    return hasattr(torch.version, "musa") and torch.version.musa is not None
```
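Note that each detector is wrapped in `@lru_cache(maxsize=1)`, so the first answer is cached for the life of the process: flipping an environment variable afterwards has no effect unless the cache is cleared. A stdlib-only sketch of that caching behavior (the function name is illustrative, simplified from `is_cpu()` above):

```python
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def cpu_engine_flag() -> bool:
    # Env var is read once on first call, then the result is cached.
    return os.getenv("SGLANG_USE_CPU_ENGINE", "0") == "1"


os.environ.pop("SGLANG_USE_CPU_ENGINE", None)
first = cpu_engine_flag()                  # False
os.environ["SGLANG_USE_CPU_ENGINE"] = "1"
stale = cpu_engine_flag()                  # still False: cached
cpu_engine_flag.cache_clear()
fresh = cpu_engine_flag()                  # True after clearing the cache
print(first, stale, fresh)
```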
HIP-specific FP8 handling from `python/sglang/srt/utils/common.py:113-117`:
```python
if is_hip():
    HIP_FP8_E4M3_FNUZ_MAX = 224.0
    FP8_E4M3_MAX = HIP_FP8_E4M3_FNUZ_MAX
else:
    FP8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max
```
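Because the FNUZ variant tops out at 224.0 while CUDA's `e4m3fn` reaches 448.0, a per-tensor scale computed for one format is off by a factor of two on the other. A torch-free sketch of per-tensor scale selection under the two maxima (the helper name is illustrative):

```python
CUDA_FP8_E4M3FN_MAX = 448.0       # torch.finfo(torch.float8_e4m3fn).max
HIP_FP8_E4M3_FNUZ_MAX = 224.0     # the constant quoted above


def fp8_scale(amax: float, fp8_max: float) -> float:
    """Scale so that amax maps onto the largest representable FP8 value."""
    return amax / fp8_max


amax = 448.0
print(fp8_scale(amax, CUDA_FP8_E4M3FN_MAX))    # 1.0 on CUDA
print(fp8_scale(amax, HIP_FP8_E4M3_FNUZ_MAX))  # 2.0 on ROCm
```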
NPU memory query from `python/sglang/srt/utils/common.py:1782-1788`:
```python
def get_npu_memory_capacity():
    try:
        import torch_npu

        return torch.npu.mem_get_info()[1] // 1024 // 1024
    except ImportError as e:
        raise ImportError("torch_npu is required when run on npu device.")
```
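The import-or-explain pattern above generalises to any platform adapter. A stdlib sketch (`require_module` is not an SGLang function):

```python
import importlib


def require_module(name: str, hint: str):
    """Import `name`, or fail with a platform-specific hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(hint) from exc


# Succeeds: math is always available.
math = require_module("math", "math is in the stdlib; this cannot fail")
print(math.sqrt(9.0))  # 3.0
```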
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `torch_npu detected, but NPU device is not available or visible` | NPU driver issue | Check CANN toolkit installation and `npu-smi` |
| `torch_npu is required when run on npu device` | torch_npu not installed | `pip install torch_npu` |
| `rocm-smi not found` | ROCm drivers missing | Install ROCm 6.0+ drivers |
| `mthreads-gmi not found` | Moore Threads drivers missing | Install MUSA SDK |
| `hl-smi not found` | Habana drivers missing | Install SynapseAI runtime |
| `aiter is AMD specific kernel library` | aiter not installed on AMD | `pip install aiter` on AMD ROCm system |
| `NPU detected, but torchair package is not installed` | torchair missing for torch.compile | `pip install torchair` |
| `No accelerator (CUDA, XPU, HPU, NPU, MUSA) is available` | No supported hardware detected | Install appropriate driver and PyTorch variant |
Compatibility Notes
- AMD ROCm: Uses HIP through PyTorch's CUDA compatibility layer. FP8 uses `E4M3_FNUZ` format (max 224.0) instead of CUDA's `E4M3FN`. Attention backends: `triton`, `aiter`, `wave`.
- Intel XPU: Requires PVC/LNL/BMG GPUs with F64 support for XMX acceleration. Attention backend: `intel_xpu`.
- Ascend NPU: Requires CANN toolkit. Supports env vars for multi-stream (`SGLANG_NPU_USE_MULTI_STREAM`) and MLAPo (`SGLANG_NPU_USE_MLAPO`). Attention backend: `ascend`.
- Intel CPU: Requires Intel AMX tile support. Dimension constraints: output channels % 16 == 0, input channels % 32 == 0. Backend: `intel_amx`.
- Moore Threads MUSA: Early support. Requires `torchada` package.
- Habana HPU: Requires `habana_frameworks.torch.hpu` import.
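The `intel_amx` dimension constraints in the Intel CPU note above can be validated up front before loading a model; the helper name is illustrative:

```python
def cpu_backend_shapes_ok(out_channels: int, in_channels: int) -> bool:
    # Mirrors the intel_amx constraints: output channels must be a
    # multiple of 16, input channels a multiple of 32.
    return out_channels % 16 == 0 and in_channels % 32 == 0


print(cpu_backend_shapes_ok(4096, 4096))  # True
print(cpu_backend_shapes_ok(1000, 4096))  # False: 1000 % 16 != 0
```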