Environment:Sgl project Sglang Multi Platform Accelerators

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Multi_Platform
Last Updated: 2026-02-10 00:00 GMT

Overview

SGLang supports multiple hardware accelerator platforms beyond NVIDIA CUDA: AMD ROCm (HIP), Intel XPU, Huawei Ascend NPU, Moore Threads MUSA, Habana HPU, and Intel CPU (AMX). Each platform requires its own driver stack, PyTorch variant, and platform-specific kernel library.

Description

SGLang abstracts hardware differences through a unified device detection layer in `python/sglang/srt/utils/common.py`. At startup, the runtime probes for available accelerators using platform-specific APIs (`torch.cuda`, `torch.xpu`, `torch.npu`, `torch.hpu`). Each platform has its own build configuration (`pyproject_*.toml`), attention backends, and kernel implementations. AMD ROCm uses the HIP interface through PyTorch's CUDA compatibility layer. Intel XPU requires PVC/LNL/BMG GPUs with XMX (matrix extension) support. Ascend NPU requires the CANN toolkit and `torch_npu`. Moore Threads MUSA requires `torchada`. CPU inference runs on x86 hosts with Intel AMX tile support or on ARM64 hosts.
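The probe-in-order pattern described above can be sketched as follows. This is a minimal, torch-free illustration: the probe callables are stand-ins for the real `torch.cuda`/`torch.xpu`/`torch.npu`/`torch.hpu` availability checks, and `detect_platform` is a hypothetical name, not an SGLang function.

```python
from typing import Callable, Dict

def detect_platform(probes: Dict[str, Callable[[], bool]]) -> str:
    """Return the first platform whose probe reports an available device.

    Stand-in for SGLang's per-platform is_*() checks; each probe here is a
    placeholder for a platform-specific torch availability call.
    """
    for name, probe in probes.items():
        if probe():
            return name
    return "cpu"  # fall back to the CPU engine when no accelerator is found

# Simulate a host where only an XPU is visible.
probes = {
    "cuda": lambda: False,
    "hip": lambda: False,
    "xpu": lambda: True,
    "npu": lambda: False,
    "musa": lambda: False,
}
print(detect_platform(probes))  # xpu
```

If every probe reports unavailable, the sketch falls through to `"cpu"`, mirroring the fact that the CPU engine is an explicit opt-in fallback rather than an accelerator.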

Usage

Use the appropriate platform environment when deploying SGLang on non-NVIDIA hardware. Each platform has restricted feature availability compared to CUDA; consult platform documentation for supported models and attention backends.

System Requirements

Platform      | Hardware      | Driver/Toolkit   | PyTorch Package     | Detection Tool
AMD ROCm      | MI250X/MI300X | ROCm 6.0+        | torch (ROCm build)  | `rocm-smi`
Intel XPU     | PVC/LNL/BMG   | oneAPI           | torch (XPU build)   | Intel GPU tools
Ascend NPU    | Atlas 800/910 | CANN 8.0+        | `torch_npu`         | `npu-smi`
Moore Threads | MTT S4000     | MUSA SDK         | `torchada`          | `mthreads-gmi`
Habana HPU    | Gaudi2/3      | Habana SynapseAI | `habana_frameworks` | `hl-smi`
Intel CPU     | Xeon (AMX)    | (none)           | torch (CPU)         | `SGLANG_USE_CPU_ENGINE=1`

Dependencies

AMD ROCm

  • `torch` (ROCm build from pytorch.org)
  • `sgl-kernel` (ROCm build)
  • `aiter` (AMD-specific kernel library, optional)
  • ROCm driver and `rocm-smi`

Ascend NPU

  • `torch_npu` (Ascend PyTorch adapter)
  • `sgl-kernel-npu`
  • CANN toolkit (`ASCEND_TOOLKIT_HOME` or `ASCEND_INSTALL_PATH`)
  • `torchair` (for torch.compile on NPU)
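A simple pre-flight check before an NPU launch is to confirm the CANN toolkit path is exported. The install path below is hypothetical; it varies by system and CANN version.

```shell
# Illustrative only: export the CANN toolkit location, then fail fast if unset.
export ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
echo "${ASCEND_TOOLKIT_HOME:?CANN toolkit path must be set}"
```

The `:?` parameter expansion aborts the script with an error message when the variable is empty, which surfaces a misconfigured toolkit before the server starts.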

Intel XPU

  • `torch` (XPU build via Intel Extension for PyTorch)
  • `sgl-kernel` (XPU build)
  • F64 support required (PVC/LNL/BMG only)

Intel CPU

  • `sgl-kernel` with CPU backend (`convert_weight_packed` op)
  • Intel AMX tile support (`torch._C._cpu._is_amx_tile_supported()`)
  • Set `SGLANG_USE_CPU_ENGINE=1`
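The CPU gate combines two conditions: the opt-in environment variable and hardware AMX support. A minimal sketch of that gate, using the `torch._C._cpu._is_amx_tile_supported()` probe named above (wrapped so the check degrades to `False` when torch is not installed; the helper names are illustrative):

```python
import os

def amx_supported() -> bool:
    """Best-effort AMX tile probe; returns False when torch is unavailable."""
    try:
        import torch
        return bool(torch._C._cpu._is_amx_tile_supported())
    except Exception:
        return False

def cpu_engine_enabled() -> bool:
    # Mirrors the is_cpu() gate: opt-in env var AND hardware support.
    return os.environ.get("SGLANG_USE_CPU_ENGINE", "0") == "1" and amx_supported()
```

Note the short-circuit: if `SGLANG_USE_CPU_ENGINE` is not set to `1`, the hardware probe is never evaluated.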

Moore Threads MUSA

  • `torchada` package
  • MUSA SDK and `mthreads-gmi`

Habana HPU

  • `habana_frameworks.torch.hpu`
  • `hl-smi`

Credentials

  • `ASCEND_TOOLKIT_HOME` or `ASCEND_INSTALL_PATH`: Path to CANN toolkit (NPU only)
  • `ASCEND_NPU_PHY_ID`: Physical NPU device ID (default: -1, auto-detect)
  • `SGLANG_USE_CPU_ENGINE`: Set to `1` to enable CPU backend
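For `ASCEND_NPU_PHY_ID`, the documented default of -1 means auto-detect. A hypothetical helper showing how that default plays out:

```python
import os

def resolve_npu_phy_id() -> int:
    """Illustrative helper: -1 (the documented default) means auto-detect."""
    return int(os.environ.get("ASCEND_NPU_PHY_ID", "-1"))

print(resolve_npu_phy_id())  # -1 when the variable is unset (auto-detect)
os.environ["ASCEND_NPU_PHY_ID"] = "2"
print(resolve_npu_phy_id())  # 2: pin the runtime to physical NPU 2
```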

Quick Install

# AMD ROCm
pip install sglang --find-links https://flashinfer.ai/whl/rocm/

# Ascend NPU
pip install sglang-npu

# Intel XPU
pip install sglang-xpu

# CPU only
pip install sglang-cpu
SGLANG_USE_CPU_ENGINE=1 python -m sglang.launch_server --model meta-llama/Meta-Llama-3-8B

Code Evidence

Platform detection from `python/sglang/srt/utils/common.py:108-195`:

import os
from functools import lru_cache

import torch

@lru_cache(maxsize=1)
def is_hip() -> bool:
    return torch.version.hip is not None

@lru_cache(maxsize=1)
def is_hpu() -> bool:
    return hasattr(torch, "hpu") and torch.hpu.is_available()

@lru_cache(maxsize=1)
def is_xpu() -> bool:
    return hasattr(torch, "xpu") and torch.xpu.is_available()

@lru_cache(maxsize=1)
def is_npu() -> bool:
    if not hasattr(torch, "npu"):
        return False
    if not torch.npu.is_available():
        raise RuntimeError("torch_npu detected, but NPU device is not available or visible.")
    return True

@lru_cache(maxsize=1)
def is_cpu() -> bool:
    is_host_cpu_supported = is_host_cpu_x86() or is_host_cpu_arm64()
    return os.getenv("SGLANG_USE_CPU_ENGINE", "0") == "1" and is_host_cpu_supported

@lru_cache(maxsize=1)
def is_musa() -> bool:
    try:
        import torchada  # noqa: F401 -- availability probe only
    except ImportError:
        return False
    return hasattr(torch.version, "musa") and torch.version.musa is not None
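The `is_musa()` check above uses a try/import probe for an optional vendor package. The same pattern can be written without triggering the import at all, via `importlib.util.find_spec` (a general sketch, not SGLang code):

```python
import importlib.util

def package_available(name: str) -> bool:
    # Generalizes the is_musa() probe: detect an optional vendor adapter
    # package without actually importing it.
    return importlib.util.find_spec(name) is not None

print(package_available("json"))      # True: stdlib module is always present
print(package_available("torchada"))  # False unless the MUSA adapter is installed
```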

HIP-specific FP8 handling from `python/sglang/srt/utils/common.py:113-117`:

if is_hip():
    HIP_FP8_E4M3_FNUZ_MAX = 224.0
    FP8_E4M3_MAX = HIP_FP8_E4M3_FNUZ_MAX
else:
    FP8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max
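The platform-dependent maximum matters whenever a per-tensor quantization scale is derived from it. A torch-free sketch (the 448.0 constant is `torch.finfo(torch.float8_e4m3fn).max` on CUDA; `fp8_scale` is an illustrative helper, not an SGLang function):

```python
HIP_FP8_E4M3_FNUZ_MAX = 224.0  # ROCm E4M3_FNUZ format
CUDA_FP8_E4M3_MAX = 448.0      # torch.finfo(torch.float8_e4m3fn).max

def fp8_scale(amax: float, on_hip: bool) -> float:
    """Per-tensor scale so that the largest magnitude maps to the FP8 max."""
    fp8_max = HIP_FP8_E4M3_FNUZ_MAX if on_hip else CUDA_FP8_E4M3_MAX
    return fp8_max / amax

print(fp8_scale(2.0, on_hip=True))   # 112.0
print(fp8_scale(2.0, on_hip=False))  # 224.0
```

Using the wrong maximum on ROCm would overflow the FNUZ representable range, which is why the constant is swapped at import time.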

NPU memory query from `python/sglang/srt/utils/common.py:1782-1788`:

def get_npu_memory_capacity():
    try:
        import torch_npu  # noqa: F401 -- registers the torch.npu backend

        return torch.npu.mem_get_info()[1] // 1024 // 1024  # total device memory in MiB
    except ImportError as e:
        raise ImportError("torch_npu is required when run on npu device.") from e
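The integer arithmetic in the return statement converts the byte count from `mem_get_info()` into MiB. Isolated for clarity (helper name is illustrative):

```python
def bytes_to_mib(total_bytes: int) -> int:
    # Same integer division chain as get_npu_memory_capacity above.
    return total_bytes // 1024 // 1024

print(bytes_to_mib(64 * 1024**3))  # 65536, i.e. a 64 GiB device
```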

Common Errors

Error Message | Cause | Solution
`torch_npu detected, but NPU device is not available or visible` | NPU driver issue | Check CANN toolkit installation and `npu-smi`
`torch_npu is required when run on npu device` | torch_npu not installed | `pip install torch_npu`
`rocm-smi not found` | ROCm drivers missing | Install ROCm 6.0+ drivers
`mthreads-gmi not found` | Moore Threads drivers missing | Install MUSA SDK
`hl-smi not found` | Habana drivers missing | Install SynapseAI runtime
`aiter is AMD specific kernel library` | aiter not installed on AMD | `pip install aiter` on an AMD ROCm system
`NPU detected, but torchair package is not installed` | torchair missing for torch.compile | `pip install torchair`
`No accelerator (CUDA, XPU, HPU, NPU, MUSA) is available` | No supported hardware detected | Install the appropriate driver and PyTorch variant

Compatibility Notes

  • AMD ROCm: Uses HIP through PyTorch's CUDA compatibility layer. FP8 uses `E4M3_FNUZ` format (max 224.0) instead of CUDA's `E4M3FN`. Attention backends: `triton`, `aiter`, `wave`.
  • Intel XPU: Requires PVC/LNL/BMG GPUs with F64 support for XMX acceleration. Attention backend: `intel_xpu`.
  • Ascend NPU: Requires CANN toolkit. Supports env vars for multi-stream (`SGLANG_NPU_USE_MULTI_STREAM`) and MLAPo (`SGLANG_NPU_USE_MLAPO`). Attention backend: `ascend`.
  • Intel CPU: Requires Intel AMX tile support. Dimension constraints: output channels % 16 == 0, input channels % 32 == 0. Backend: `intel_amx`.
  • Moore Threads MUSA: Early support. Requires `torchada` package.
  • Habana HPU: Requires `habana_frameworks.torch.hpu` import.
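The `intel_amx` dimension constraints in the Intel CPU note above can be expressed as a simple predicate (the helper name is hypothetical; the modulus values are the ones stated in the note):

```python
def amx_weight_shape_ok(out_channels: int, in_channels: int) -> bool:
    # intel_amx backend constraint: output channels % 16 == 0,
    # input channels % 32 == 0.
    return out_channels % 16 == 0 and in_channels % 32 == 0

print(amx_weight_shape_ok(4096, 4096))  # True
print(amx_weight_shape_ok(1000, 4096))  # False: 1000 is not a multiple of 16
```

Weights that fail this check would need padding to the next aligned size before the CPU backend can pack them.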
