Environment:Sgl project Sglang CPU Runtime
Sgl_project_Sglang_CPU_Runtime is the CPU inference runtime environment for SGLang. It uses Intel AMX and AVX-512 SIMD acceleration and provides optimized kernels for attention, GEMM, MoE, and other operations on x86 processors.
Requirements
- Linux x86_64 with Intel AMX tile support (4th Gen Intel Xeon Scalable, "Sapphire Rapids", or newer)
- AVX-512 instruction set support
- Python 3.10+
- PyTorch (CPU build)
- `sgl-kernel` with CPU backend
- `sglang-cpu` package
- C++ compiler with AVX-512 and AMX intrinsics support
- Environment variable `SGLANG_USE_CPU_ENGINE=1`
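The hardware and environment items above can be checked before installing anything. The following is a minimal sketch, not part of SGLang: the helper names `parse_cpu_flags` and `missing_requirements` are hypothetical, and it assumes the Linux kernel reports the relevant features in `/proc/cpuinfo` under the flag names `avx512f` and `amx_tile`. It does not verify the PyTorch, `sgl-kernel`, `sglang-cpu`, or compiler requirements.

```python
import os
import platform
import sys

# Assumed /proc/cpuinfo flag names for AVX-512 Foundation and AMX tile support.
REQUIRED_FLAGS = ("avx512f", "amx_tile")


def parse_cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_requirements(flags, env, python_version, machine, system):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if system != "Linux" or machine != "x86_64":
        problems.append(f"expected Linux x86_64, found {system} {machine}")
    for flag in REQUIRED_FLAGS:
        if flag not in flags:
            problems.append(f"CPU flag missing: {flag}")
    if tuple(python_version[:2]) < (3, 10):
        problems.append("Python 3.10+ required")
    if env.get("SGLANG_USE_CPU_ENGINE") != "1":
        problems.append("SGLANG_USE_CPU_ENGINE is not set to 1")
    return problems


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        cpuinfo = ""  # not Linux, or /proc is unavailable
    problems = missing_requirements(
        parse_cpu_flags(cpuinfo),
        os.environ,
        sys.version_info,
        platform.machine(),
        platform.system(),
    )
    for problem in problems:
        print(problem)
    if not problems:
        print("basic CPU runtime checks passed")
```

Keeping the checks in a pure function that takes the flags and environment as arguments makes it possible to exercise them on a machine that lacks AMX.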
Required By
- Implementation:Sgl_project_Sglang_CPU_BMM
- Implementation:Sgl_project_Sglang_CPU_Common_Header
- Implementation:Sgl_project_Sglang_CPU_Decode_Attention
- Implementation:Sgl_project_Sglang_CPU_Extend_Attention
- Implementation:Sgl_project_Sglang_CPU_Flash_Attention
- Implementation:Sgl_project_Sglang_CPU_GEMM
- Implementation:Sgl_project_Sglang_CPU_GEMM_FP8
- Implementation:Sgl_project_Sglang_CPU_GEMM_INT4
- Implementation:Sgl_project_Sglang_CPU_GEMM_INT8
- Implementation:Sgl_project_Sglang_CPU_Mamba_Conv1D
- Implementation:Sgl_project_Sglang_CPU_Mamba_FLA
- Implementation:Sgl_project_Sglang_CPU_MoE
- Implementation:Sgl_project_Sglang_CPU_MoE_FP8
- Implementation:Sgl_project_Sglang_CPU_MoE_INT4
- Implementation:Sgl_project_Sglang_CPU_MoE_INT8
- Implementation:Sgl_project_Sglang_CPU_Normalization
- Implementation:Sgl_project_Sglang_CPU_QKV_Projection
- Implementation:Sgl_project_Sglang_CPU_RoPE
- Implementation:Sgl_project_Sglang_CPU_Shared_Memory
- Implementation:Sgl_project_Sglang_CPU_TopK
- Implementation:Sgl_project_Sglang_CPU_Torch_Extension
- Implementation:Sgl_project_Sglang_CPU_Vec_SIMD