Environment:Vllm project Vllm AArch64 CPU
| Knowledge Sources | |
|---|---|
| Domains | CPU_Inference, ARM_Architecture |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
ARM AArch64 CPU architecture environment for vLLM, providing the ARM-specific compiler toolchain, NEON/SVE SIMD intrinsics, and runtime support required for CPU-based LLM inference on ARM server processors such as AWS Graviton, Ampere Altra, and NVIDIA Grace.
Description
This environment defines the AArch64 (64-bit ARM) hardware and software stack required by vLLM's ARM-optimized CPU kernels. ARM-based server processors have become increasingly prevalent in cloud and edge deployments due to their power efficiency. vLLM's CPU backend includes AArch64-specific SIMD implementations using NEON (128-bit SIMD, mandatory on AArch64) and SVE/SVE2 (Scalable Vector Extension, optional, up to 2048-bit vectors) intrinsics for key operations including quantized matrix multiplication, activation functions, and type conversions. The dynamic 4-bit integer MoE (Mixture-of-Experts) kernel for CPU is specifically optimized for AArch64, using NEON dot-product instructions for int4-to-int8 dequantization and accumulation within MoE expert routing. The AArch64 CPU type definitions (CPU_Types_ARM) provide architecture-specific vector types and intrinsic wrappers that abstract over NEON and SVE instruction sets.
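The int4-to-int8 dequantization and dot-product accumulation described above can be illustrated with a scalar reference. This is a minimal sketch and not vLLM's kernel: the function names (`unpack_int4`, `sdot_lane`, `int4_dot`), the packing layout (two signed 4-bit values per byte, low nibble first), and the single per-row `scale` are assumptions chosen for illustration. `sdot_lane` mimics what one 32-bit lane of an SDOT instruction computes: four int8 products added to an int32 accumulator.

```python
def unpack_int4(packed):
    """Split each byte into two signed 4-bit values, low nibble first.

    The layout is an assumption for this sketch; real kernels may pack differently.
    """
    out = []
    for b in packed:
        for nib in (b & 0x0F, b >> 4):
            # Interpret the nibble as two's complement: 8..15 map to -8..-1.
            out.append(nib - 16 if nib >= 8 else nib)
    return out


def sdot_lane(acc, a4, b4):
    """Emulate one 32-bit SDOT lane: acc += sum of four int8 * int8 products."""
    assert len(a4) == len(b4) == 4
    return acc + sum(x * y for x, y in zip(a4, b4))


def int4_dot(packed_weights, activations, scale):
    """Dot product of packed int4 weights with int8 activations, then rescale."""
    weights = unpack_int4(packed_weights)
    assert len(weights) == len(activations) and len(weights) % 4 == 0
    acc = 0
    for i in range(0, len(weights), 4):
        acc = sdot_lane(acc, weights[i:i + 4], activations[i:i + 4])
    return acc * scale
```

A vectorized kernel would process many such lanes per instruction; the scalar form only shows the arithmetic each lane performs.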
Usage
To build vLLM for AArch64, either build natively on an AArch64 system or cross-compile from an x86_64 host using an AArch64 cross-compiler toolchain. Set VLLM_TARGET_DEVICE=cpu during installation. The build system auto-detects AArch64 via CMake's CMAKE_SYSTEM_PROCESSOR and enables the ARM-specific SIMD code paths. At runtime, set OMP_NUM_THREADS to the number of physical cores; on heterogeneous big.LITTLE systems, threads may need to be pinned to the performance cores. SVE support is detected at compile time via compiler feature tests and at runtime via the Features line of /proc/cpuinfo or the HWCAP flags.
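The runtime checks mentioned above can be sketched in a few lines. This is an illustrative helper, not part of vLLM: the names `parse_cpu_features` and `read_cpuinfo` are hypothetical, and the sketch assumes the Linux AArch64 convention that `/proc/cpuinfo` lists `asimd` (NEON), `asimddp` (dot product), `sve`, and `sve2` on its `Features` line. The thread-count suggestion assumes one hardware thread per physical core, which should be verified on the target machine.

```python
import os

# Feature names as they appear on the "Features" line of /proc/cpuinfo
# on AArch64 Linux, keyed by the labels used on this page.
FLAGS = {"neon": "asimd", "dotprod": "asimddp", "sve": "sve", "sve2": "sve2"}


def parse_cpu_features(cpuinfo_text):
    """Return {label: bool} for each SIMD feature found in cpuinfo text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            present.update(value.split())
    return {label: flag in present for label, flag in FLAGS.items()}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read cpuinfo, returning an empty string where the file is unavailable."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(parse_cpu_features(read_cpuinfo()))
    # CPUs this process may run on; equals physical cores only without SMT.
    if hasattr(os, "sched_getaffinity"):
        cpus = len(os.sched_getaffinity(0))
    else:
        cpus = os.cpu_count()
    print(f"suggested OMP_NUM_THREADS={cpus}")
```

On a non-ARM host the `Features` line is absent, so every entry is reported as `False`; the parser can still be exercised by passing it captured cpuinfo text from the target system.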
Requirements
| Requirement | Value |
|---|---|
| CPU Architecture | AArch64 (64-bit ARM, ARMv8-A or later) |
| ISA Extensions | NEON (mandatory), SVE/SVE2 (optional, for enhanced vectorization) |
| Dot Product Extension | ARMv8.2-A dot product instructions (SDOT/UDOT; optional in ARMv8.2-A, mandatory from ARMv8.4-A), recommended for int4/int8 kernels |
| Compiler | GCC >= 10 with AArch64 target, or Clang >= 12 with AArch64 target |
| C++ Standard | C++17 |
| OpenMP | OpenMP 4.5+ for thread-level parallelism |
| Operating System | Linux (Ubuntu 22.04+ aarch64, Amazon Linux 2023 aarch64) |
| Example Hardware | AWS Graviton3/4, Ampere Altra/AmpereOne, NVIDIA Grace CPU |
| CMake | >= 3.26.1 |
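The version minimums in the table can be checked before starting a build. The following is a minimal preflight sketch, assuming plain dotted numeric version strings; the helper names (`parse_version`, `meets_minimum`, `is_aarch64`) are hypothetical and are not provided by vLLM or its build system.

```python
import platform

# Minimum versions taken from the requirements table above.
MINIMUMS = {"gcc": (10,), "clang": (12,), "cmake": (3, 26, 1)}


def parse_version(text):
    """Turn a dotted version string such as '3.26.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(tool, version_text):
    """Compare a reported version against the table's minimum for that tool."""
    return parse_version(version_text) >= MINIMUMS[tool]


def is_aarch64(machine=None):
    """True when the machine string names 64-bit ARM ('aarch64' or 'arm64')."""
    return (machine or platform.machine()).lower() in ("aarch64", "arm64")
```

For example, `meets_minimum("cmake", "3.26.1")` is true while `meets_minimum("gcc", "9.4.0")` is false. Version strings with suffixes (such as release-candidate tags) would need extra parsing that this sketch omits.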