

Environment: vLLM ROCm (vLLM project)

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, AMD_ROCm
Last Updated 2026-02-08 00:00 GMT

Overview

The AMD ROCm (Radeon Open Compute) runtime environment for running vLLM inference on AMD Instinct GPUs. It provides the HIP compiler, the ROCm runtime libraries, and AMD-optimized kernel implementations for high-throughput LLM serving.

Description

This environment defines the AMD ROCm software stack required to build and run vLLM on AMD GPUs. ROCm is AMD's open-source GPU computing platform, analogous to NVIDIA's CUDA. vLLM supports ROCm as a first-class backend, using HIP (Heterogeneous-Compute Interface for Portability) to compile GPU kernels that target AMD's CDNA architecture (MI200 and MI300 series).

The ROCm backend includes AMD-specific optimizations: AITER (AMD's AI Tensor Engine for ROCm) operators for attention and MLP computation, custom paged-attention kernels tuned for the MI300X's HBM3 memory bandwidth, FP8 padding for improved memory alignment, and custom allreduce kernels using the Quick Reduce algorithm for efficient multi-GPU communication. RCCL (the ROCm Communication Collectives Library) provides NCCL-compatible distributed communication primitives.
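The AMD-specific optimizations above are switched on and off through environment variables. The sketch below illustrates the on/off convention vLLM uses for its VLLM_ROCM_* toggles ("1" enables, "0" disables); `rocm_flag` is an illustrative helper, not part of the vLLM API, and the defaults shown are assumptions for the example.

```python
import os

def rocm_flag(name: str, default: bool) -> bool:
    """Read a boolean VLLM_ROCM_* toggle from the environment.

    Follows the "1"/"0" convention used by vLLM's ROCm toggles.
    This helper is illustrative, not part of the vLLM API.
    """
    return os.environ.get(name, "1" if default else "0") == "1"

# Query the AMD-specific toggles described above
# (default values here are assumptions for the example).
use_aiter = rocm_flag("VLLM_ROCM_USE_AITER", default=False)
fp8_padding = rocm_flag("VLLM_ROCM_FP8_PADDING", default=True)
custom_paged_attn = rocm_flag("VLLM_ROCM_CUSTOM_PAGED_ATTN", default=True)
```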

Usage

To use vLLM with ROCm, set VLLM_TARGET_DEVICE=rocm during installation. The ROCm-specific environment variables (VLLM_ROCM_USE_AITER, VLLM_ROCM_FP8_PADDING, VLLM_ROCM_CUSTOM_PAGED_ATTN) control AMD-specific kernel optimizations at runtime. Multi-GPU inference uses RCCL for collective operations. Docker images based on rocm/pytorch provide a preconfigured ROCm environment for deployment.
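A minimal command-line sketch of the workflow above, assuming a ROCm-enabled host (for example a container based on rocm/pytorch); the model name is a placeholder, and the toggle values are illustrative rather than recommended settings:

```shell
# Build vLLM from a source checkout against the installed ROCm toolchain.
VLLM_TARGET_DEVICE=rocm pip install --no-build-isolation -e .

# Serve a model with the AMD-specific kernel toggles described above.
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_FP8_PADDING=1 \
VLLM_ROCM_CUSTOM_PAGED_ATTN=1 \
vllm serve <model-name> --tensor-parallel-size 8
```

With --tensor-parallel-size greater than 1, the collective operations between GPUs are carried out by RCCL, as noted above.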

Requirements

ROCm Version: 6.x+ (6.2+ recommended)
HIP Compiler: hipcc (included with ROCm)
GPU Hardware: AMD Instinct MI200 series (MI250X) or MI300 series (MI300X, MI300A)
GPU Architecture: CDNA2 (gfx90a) or CDNA3 (gfx942)
GPU Memory: 128 GB HBM2e (MI250X) or 192 GB HBM3 (MI300X)
RCCL: ROCm Communication Collectives Library (for multi-GPU)
Operating System: Linux (Ubuntu 22.04+ or RHEL 9+)
Python: >= 3.10
PyTorch: ROCm-compatible build of PyTorch
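A small sketch for checking that a GPU matches the CDNA targets listed above; `SUPPORTED_GFX` and `is_supported_arch` are example names, not vLLM API. In practice the architecture string would come from rocminfo or, on ROCm builds of PyTorch, from the device properties.

```python
# CDNA2 (MI200 series) and CDNA3 (MI300 series) targets from the table above.
SUPPORTED_GFX = {"gfx90a", "gfx942"}

def is_supported_arch(gcn_arch: str) -> bool:
    """Return True if the GPU architecture is one of the listed targets.

    ROCm may report arch strings with feature suffixes, e.g.
    "gfx942:sramecc+:xnack-", so compare only the base token.
    """
    return gcn_arch.split(":")[0] in SUPPORTED_GFX
```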
