

Environment: vLLM ROCm (vLLM project)

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, AMD_ROCm
Last Updated 2026-02-08 00:00 GMT

Overview

The AMD ROCm (Radeon Open Compute) runtime environment for running vLLM inference on AMD Instinct GPUs. It provides the HIP compiler, the ROCm runtime libraries, and AMD-optimized kernel implementations for high-throughput LLM serving.

Description

This environment defines the AMD ROCm software stack required to build and run vLLM on AMD GPUs. ROCm is AMD's open-source GPU computing platform, analogous to NVIDIA's CUDA. vLLM supports ROCm as a first-class backend, using HIP (Heterogeneous-Compute Interface for Portability) to compile GPU kernels that target AMD's CDNA architecture (MI200 and MI300 series).

The ROCm backend includes AMD-specific optimizations: AITER (AMD's AI Tensor Engine for ROCm) operators for attention and MLP computation, custom paged-attention kernels tuned for the MI300X's HBM3 memory bandwidth, FP8 padding for improved memory alignment, and custom allreduce kernels using the Quick Reduce algorithm for efficient multi-GPU communication. RCCL (the ROCm Communication Collectives Library) provides NCCL-compatible distributed communication primitives.
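The AMD-specific optimizations above are switched on and off through environment variables. The sketch below illustrates the on/off convention vLLM uses for its VLLM_ROCM_* toggles ("1" enables, "0" disables); `rocm_flag` is an illustrative helper, not part of the vLLM API, and the defaults shown are assumptions for the example.

```python
import os

def rocm_flag(name: str, default: bool) -> bool:
    """Read a boolean VLLM_ROCM_* toggle from the environment.

    Follows the "1"/"0" convention used by vLLM's ROCm toggles.
    This helper is illustrative, not part of the vLLM API.
    """
    return os.environ.get(name, "1" if default else "0") == "1"

# Query the AMD-specific toggles described above
# (default values here are assumptions for the example).
use_aiter = rocm_flag("VLLM_ROCM_USE_AITER", default=False)
fp8_padding = rocm_flag("VLLM_ROCM_FP8_PADDING", default=True)
custom_paged_attn = rocm_flag("VLLM_ROCM_CUSTOM_PAGED_ATTN", default=True)
```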

Usage

To use vLLM with ROCm, set VLLM_TARGET_DEVICE=rocm during installation. The ROCm-specific environment variables (VLLM_ROCM_USE_AITER, VLLM_ROCM_FP8_PADDING, VLLM_ROCM_CUSTOM_PAGED_ATTN) control AMD-specific kernel optimizations at runtime. Multi-GPU inference uses RCCL for collective operations. Docker images based on rocm/pytorch provide a preconfigured ROCm environment for deployment.
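A minimal command-line sketch of the workflow above, assuming a ROCm-enabled host (for example a container based on rocm/pytorch); the model name is a placeholder, and the toggle values are illustrative rather than recommended settings:

```shell
# Build vLLM from a source checkout against the installed ROCm toolchain.
VLLM_TARGET_DEVICE=rocm pip install --no-build-isolation -e .

# Serve a model with the AMD-specific kernel toggles described above.
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_FP8_PADDING=1 \
VLLM_ROCM_CUSTOM_PAGED_ATTN=1 \
vllm serve <model-name> --tensor-parallel-size 8
```

With --tensor-parallel-size greater than 1, the collective operations between GPUs are carried out by RCCL, as noted above.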

Requirements

ROCm Version: 6.x+ (6.2+ recommended)
HIP Compiler: hipcc (included with ROCm)
GPU Hardware: AMD Instinct MI200 series (MI250X) or MI300 series (MI300X, MI300A)
GPU Architecture: CDNA2 (gfx90a) or CDNA3 (gfx942)
GPU Memory: 128 GB HBM2e (MI250X) or 192 GB HBM3 (MI300X)
RCCL: ROCm Communication Collectives Library (for multi-GPU)
Operating System: Linux (Ubuntu 22.04+ or RHEL 9+)
Python: >= 3.10
PyTorch: ROCm-compatible build of PyTorch
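A small sketch for checking that a GPU matches the CDNA targets listed above; `SUPPORTED_GFX` and `is_supported_arch` are example names, not vLLM API. In practice the architecture string would come from rocminfo or, on ROCm builds of PyTorch, from the device properties.

```python
# CDNA2 (MI200 series) and CDNA3 (MI300 series) targets from the table above.
SUPPORTED_GFX = {"gfx90a", "gfx942"}

def is_supported_arch(gcn_arch: str) -> bool:
    """Return True if the GPU architecture is one of the listed targets.

    ROCm may report arch strings with feature suffixes, e.g.
    "gfx942:sramecc+:xnack-", so compare only the base token.
    """
    return gcn_arch.split(":")[0] in SUPPORTED_GFX
```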
