Environment: vLLM CUDA Programming Model
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, CUDA |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The NVIDIA CUDA programming-model environment for vLLM, providing the compiler toolchain, the PTX ISA, and the device-level programming abstractions required by vLLM's custom CUDA kernels, including the Marlin quantization kernels and their matrix multiply-accumulate (MMA) operations.
Description
This environment defines the CUDA programming model and compilation infrastructure that vLLM uses to build and execute custom GPU kernels. Unlike the higher-level CUDA runtime environment, it focuses on the low-level CUDA programming abstractions: the nvcc compiler, the PTX (Parallel Thread Execution) intermediate representation, warp-level matrix operations (WMMA/MMA), shared memory management, and register allocation. vLLM's Marlin kernels use inline PTX assembly for warp-level matrix multiply-accumulate operations to achieve near-peak throughput for weight-only quantized inference (INT4/INT8 weights with FP16 accumulation). The kernels are written directly against the GPU's warp execution model, shared memory bank layout, and register file budget, using PTX intrinsics for maximum hardware utilization.
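As a concrete illustration of the inline-PTX style described above, the sketch below wraps a single warp-level `mma.sync` tensor-core instruction in a device function. This is a hedged, minimal example in the general shape of Marlin-style code, not vLLM's actual implementation; the function name and fragment variable names are illustrative.

```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Sketch: one warp-level 16x8x16 FP16 multiply-accumulate (D = A*B + C)
// issued via inline PTX. Requires SM 8.0+ (m16n8k16 with f16 operands).
// Per-thread fragment sizes follow the PTX ISA register layout:
//   A: 4 x 32-bit regs (8 halves), B: 2 regs (4 halves), C/D: 2 regs (4 halves).
__device__ inline void mma_m16n8k16_fp16(const uint32_t* a,  // A fragment
                                         const uint32_t* b,  // B fragment
                                         uint32_t* c)        // C/D accumulator
{
#if __CUDA_ARCH__ >= 800
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
      : "+r"(c[0]), "+r"(c[1])                               // D reuses C regs
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]));
#endif
}
```

In Marlin-style kernels such a primitive sits in an inner loop, fed by fragments that were dequantized from packed INT4/INT8 weights in registers, so the tensor cores stay busy while dequantization overlaps with the MMA pipeline.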
Usage
This environment is required at build time when compiling vLLM from source. The nvcc compiler translates CUDA C++ source files (and inline PTX assembly) into device-specific binary code (SASS) or portable PTX. The CUDA_HOME or CUDA_PATH environment variable should point to the CUDA toolkit installation directory. Target GPU architectures are specified via TORCH_CUDA_ARCH_LIST or vLLM's CUDA_SUPPORTED_ARCHS in CMakeLists.txt.
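A minimal source-build setup following the variables above might look like this; the toolkit path and architecture list are assumptions to adjust for your installation and target GPUs.

```shell
# Point the build at the CUDA toolkit (path is illustrative).
export CUDA_HOME=/usr/local/cuda-12.4
export PATH="$CUDA_HOME/bin:$PATH"

# Restrict compilation to specific SM targets to shorten build time,
# e.g. Ampere (8.0) and Hopper (9.0):
export TORCH_CUDA_ARCH_LIST="8.0;9.0"

# Build and install vLLM from a source checkout.
pip install -e . --no-build-isolation
```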
Requirements
| Requirement | Value |
|---|---|
| CUDA Toolkit | 12.x (12.4+ recommended) |
| nvcc Compiler | Included with CUDA toolkit |
| PTX ISA | Version 7.0+ (for SM 7.0+ targets) |
| Host Compiler | GCC >= 9 or Clang (compatible with nvcc) |
| CMake | >= 3.26.1 |
| GPU Architectures | SM 7.0 through SM 10.0 (Volta through Blackwell) |
| CUDA_HOME | Environment variable pointing to CUDA toolkit root |
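To show how the architecture range in the table maps to device code, here is a hedged nvcc invocation sketch (file names are illustrative; the flags are standard nvcc options). It emits native SASS for SM 8.0 and SM 9.0, plus embedded PTX for SM 9.0 so newer GPUs can JIT-compile the kernel at load time.

```shell
nvcc -c kernel.cu -o kernel.o \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_90,code=sm_90 \
  -gencode arch=compute_90,code=compute_90
```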