
Environment: vLLM project / vLLM CUDA

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, CUDA
Last Updated 2026-02-08 00:00 GMT

Overview

The NVIDIA CUDA programming model environment for vLLM. It provides the compiler toolchain, the PTX ISA, and the device-level programming abstractions required by vLLM's custom CUDA kernels, including the Marlin quantization kernels and their matrix multiply-accumulate (MMA) operations.

Description

This environment defines the CUDA programming model and compilation infrastructure that vLLM uses to build and execute custom GPU kernels. Unlike the higher-level CUDA runtime environment, it focuses on the low-level CUDA programming abstractions: the nvcc compiler, the PTX (Parallel Thread Execution) intermediate representation, warp-level matrix operations (WMMA/MMA), shared memory management, and register allocation. vLLM's Marlin kernels use inline PTX assembly for warp-level matrix multiply-accumulate operations to reach near-peak throughput for weight-only quantized inference (INT4/INT8 weights with FP16 accumulation). The kernels exploit the GPU's warp schedulers and explicitly manage shared memory bank layout and register usage through PTX intrinsics for maximum hardware utilization.
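As a minimal sketch of the inline-PTX style described above, the following device function issues a warp-level `mma.sync` instruction with FP16 inputs and FP16 accumulation. The function name, fragment layout, and tile shape (m16n8k16, which requires SM 8.0+ and PTX ISA 7.0+) are illustrative; the actual vLLM Marlin kernels fuse INT4/INT8 dequantization with these MMA fragments and are considerably more involved.

```cuda
#include <cstdint>

// Sketch: warp-synchronous 16x8x16 MMA via inline PTX, FP16 in / FP16 out.
// Fragment register counts follow the PTX ISA: A = 4 x .b32 (8 halves),
// B = 2 x .b32 (4 halves), C/D accumulator = 2 x .b32 (4 halves).
__device__ void mma_m16n8k16_fp16(const uint32_t a[4],  // A fragment
                                  const uint32_t b[2],  // B fragment
                                  uint32_t d[2]) {      // C/D accumulator
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
      : "+r"(d[0]), "+r"(d[1])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]));
}
```

Every thread in the warp must execute the instruction with its own slice of the fragments; the mapping of matrix elements to thread registers is fixed by the PTX ISA, which is why production kernels pair this with `ldmatrix` loads from shared memory.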

Usage

This environment is required at build time when compiling vLLM from source. The nvcc compiler translates CUDA C++ source files (and inline PTX assembly) into device-specific binary code (SASS) or portable PTX. The CUDA_HOME or CUDA_PATH environment variable should point to the CUDA toolkit installation directory. Target GPU architectures are specified via TORCH_CUDA_ARCH_LIST or vLLM's CUDA_SUPPORTED_ARCHS in CMakeLists.txt.
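A typical source build following the variables above might look like the sketch below. The toolkit path and architecture list are examples, not vLLM defaults; restrict `TORCH_CUDA_ARCH_LIST` to the architectures you actually deploy on to reduce compile time.

```shell
# Point the build at the CUDA toolkit (path is illustrative).
export CUDA_HOME=/usr/local/cuda-12.4
export PATH="$CUDA_HOME/bin:$PATH"

# Compile only for the target architectures (semicolon- or space-separated).
export TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0"

# Build vLLM from a source checkout.
pip install -e . --no-build-isolation
```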

Requirements

Requirement       | Value
CUDA Toolkit      | 12.x (12.4+ recommended)
nvcc Compiler     | Included with the CUDA toolkit
PTX ISA           | Version 7.0+ (for SM 7.0+ targets)
Host Compiler     | GCC >= 9 or Clang (compatible with nvcc)
CMake             | >= 3.26.1
GPU Architectures | SM 7.0 through SM 10.0 (Volta through Blackwell)
CUDA_HOME         | Environment variable pointing to the CUDA toolkit root
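The requirements in the table can be checked quickly from a shell before building; exact output varies by installation.

```shell
# Sanity checks against the requirements table; run before building.
nvcc --version    # CUDA toolkit release (expect 12.x)
cmake --version   # expect >= 3.26.1
gcc --version     # host compiler, expect GCC >= 9 (or a compatible Clang)
echo "${CUDA_HOME:-CUDA_HOME is not set}"
```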
