

Implementation:Vllm project Vllm CUMem Allocator

From Leeroopedia


Knowledge Sources
Domains GPU Memory Management, CUDA Virtual Memory
Last Updated 2026-02-08 00:00 GMT

Overview

Implements a custom PyTorch CUDAPluggableAllocator using CUDA virtual memory management APIs (cuMemCreate, cuMemMap) for fine-grained GPU memory control.

Description

This file provides a Python-accessible C extension module that bypasses PyTorch's default caching allocator, giving vLLM direct control over GPU memory allocation. It uses the CUDA Driver API (cuMemCreate, cuMemMap, cuMemSetAccess) to create pinned physical memory and map it into a reserved virtual address range, with optional GPUDirect RDMA and NVLink fabric handle support. On ROCm, allocations are split into chunks of a configurable size (256MB by default, overridable via the VLLM_ROCM_SLEEP_MEM_CHUNK_SIZE environment variable), with multi-chunk allocation and cleanup. The module exposes python_create_and_map and python_unmap_and_release as Python-callable functions and accepts optional Python callbacks (g_python_malloc_callback, g_python_free_callback) for allocation tracking.
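The chunking described for ROCm can be pictured with the sketch below. It is a minimal illustration and not the module's code: parse_chunk_bytes, chunk_size_from_env and chunk_count are hypothetical helper names, and reading the variable as an integer number of megabytes is an assumption that should be checked against the source.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define BYTES_PER_MB (1024ULL * 1024ULL)
/* Default chunk size stated in the description: 256MB. */
#define DEFAULT_CHUNK_BYTES (256ULL * BYTES_PER_MB)

/* Hypothetical helper: turn the raw environment value into a byte count.
 * ASSUMPTION: the value is an integer number of megabytes.
 * Unset, empty, non-numeric, zero or overflowing values fall back to the
 * default. */
static unsigned long long parse_chunk_bytes(const char *value) {
  if (value == NULL || *value == '\0') return DEFAULT_CHUNK_BYTES;
  errno = 0;
  char *end = NULL;
  unsigned long long mb = strtoull(value, &end, 10);
  if (errno != 0 || end == value || *end != '\0') return DEFAULT_CHUNK_BYTES;
  if (mb == 0 || mb > ULLONG_MAX / BYTES_PER_MB) return DEFAULT_CHUNK_BYTES;
  return mb * BYTES_PER_MB;
}

/* Hypothetical helper: read the override named in the description. */
static unsigned long long chunk_size_from_env(void) {
  return parse_chunk_bytes(getenv("VLLM_ROCM_SLEEP_MEM_CHUNK_SIZE"));
}

/* Number of chunks needed to cover `size` bytes; the last chunk may be
 * only partially used. */
static unsigned long long chunk_count(unsigned long long size,
                                      unsigned long long chunk_bytes) {
  assert(chunk_bytes > 0);
  return size / chunk_bytes + (size % chunk_bytes != 0 ? 1 : 0);
}
```

Under these assumptions, chunk_count(size, chunk_size_from_env()) gives the number of separate handles a multi-chunk allocation of size bytes would need, and the same count drives cleanup.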

Usage

This file is compiled as a standalone Python C extension module. The vLLM memory management layer loads it when it needs finer control over GPU memory than PyTorch's default allocator offers, for example to reduce fragmentation or to support sleep mode, in which GPU memory is released while a large model is idle.

Code Reference

Source Location

Signature

// Helper functions
void ensure_context(unsigned long long device);

void create_and_map(unsigned long long device, ssize_t size,
                    CUdeviceptr d_mem,
                    CUmemGenericAllocationHandle* p_memHandle);

void unmap_and_release(unsigned long long device, ssize_t size,
                       CUdeviceptr d_mem,
                       CUmemGenericAllocationHandle* p_memHandle);

// Python-exposed functions
static PyObject* py_init_module(PyObject* self, PyObject* args);
static PyObject* python_create_and_map(PyObject* self, PyObject* args);
static PyObject* python_unmap_and_release(PyObject* self, PyObject* args);

// Utility
PyObject* create_tuple_from_c_integers(unsigned long long a,
    unsigned long long b, unsigned long long c, unsigned long long d);

Import

#include "cumem_allocator_compat.h"
#include <Python.h>
#include <iostream>

I/O Contract

Inputs

Name | Type | Required | Description
device | unsigned long long | Yes | CUDA device ordinal to allocate on
size | ssize_t | Yes | Allocation size in bytes (must be a multiple of the allocation granularity)
d_mem | CUdeviceptr | Yes | Reserved virtual address at which to map the allocation
p_memHandle | CUmemGenericAllocationHandle* | Yes | Pointer to the allocation handle: written by create_and_map, consumed by unmap_and_release
g_python_malloc_callback | PyObject* | No | Optional Python callback invoked on allocation
g_python_free_callback | PyObject* | No | Optional Python callback invoked on free
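Because size must be a multiple of the allocation granularity, callers typically round a requested size up before calling create_and_map. The sketch below shows only the arithmetic. align_up is a hypothetical helper name, and the 2 MiB granularity used in the note afterwards is an example value: real code would obtain the granularity from the driver (cuMemGetAllocationGranularity in the CUDA Driver API).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of
 * `granularity`. Sizes that are already aligned are returned unchanged.
 * Overflow near SIZE_MAX is not handled in this sketch. */
static size_t align_up(size_t size, size_t granularity) {
  assert(granularity > 0);
  size_t remainder = size % granularity;
  return remainder == 0 ? size : size + (granularity - remainder);
}
```

With an example granularity of 2 MiB (2097152 bytes), a request for 3000000 bytes would be rounded up to 4194304 bytes, while a request for exactly 2097152 bytes stays as it is.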

Outputs

Name | Type | Description
(return) | PyObject* | Python tuple of (device, size, d_mem, memHandle) for create_and_map
error_msg | char[10240] | Human-readable error message buffer on CUDA failure
error_code | CUresult | CUDA error code (0 for success)
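The error_msg and error_code outputs suggest a "record the failure in globals" style of error reporting. The sketch below illustrates that pattern with stand-ins so that it compiles without the CUDA headers: fake_result_t, record_error, CHECK, always_ok and always_fails are all hypothetical names, and the real module would use CUresult and a driver call such as cuGetErrorString to build the message. How the module resets or propagates these values is not covered by this page and is not modelled here.

```c
#include <assert.h>
#include <stdio.h>

typedef int fake_result_t; /* stand-in for CUresult; 0 means success */

static char error_msg[10240];        /* same size as the buffer listed above */
static fake_result_t error_code = 0; /* last recorded failure, 0 if none */

/* Hypothetical helper: store the failing code and a readable message. */
static void record_error(fake_result_t code, const char *what,
                         const char *file, int line) {
  assert(what != NULL && file != NULL);
  error_code = code;
  snprintf(error_msg, sizeof(error_msg), "Error %d in %s at %s:%d",
           code, what, file, line);
}

/* Evaluate a call once; on a non-zero result, record where it failed. */
#define CHECK(call)                                          \
  do {                                                       \
    fake_result_t r_ = (call);                               \
    if (r_ != 0) record_error(r_, #call, __FILE__, __LINE__); \
  } while (0)

/* Stand-ins for driver calls, used only to exercise the macro. */
static fake_result_t always_ok(void) { return 0; }
static fake_result_t always_fails(void) { return 2; }
```

In this sketch, CHECK(always_fails()) leaves error_code at 2 and fills error_msg, and a later successful call does not clear them, so a caller would inspect and reset the globals itself.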

Usage Examples

// From Python via the C extension module:
// 1. Initialize the module with optional malloc/free callbacks
//    py_init_module(malloc_callback, free_callback)
//
// 2. Create and map GPU memory
//    device, size, d_mem, handle = python_create_and_map(device, size, d_mem, handle)
//
// 3. Release GPU memory
//    python_unmap_and_release(device, size, d_mem, handle)

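// C-level usage (assumes the driver API has been initialized with cuInit(0)):
unsigned long long device = 0;
ssize_t size = 1024 * 1024 * 256; // 256MB; must be a multiple of the allocation granularity
CUdeviceptr d_mem = 0;
// Reserve a virtual address range; create_and_map backs it with physical memory.
CUresult res = cuMemAddressReserve(&d_mem, size, 0, 0, 0);
if (res != CUDA_SUCCESS) { /* handle the failure before mapping */ }
CUmemGenericAllocationHandle memHandle;
// On CUDA failure, error_code and error_msg (see Outputs) describe what went wrong.
create_and_map(device, size, d_mem, &memHandle);
// ... use memory ...
unmap_and_release(device, size, d_mem, &memHandle);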
