Implementation:LMCache LMCache Mem Alloc

Knowledge Sources	LMCache
Domains	Memory Management, CUDA, NUMA
Last Updated	2026-02-09 00:00 GMT

Overview

Provides low-level C++ functions for allocating and freeing CUDA-pinned and NUMA-aware host memory.

Description

This module implements the host-memory allocation routines that LMCache uses for high-performance data transfers between host and GPU. It supports three allocation strategies: CUDA pinned memory via cudaHostAlloc; NUMA-bound memory via mmap followed by mbind; and a combined pinned-NUMA allocation that maps memory on a specific NUMA node and then registers it with CUDA for DMA access. Each allocation function returns the buffer address as a uintptr_t. Each has a matching free function that performs the corresponding cleanup, including CUDA unregistration and munmap where applicable. The NUMA free functions also take the allocation size, because munmap requires the length of the mapping.
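The NUMA-bound strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in csrc/mem_alloc.cpp: the names NumaRegion, round_up_to_page, numa_map_sketch and numa_unmap_sketch are hypothetical, mbind is invoked through the raw Linux system call so the sketch does not depend on libnuma headers, and a refused mbind (for example in a sandboxed container) is reported through a flag rather than treated as an error. The CUDA registration step of the pinned-NUMA variant is left out so the sketch builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical result type for this sketch; the documented functions
// return only a uintptr_t.
struct NumaRegion {
  uintptr_t ptr;  // address of the anonymous mapping
  size_t size;    // length actually mapped (a whole number of pages)
  bool bound;     // true if the kernel accepted the mbind policy
};

// Round a byte count up to a whole number of pages.
inline size_t round_up_to_page(size_t size) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  return (size + page - 1) / page * page;
}

// Map anonymous memory and ask the kernel to place its pages on `node`.
inline NumaRegion numa_map_sketch(size_t size, int node) {
  const int max_nodes = static_cast<int>(sizeof(unsigned long) * 8);
  if (size == 0 || node < 0 || node >= max_nodes) {
    throw std::invalid_argument("bad size or NUMA node");
  }
  const size_t len = round_up_to_page(size);
  assert(len >= size);

  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) throw std::runtime_error("mmap failed");

  // MPOL_BIND has the value 2 in <linux/mempolicy.h>; it is spelled out
  // here so the sketch does not need that header.
  const unsigned long mpol_bind = 2;
  unsigned long mask = 1UL << node;  // one bit per NUMA node
  // The kernel reads maxnode - 1 bits from the mask, hence the + 1.
  const unsigned long maxnode = sizeof(mask) * 8 + 1;
  const long rc = syscall(SYS_mbind, p, len, mpol_bind, &mask, maxnode, 0UL);

  // Pages are placed on first touch, so the policy is set before any write.
  return NumaRegion{reinterpret_cast<uintptr_t>(p), len, rc == 0};
}

// Release the mapping; munmap needs the mapped length.
inline void numa_unmap_sketch(const NumaRegion& r) {
  if (munmap(reinterpret_cast<void*>(r.ptr), r.size) != 0) {
    throw std::runtime_error("munmap failed");
  }
}
```

In the combined pinned-NUMA strategy, a region obtained this way would then be registered with CUDA (cudaHostRegister is the runtime call for registering existing host memory) before use, and unregistered before the munmap.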

Usage

Use these functions when LMCache needs to allocate host memory that participates in GPU DMA transfers, especially in multi-socket NUMA systems where memory locality affects transfer bandwidth. The pinned-NUMA variant is particularly useful for ensuring that host buffers reside on the NUMA node closest to the target GPU.
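One way to choose the node argument is to read the NUMA node that Linux reports for the GPU's PCI device. The sketch below assumes a Linux system that exposes /sys/bus/pci/devices/<address>/numa_node. The helper names read_numa_node and pci_numa_node_path are hypothetical and not part of LMCache. The kernel reports -1 when it has no node information, and falling back to a default node in that case is a choice made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <fstream>
#include <string>

// Parse the integer stored in a sysfs-style numa_node file.
// Returns `fallback` if the file is missing, unreadable, or holds a
// negative value (the kernel writes -1 when the node is unknown).
inline int read_numa_node(const std::string& path, int fallback = 0) {
  std::ifstream in(path);
  int node = -1;
  if (!(in >> node) || node < 0) return fallback;
  return node;
}

// Build the sysfs path for a PCI device address such as "0000:3b:00.0".
// The address is lowercased because sysfs directory names use lowercase hex.
inline std::string pci_numa_node_path(std::string bdf) {
  assert(!bdf.empty());
  std::transform(bdf.begin(), bdf.end(), bdf.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return "/sys/bus/pci/devices/" + bdf + "/numa_node";
}
```

The PCI address of a CUDA device can be obtained with cudaDeviceGetPCIBusId. The node returned by read_numa_node(pci_numa_node_path(address)) could then be passed to alloc_pinned_numa_ptr.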

Code Reference

Source Location

Repository: LMCache
File: csrc/mem_alloc.cpp
Lines: 1-96

Signature

uintptr_t alloc_pinned_ptr(size_t size, unsigned int flags);
void free_pinned_ptr(uintptr_t ptr);
uintptr_t alloc_numa_ptr(size_t size, int node);
void free_numa_ptr(uintptr_t ptr, size_t size);
uintptr_t alloc_pinned_numa_ptr(size_t size, int node);
void free_pinned_numa_ptr(uintptr_t ptr, size_t size);
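Because the NUMA free functions need the same size that was passed at allocation, callers may find it convenient to keep the pointer and size together. The class below is a hypothetical RAII wrapper, not part of LMCache, sketched for any free function with the (uintptr_t, size_t) shape shown in the signatures above. It swallows exceptions in the destructor, which is an assumption about how a caller might want to handle cleanup failures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical owner for a buffer whose free function takes (ptr, size),
// the shape of free_numa_ptr and free_pinned_numa_ptr.
class SizedHostBuffer {
 public:
  using FreeFn = std::function<void(uintptr_t, size_t)>;

  SizedHostBuffer(uintptr_t ptr, size_t size, FreeFn free_fn)
      : ptr_(ptr), size_(size), free_fn_(std::move(free_fn)) {
    assert(free_fn_ && "a free function is required");
  }

  // Owning type: copying would lead to a double free.
  SizedHostBuffer(const SizedHostBuffer&) = delete;
  SizedHostBuffer& operator=(const SizedHostBuffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  SizedHostBuffer(SizedHostBuffer&& other)
      : ptr_(other.ptr_), size_(other.size_),
        free_fn_(std::move(other.free_fn_)) {
    other.ptr_ = 0;
    other.size_ = 0;
  }

  ~SizedHostBuffer() {
    if (ptr_ != 0 && free_fn_) {
      try {
        free_fn_(ptr_, size_);  // same size that was allocated
      } catch (...) {
        // Destructors must not throw; the failure is dropped in this sketch.
      }
    }
  }

  // View the integer address as a typed pointer.
  template <typename T>
  T* as() const { return reinterpret_cast<T*>(ptr_); }

  size_t size() const { return size_; }

 private:
  uintptr_t ptr_;
  size_t size_;
  FreeFn free_fn_;
};
```

With the functions declared in mem_alloc.h, a use might look like SizedHostBuffer buf(alloc_pinned_numa_ptr(n, 0), n, free_pinned_numa_ptr); the buffer is then released with the matching size when buf goes out of scope.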

Import

#include "mem_alloc.h"

I/O Contract

Inputs

Name	Type	Required	Description
size	size_t	Yes	Number of bytes to allocate; the NUMA free functions take the same value when releasing the buffer
flags	unsigned int	Yes (alloc_pinned_ptr)	Flags passed to cudaHostAlloc (e.g., cudaHostAllocDefault)
node	int	Yes (NUMA variants)	NUMA node index to bind memory to
ptr	uintptr_t	Yes (free functions)	Pointer returned by the corresponding alloc function

Outputs

Name	Type	Description
ptr	uintptr_t	Integer representation of the allocated memory pointer (from alloc functions)
(void)	void	Free functions return nothing; throw std::runtime_error on failure

Usage Examples

// Allocate 1 GiB of CUDA-pinned memory
const size_t pinned_size = static_cast<size_t>(1) << 30;
uintptr_t pinned = alloc_pinned_ptr(pinned_size, cudaHostAllocDefault);
// ... use pinned memory for GPU transfers ...
free_pinned_ptr(pinned);

// Allocate 512 MiB on NUMA node 0 and register it with CUDA
const size_t numa_size = static_cast<size_t>(512) << 20;
uintptr_t numa_pinned = alloc_pinned_numa_ptr(numa_size, 0);
// ... use for DMA transfers with the GPU nearest to NUMA node 0 ...
// The free call must receive the same size that was allocated
free_pinned_numa_ptr(numa_pinned, numa_size);
