Implementation:LMCache LMCache Mem Alloc

From Leeroopedia
Revision as of 15:25, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/LMCache_LMCache_Mem_Alloc.md)


Knowledge Sources
Domains: Memory Management, CUDA, NUMA
Last Updated: 2026-02-09 00:00 GMT

Overview

Provides low-level C++ functions for allocating and freeing CUDA-pinned and NUMA-aware host memory.

Description

This module implements the memory allocation routines LMCache uses for high-performance data transfers between host and GPU. It supports three allocation strategies: CUDA pinned memory via cudaHostAlloc; NUMA-bound memory via mmap followed by mbind; and a combined pinned-NUMA allocation that maps memory on a specific NUMA node and then registers it with CUDA for DMA access. Each allocation function returns the buffer address as a uintptr_t, and the matching free function performs the cleanup for that strategy, including CUDA unregistration and munmap where applicable.
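
To make the NUMA-bound strategy concrete, here is a minimal sketch of an mmap-then-mbind allocation on Linux. It is not the LMCache source: the names sketch_node_mask, sketch_alloc_numa, and sketch_free_numa are hypothetical, the raw mbind system call is used only so the example links without libnuma, and the error handling is an assumption. A pinned-NUMA variant would additionally register the mapping with CUDA (presumably via cudaHostRegister), which is omitted here.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

#include <linux/mempolicy.h>  // MPOL_BIND
#include <sys/mman.h>         // mmap, munmap
#include <sys/syscall.h>      // SYS_mbind
#include <unistd.h>           // syscall

// Hypothetical helper: a one-word node mask with only bit `node` set.
inline unsigned long sketch_node_mask(int node) {
  if (node < 0 || node >= 64) throw std::invalid_argument("node out of range");
  return 1UL << node;
}

// Hypothetical sketch: map anonymous memory, then bind the mapping to one
// NUMA node before any page is touched, so pages are placed on that node.
inline std::uintptr_t sketch_alloc_numa(std::size_t size, int node) {
  unsigned long mask = sketch_node_mask(node);
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    throw std::runtime_error(std::string("mmap failed: ") + std::strerror(errno));
  }
  // maxnode is passed as bit count + 1, the same convention libnuma uses.
  const unsigned long maxnode = sizeof(mask) * 8 + 1;
  long rc = syscall(SYS_mbind, p, size, static_cast<unsigned long>(MPOL_BIND),
                    &mask, maxnode, 0UL);
  if (rc != 0) {
    int saved = errno;
    munmap(p, size);  // do not leak the mapping when binding fails
    throw std::runtime_error(std::string("mbind failed: ") + std::strerror(saved));
  }
  return reinterpret_cast<std::uintptr_t>(p);
}

// The size is required because munmap needs the length of the mapping.
inline void sketch_free_numa(std::uintptr_t ptr, std::size_t size) {
  if (munmap(reinterpret_cast<void*>(ptr), size) != 0) {
    throw std::runtime_error(std::string("munmap failed: ") + std::strerror(errno));
  }
}
```

Binding can fail where the node does not exist or where the process is not permitted to call mbind (some container sandboxes block it), so callers of a routine like this should expect an exception rather than assume success.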

Usage

Use these functions when LMCache needs to allocate host memory that participates in GPU DMA transfers, especially in multi-socket NUMA systems where memory locality affects transfer bandwidth. The pinned-NUMA variant is particularly useful for ensuring that host buffers reside on the NUMA node closest to the target GPU.
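
The text above does not say how the node argument is chosen. One possible approach on Linux, shown here purely as an assumption, is to read the numa_node attribute that sysfs exposes for the GPU's PCI device (for example /sys/bus/pci/devices/<bus-id>/numa_node). The helper name sketch_read_numa_node and the fallback behaviour are hypothetical and are not part of mem_alloc.h.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: read the NUMA node reported for a PCI device.
// Returns `fallback` when the file is missing, cannot be parsed, or the
// kernel reports -1 (no NUMA affinity known for the device).
inline int sketch_read_numa_node(const std::string& numa_node_path,
                                 int fallback = 0) {
  std::ifstream in(numa_node_path);
  int node = -1;
  if (!(in >> node) || node < 0) {
    return fallback;
  }
  return node;
}
```

The returned value could then be passed as the node argument of alloc_numa_ptr or alloc_pinned_numa_ptr. Obtaining the bus id of a given CUDA device is left out of the sketch.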

Code Reference

Source Location

Signature

uintptr_t alloc_pinned_ptr(size_t size, unsigned int flags);
void free_pinned_ptr(uintptr_t ptr);
uintptr_t alloc_numa_ptr(size_t size, int node);
void free_numa_ptr(uintptr_t ptr, size_t size);
uintptr_t alloc_pinned_numa_ptr(size_t size, int node);
void free_pinned_numa_ptr(uintptr_t ptr, size_t size);
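
Because the NUMA free functions take the allocation size as well as the pointer, callers have to keep the two together. The following is a minimal sketch of an RAII owner that does this; SizedBuffer is a hypothetical name and not part of mem_alloc.h. FreeFn is assumed to be any callable with the shape void(uintptr_t, size_t), such as a pointer to free_numa_ptr or free_pinned_numa_ptr.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical RAII owner that remembers the size needed by the free call.
template <typename FreeFn>
class SizedBuffer {
 public:
  SizedBuffer(std::uintptr_t ptr, std::size_t size, FreeFn free_fn)
      : ptr_(ptr), size_(size), free_fn_(std::move(free_fn)) {}

  SizedBuffer(const SizedBuffer&) = delete;
  SizedBuffer& operator=(const SizedBuffer&) = delete;

  // Moving transfers ownership; the source is left empty and frees nothing.
  SizedBuffer(SizedBuffer&& other) noexcept
      : ptr_(other.ptr_), size_(other.size_),
        free_fn_(std::move(other.free_fn_)) {
    other.ptr_ = 0;
  }

  ~SizedBuffer() {
    if (ptr_ == 0) return;
    // The free functions may throw; a destructor must not let that escape.
    try { free_fn_(ptr_, size_); } catch (...) {}
  }

  void* data() const { return reinterpret_cast<void*>(ptr_); }
  std::size_t size() const { return size_; }

 private:
  std::uintptr_t ptr_;
  std::size_t size_;
  FreeFn free_fn_;
};
```

With the real functions, a buffer could be declared along the lines of SizedBuffer<decltype(&free_pinned_numa_ptr)> buf(alloc_pinned_numa_ptr(n, 0), n, &free_pinned_numa_ptr); so that the size passed to the free call always matches the size that was allocated.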

Import

#include "mem_alloc.h"

I/O Contract

Inputs

Name | Type | Required | Description
size | size_t | Yes (alloc functions and NUMA free functions) | Number of bytes to allocate; free_numa_ptr and free_pinned_numa_ptr take the same value that was allocated
flags | unsigned int | Yes (alloc_pinned_ptr) | Flags passed to cudaHostAlloc (e.g., cudaHostAllocDefault)
node | int | Yes (NUMA alloc variants) | NUMA node index to bind memory to
ptr | uintptr_t | Yes (free functions) | Pointer returned by the corresponding alloc function

Outputs

Name | Type | Description
ptr | uintptr_t | Integer representation of the allocated memory pointer (from alloc functions)
(void) | void | Free functions return nothing; throw std::runtime_error on failure

Usage Examples

// Allocate 1 GiB of CUDA-pinned memory
const size_t pinned_size = size_t{1} << 30;
uintptr_t pinned = alloc_pinned_ptr(pinned_size, cudaHostAllocDefault);
// ... use pinned memory for GPU transfers ...
free_pinned_ptr(pinned);

// Allocate 512 MiB on NUMA node 0 with CUDA pinning
const size_t numa_size = size_t{512} << 20;
uintptr_t numa_pinned = alloc_pinned_numa_ptr(numa_size, 0);
// ... use for DMA transfers with the GPU nearest to NUMA node 0 ...
// Pass the same size to the free call that was used for the allocation
free_pinned_numa_ptr(numa_pinned, numa_size);
