Implementation:InternLM Lmdeploy Allocator

Knowledge Sources	InternLM_Lmdeploy
Domains	Memory_Management, GPU_Computing
Last Updated	2026-02-07 15:00 GMT

Overview

Provides a polymorphic memory allocation framework supporting CPU, pinned CPU, synchronous CUDA, and CUDA memory pool allocators, plus a stack-based arena allocator for reducing allocation overhead.

Description

The AllocatorImpl abstract base class defines the allocation interface with allocate(ssize_t size), deallocate(void*, ssize_t), device(), stream(), and trim() methods. The Allocator wrapper holds a shared_ptr<AllocatorImpl> providing smart-pointer semantics via operator->. Concrete implementations include:

HostAllocator -- standard CPU allocation via ::operator new
CudaHostAllocator -- pinned memory via cudaHostAlloc
CudaAllocator -- synchronous GPU memory via cudaMalloc/cudaFree
CudaMemPoolAllocator -- asynchronous GPU memory pool via cudaMallocFromPoolAsync/cudaFreeAsync, optionally using the default device pool or creating a custom pool
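
The interface-plus-handle arrangement described above can be illustrated with a minimal, host-only sketch. This is not the TurboMind source: everything lives in a hypothetical `sketch` namespace, `std::ptrdiff_t` stands in for `ssize_t`, the device ID of -1 for the host is an assumption, and `host_round_trip` is a helper invented for the demonstration. Only the overall shape (abstract base, concrete backend, shared_ptr-backed handle with operator->) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

namespace sketch {  // hypothetical namespace, not turbomind::core

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

struct Device {
    DeviceType type = DeviceType::kCPU;
    int        id   = -1;  // assumption: -1 marks "no device ordinal"
};

// Abstract interface: each backend overrides allocate/deallocate/device.
class AllocatorImpl {
public:
    virtual ~AllocatorImpl() = default;
    virtual void*  allocate(std::ptrdiff_t size)            = 0;
    virtual void   deallocate(void* p, std::ptrdiff_t size) = 0;
    virtual Device device() const noexcept                  = 0;
    // Backends without a cache have nothing to release.
    virtual void trim(std::size_t /*bytes_to_keep*/) {}
};

// Host backend built on ::operator new / ::operator delete.
class HostAllocator final : public AllocatorImpl {
public:
    void* allocate(std::ptrdiff_t size) override
    {
        assert(size >= 0);
        return ::operator new(static_cast<std::size_t>(size));
    }
    void deallocate(void* p, std::ptrdiff_t /*size*/) override
    {
        ::operator delete(p);
    }
    Device device() const noexcept override
    {
        return Device{DeviceType::kCPU, -1};
    }
};

// Value-type handle: copies share a single implementation object.
class Allocator {
public:
    Allocator() = default;
    Allocator(std::shared_ptr<AllocatorImpl> impl): impl_{std::move(impl)} {}
    AllocatorImpl* operator->() const { return impl_.get(); }
    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

private:
    std::shared_ptr<AllocatorImpl> impl_;
};

// Allocates and frees through the handle; true if the round trip looked sane.
inline bool host_round_trip(std::ptrdiff_t bytes)
{
    Allocator  alloc{std::make_shared<HostAllocator>()};
    void*      p  = alloc->allocate(bytes);
    const bool ok = p != nullptr && alloc->device().type == DeviceType::kCPU;
    alloc->deallocate(p, bytes);
    return ok;
}

}  // namespace sketch
```

Because the handle only stores a shared pointer, passing an allocator by value is cheap and every copy refers to the same backend, which is presumably why the wrapper exposes operator-> rather than forwarding each method.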

The header also defines StackAllocatorImpl, a bump/stack allocator that delegates to an underlying allocator for large allocations and maintains a cached region for fast sequential allocate/deallocate patterns with 256-byte alignment. SimpleAllocator wraps arbitrary alloc/dealloc function objects for custom allocator integration. The Device struct pairs a DeviceType (kCPU, kCPUpinned, kDEVICE) with a device ID.
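
As a rough illustration of the bump/stack idea, the sketch below keeps one cached region, rounds every request up to 256 bytes, and frees in last-in-first-out order. It is an assumption-laden simplification, not the StackAllocatorImpl code: the class name `StackArena`, the fixed capacity, the use of aligned `::operator new` as the "underlying allocator", and the rule that requests which do not fit in the cached region fall back to it are all choices made for this example. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

namespace sketch {  // hypothetical namespace, not turbomind::core

constexpr std::size_t kAlignment = 256;

// Round n up to the next multiple of kAlignment (a power of two).
constexpr std::size_t round_up(std::size_t n)
{
    return (n + kAlignment - 1) & ~(kAlignment - 1);
}

class StackArena {
public:
    explicit StackArena(std::size_t capacity):
        capacity_{round_up(capacity)},
        base_{static_cast<std::byte*>(::operator new(capacity_, std::align_val_t{kAlignment}))}
    {
    }
    ~StackArena() { ::operator delete(base_, std::align_val_t{kAlignment}); }
    StackArena(const StackArena&) = delete;
    StackArena& operator=(const StackArena&) = delete;

    void* allocate(std::size_t size)
    {
        assert(size > 0);
        const std::size_t n = round_up(size);
        if (n > capacity_ - top_) {
            // Does not fit in the cached region: use the upstream allocator.
            return ::operator new(n, std::align_val_t{kAlignment});
        }
        void* p = base_ + top_;  // bump the top of the stack
        top_ += n;
        return p;
    }

    void deallocate(void* p, std::size_t size)
    {
        const std::size_t n = round_up(size);
        if (!owns(p)) {
            ::operator delete(p, std::align_val_t{kAlignment});
            return;
        }
        // Stack discipline: only the most recent arena block may be freed.
        assert(static_cast<std::byte*>(p) + n == base_ + top_);
        top_ -= n;
    }

    std::size_t used() const noexcept { return top_; }

private:
    bool owns(const void* p) const noexcept
    {
        const auto a = reinterpret_cast<std::uintptr_t>(p);
        const auto b = reinterpret_cast<std::uintptr_t>(base_);
        return a >= b && a < b + capacity_;
    }

    std::size_t capacity_;
    std::byte*  base_;
    std::size_t top_ = 0;
};

// Exercises arena hits, an upstream fallback, and LIFO release.
inline bool demo_stack_arena()
{
    StackArena arena{1024};
    void* a   = arena.allocate(100);   // occupies 256 bytes of the arena
    void* b   = arena.allocate(300);   // occupies 512 bytes of the arena
    void* big = arena.allocate(4096);  // too large, served by the fallback
    bool  ok  = arena.used() == 768;
    ok = ok && reinterpret_cast<std::uintptr_t>(a) % kAlignment == 0;
    ok = ok && reinterpret_cast<std::uintptr_t>(b) % kAlignment == 0;
    arena.deallocate(big, 4096);
    arena.deallocate(b, 300);  // reverse order of allocation
    arena.deallocate(a, 100);
    return ok && arena.used() == 0;
}

}  // namespace sketch
```

The size argument to deallocate is what makes the pop cheap: the arena can move its top pointer back without storing a per-block header.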

Usage

These allocators are used throughout TurboMind for all memory allocation. Buffers and tensors accept an Allocator& parameter. The Context system manages allocator stacks so that code can implicitly use the current allocator for each device type.
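
One way to picture "allocator stacks" with an implicit current allocator is a per-thread stack for each device type, pushed and popped by a scope guard. The sketch below shows only that general pattern. The Context API itself is not shown on this page, so `stack_for`, `ScopedAllocator`, `current`, and the label-only `Allocator` stand-in are hypothetical names, and the thread-local storage is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not turbomind::core

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

// Stand-in for an allocator handle; a label is enough for the demo.
struct Allocator {
    std::string name;
};

// One stack of allocators per device type, private to each thread.
inline std::vector<Allocator>& stack_for(DeviceType type)
{
    thread_local std::array<std::vector<Allocator>, 3> stacks;
    return stacks[static_cast<std::size_t>(type)];
}

// RAII guard: makes `alloc` current for `type` until the guard is destroyed.
class ScopedAllocator {
public:
    ScopedAllocator(DeviceType type, Allocator alloc): type_{type}
    {
        stack_for(type_).push_back(std::move(alloc));
    }
    ~ScopedAllocator() { stack_for(type_).pop_back(); }
    ScopedAllocator(const ScopedAllocator&) = delete;
    ScopedAllocator& operator=(const ScopedAllocator&) = delete;

private:
    DeviceType type_;
};

// The innermost allocator pushed for `type`; a guard must be active.
inline const Allocator& current(DeviceType type)
{
    assert(!stack_for(type).empty());
    return stack_for(type).back();
}

// Records which allocator is current before, inside, and after a nested scope.
inline std::string demo_nested_scopes()
{
    std::string     seen;
    ScopedAllocator outer{DeviceType::kDEVICE, Allocator{"pool"}};
    seen += current(DeviceType::kDEVICE).name;
    {
        ScopedAllocator inner{DeviceType::kDEVICE, Allocator{"stack"}};
        seen += "," + current(DeviceType::kDEVICE).name;
    }
    seen += "," + current(DeviceType::kDEVICE).name;
    return seen;
}

}  // namespace sketch
```

With a guard like this, a nested scope can temporarily route device allocations through a different allocator and the previous one is restored automatically when the scope ends.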

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File (header): src/turbomind/core/allocator.h
File (impl): src/turbomind/core/allocator.cc
Lines: allocator.h 1-247, allocator.cc 1-159

Signature

namespace turbomind::core {

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

struct Device {
    DeviceType type;
    int        id;
    Device();
    Device(DeviceType type_);
    Device(DeviceType type_, int device_);
};

class AllocatorImpl {
public:
    virtual ~AllocatorImpl();
    virtual void* allocate(ssize_t size) = 0;
    virtual void deallocate(void* p, ssize_t size) = 0;
    virtual Stream stream() const noexcept;
    virtual Device device() const noexcept = 0;
    virtual void trim(size_t bytes_to_keep);
};

class Allocator {
public:
    Allocator() = default;
    explicit Allocator(DeviceType type);
    Allocator(Stream stream, bool use_default_pool);
    Allocator(shared_ptr<AllocatorImpl> impl);
    AllocatorImpl* operator->() const;
    explicit operator bool() const noexcept;
};

class StackAllocatorImpl : public AllocatorImpl { /* ... */ };
class SimpleAllocator : public AllocatorImpl { /* ... */ };

}  // namespace turbomind::core

Import

#include "src/turbomind/core/allocator.h"

I/O Contract

Inputs

Name	Type	Used by	Description
type	DeviceType	Allocator(DeviceType)	Selects CPU, pinned, or device allocator
stream	Stream	Allocator(Stream, bool)	CUDA stream for async pool allocator
use_default_pool	bool	Allocator(Stream, bool)	Whether to use the default CUDA memory pool
size	ssize_t	allocate()	Number of bytes to allocate

Outputs

Name	Type	Description
ptr	void*	Pointer to allocated memory
device()	Device	The device type and ID associated with this allocator
stream()	Stream	The CUDA stream associated with pool allocators (invalid for sync allocators)

Usage Examples

#include "src/turbomind/core/allocator.h"

using namespace turbomind::core;

// Create a CPU allocator (DeviceType is a scoped enum, so the enumerator is qualified)
Allocator cpu_alloc(DeviceType::kCPU);
void* ptr = cpu_alloc->allocate(1024);
cpu_alloc->deallocate(ptr, 1024);

// Create a CUDA memory pool allocator
Stream stream = Stream::create();
Allocator pool_alloc(stream, /*use_default_pool=*/true);
void* gpu_ptr = pool_alloc->allocate(4096);
pool_alloc->deallocate(gpu_ptr, 4096);

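// Stack allocator for arena-style allocation.
// pool_alloc_impl is a placeholder for the underlying allocator that
// StackAllocatorImpl delegates to; it is not defined in this snippet.
auto stack = std::make_shared<StackAllocatorImpl>(pool_alloc_impl);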
void* fast_ptr = stack->allocate(256);
stack->deallocate(fast_ptr, 256);

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
