Implementation:InternLM Lmdeploy Allocator

From Leeroopedia


Knowledge Sources
Domains Memory_Management, GPU_Computing
Last Updated 2026-02-07 15:00 GMT

Overview

Provides a polymorphic memory allocation framework supporting CPU, pinned CPU, synchronous CUDA, and CUDA memory pool allocators, plus a stack-based arena allocator for reducing allocation overhead.

Description

The AllocatorImpl abstract base class defines the allocation interface. allocate(ssize_t size), deallocate(void* p, ssize_t size), and device() are pure virtual, while stream() and trim(size_t bytes_to_keep) are virtual methods with base-class implementations. The Allocator wrapper holds a shared_ptr<AllocatorImpl> and forwards calls to it through operator->, so copies of an Allocator share one implementation. Concrete implementations include:

  • HostAllocator -- standard CPU allocation via ::operator new
  • CudaHostAllocator -- pinned memory via cudaHostAlloc
  • CudaAllocator -- synchronous GPU memory via cudaMalloc/cudaFree
  • CudaMemPoolAllocator -- asynchronous GPU memory pool via cudaMallocFromPoolAsync/cudaFreeAsync, optionally using the default device pool or creating a custom pool
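The interface-plus-handle design described above can be sketched in standalone C++. This is a minimal illustration, not the TurboMind source: the namespace `sketch` is hypothetical, `std::ptrdiff_t` stands in for `ssize_t`, `device()` returns a bare `DeviceType` instead of a `Device`, and `stream()` and `trim()` are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

namespace sketch {  // hypothetical namespace, not turbomind::core

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

// Simplified interface: only allocate, deallocate and device are modeled.
class AllocatorImpl {
public:
    virtual ~AllocatorImpl() = default;
    virtual void* allocate(std::ptrdiff_t size) = 0;
    virtual void deallocate(void* p, std::ptrdiff_t size) = 0;
    virtual DeviceType device() const noexcept = 0;
};

// Host implementation backed by ::operator new and ::operator delete.
class HostAllocator final : public AllocatorImpl {
public:
    void* allocate(std::ptrdiff_t size) override {
        assert(size >= 0);
        return ::operator new(static_cast<std::size_t>(size));
    }
    void deallocate(void* p, std::ptrdiff_t /*size*/) override {
        ::operator delete(p);
    }
    DeviceType device() const noexcept override { return DeviceType::kCPU; }
};

// Handle type: copies share one implementation, operator-> forwards calls.
class Allocator {
public:
    Allocator() = default;
    explicit Allocator(std::shared_ptr<AllocatorImpl> impl) : impl_(std::move(impl)) {}
    AllocatorImpl* operator->() const { return impl_.get(); }
    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

private:
    std::shared_ptr<AllocatorImpl> impl_;
};

}  // namespace sketch
```

Because the handle only stores a shared pointer, passing an `Allocator` by value is cheap, and the implementation stays alive as long as any copy of the handle does.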

The header also defines StackAllocatorImpl, a bump/stack allocator that delegates to an underlying allocator for large allocations and maintains a cached region for fast sequential allocate/deallocate patterns with 256-byte alignment. SimpleAllocator wraps arbitrary alloc/dealloc function objects for custom allocator integration. The Device struct pairs a DeviceType (kCPU, kCPUpinned, kDEVICE) with a device ID.
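The bump/stack idea can be illustrated with a small standalone arena. This is a sketch under stated assumptions, not the StackAllocatorImpl code: the class name `StackArena`, the fixed capacity, and the rule "fall back to the heap when the request does not fit" are all hypothetical, and the heap stands in for the underlying allocator. Only the 256-byte rounding and the last-in, first-out release pattern come from the description above. It needs C++17 for aligned `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

namespace sketch {  // hypothetical namespace

constexpr std::size_t kAlignment = 256;

// Round n up to the next multiple of kAlignment (a power of two).
constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlignment - 1) & ~(kAlignment - 1);
}

class StackArena {
public:
    explicit StackArena(std::size_t capacity)
        : capacity_(round_up(capacity)),
          base_(static_cast<char*>(::operator new(capacity_, std::align_val_t{kAlignment}))) {}
    ~StackArena() { ::operator delete(base_, std::align_val_t{kAlignment}); }
    StackArena(const StackArena&) = delete;
    StackArena& operator=(const StackArena&) = delete;

    void* allocate(std::size_t size) {
        const std::size_t bytes = round_up(size);
        if (bytes > capacity_ - top_) {
            // Does not fit in the cached region: delegate (here, to the heap).
            return ::operator new(bytes, std::align_val_t{kAlignment});
        }
        void* p = base_ + top_;
        top_ += bytes;  // bump the stack top
        return p;
    }

    void deallocate(void* p, std::size_t size) {
        if (!owns(p)) {
            ::operator delete(p, std::align_val_t{kAlignment});
            return;
        }
        const std::size_t bytes = round_up(size);
        // Blocks from the cached region must be released in reverse order.
        assert(static_cast<char*>(p) + bytes == base_ + top_);
        top_ -= bytes;
    }

    std::size_t used() const noexcept { return top_; }

    bool owns(const void* p) const noexcept {
        const char* c = static_cast<const char*>(p);
        std::less<const char*> lt;  // total order over unrelated pointers
        return !lt(c, base_) && lt(c, base_ + capacity_);
    }

private:
    std::size_t capacity_;
    char*       base_;
    std::size_t top_ = 0;
};

}  // namespace sketch
```

Allocation and release inside the cached region are a single pointer adjustment each, which is why this pattern pays off for short-lived buffers that are freed in reverse order of creation.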

Usage

Used throughout TurboMind for all memory allocation. Buffers and tensors accept an Allocator& parameter. The Context system manages allocator stacks so that code can implicitly use the current allocator for each device type.
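One way to picture "allocator stacks with an implicit current allocator" is a per-device-type stack with a scoped guard. The sketch below is purely illustrative: `Context`, `ContextGuard`, their methods, the thread-local storage, and the string-labelled `Allocator` placeholder are assumptions made for this example and do not reproduce the TurboMind Context API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

// Placeholder for an allocator handle; only a label is tracked here.
struct Allocator {
    std::string name;
};

// One stack of allocators per device type (thread-local is an assumption).
class Context {
public:
    static void push(DeviceType type, Allocator alloc) {
        stack(type).push_back(std::move(alloc));
    }
    static void pop(DeviceType type) {
        assert(!stack(type).empty());
        stack(type).pop_back();
    }
    // The allocator that code picks up implicitly for this device type.
    static const Allocator& current(DeviceType type) {
        assert(!stack(type).empty());
        return stack(type).back();
    }

private:
    static std::vector<Allocator>& stack(DeviceType type) {
        thread_local std::array<std::vector<Allocator>, 3> stacks;
        return stacks[static_cast<std::size_t>(type)];
    }
};

// RAII guard: installs an allocator for a scope and restores the previous one.
class ContextGuard {
public:
    ContextGuard(DeviceType type, Allocator alloc) : type_(type) {
        Context::push(type, std::move(alloc));
    }
    ~ContextGuard() { Context::pop(type_); }
    ContextGuard(const ContextGuard&) = delete;
    ContextGuard& operator=(const ContextGuard&) = delete;

private:
    DeviceType type_;
};

}  // namespace sketch
```

With this shape, a function that needs scratch memory can ask for `Context::current(DeviceType::kDEVICE)` instead of taking an allocator argument, and a caller can swap in an arena for one scope without touching the callee.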

Code Reference

Source Location

Signature

namespace turbomind::core {

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

struct Device {
    DeviceType type;
    int        id;
    Device();
    Device(DeviceType type_);
    Device(DeviceType type_, int device_);
};

class AllocatorImpl {
public:
    virtual ~AllocatorImpl();
    virtual void* allocate(ssize_t size) = 0;
    virtual void deallocate(void* p, ssize_t size) = 0;
    virtual Stream stream() const noexcept;
    virtual Device device() const noexcept = 0;
    virtual void trim(size_t bytes_to_keep);
};

class Allocator {
public:
    Allocator() = default;
    explicit Allocator(DeviceType type);
    Allocator(Stream stream, bool use_default_pool);
    Allocator(shared_ptr<AllocatorImpl> impl);
    AllocatorImpl* operator->() const;
    explicit operator bool() const noexcept;
};

class StackAllocatorImpl : public AllocatorImpl { /* ... */ };
class SimpleAllocator : public AllocatorImpl { /* ... */ };

}  // namespace turbomind::core
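The body of SimpleAllocator is elided in the signature, so the following standalone sketch only illustrates the general idea of wrapping alloc/dealloc function objects behind the allocator interface. The `std::function` members, the constructor shape, and the `make_counting_allocator` helper are assumptions for this example, and `std::ptrdiff_t` stands in for `ssize_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>

namespace sketch {  // hypothetical namespace, not turbomind::core

// Reduced interface: just the two calls being wrapped.
class AllocatorImpl {
public:
    virtual ~AllocatorImpl() = default;
    virtual void* allocate(std::ptrdiff_t size) = 0;
    virtual void deallocate(void* p, std::ptrdiff_t size) = 0;
};

// Adapts a pair of callables to the allocator interface.
class SimpleAllocator final : public AllocatorImpl {
public:
    using AllocFn   = std::function<void*(std::ptrdiff_t)>;
    using DeallocFn = std::function<void(void*, std::ptrdiff_t)>;

    SimpleAllocator(AllocFn alloc, DeallocFn dealloc)
        : alloc_(std::move(alloc)), dealloc_(std::move(dealloc)) {
        assert(alloc_ && dealloc_);
    }

    void* allocate(std::ptrdiff_t size) override { return alloc_(size); }
    void deallocate(void* p, std::ptrdiff_t size) override { dealloc_(p, size); }

private:
    AllocFn   alloc_;
    DeallocFn dealloc_;
};

// Example integration: malloc/free plus a live-byte counter owned by the caller.
// The counter must outlive the returned allocator.
inline SimpleAllocator make_counting_allocator(std::ptrdiff_t& live_bytes) {
    return SimpleAllocator(
        [&live_bytes](std::ptrdiff_t size) {
            live_bytes += size;
            return std::malloc(static_cast<std::size_t>(size));
        },
        [&live_bytes](void* p, std::ptrdiff_t size) {
            live_bytes -= size;
            std::free(p);
        });
}

}  // namespace sketch
```

The same adapter shape works for any external memory source that exposes an allocate and a release call, such as a framework's caching allocator.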

Import

#include "src/turbomind/core/allocator.h"

I/O Contract

Inputs

Name | Type | Used by | Description
type | DeviceType | Allocator(DeviceType) | Selects the CPU, pinned CPU, or device allocator
stream | Stream | Allocator(Stream, bool) | CUDA stream for the asynchronous pool allocator
use_default_pool | bool | Allocator(Stream, bool) | Whether to use the default CUDA memory pool
size | ssize_t | allocate() | Number of bytes to allocate

Outputs

Name | Type | Description
ptr | void* | Pointer to allocated memory
device() | Device | The device type and ID associated with this allocator
stream() | Stream | The CUDA stream associated with pool allocators (invalid for sync allocators)

Usage Examples

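#include <memory>

#include "src/turbomind/core/allocator.h"

using namespace turbomind::core;

// Create a CPU allocator (DeviceType is a scoped enum, so the enumerator is qualified)
Allocator cpu_alloc(DeviceType::kCPU);
void* ptr = cpu_alloc->allocate(1024);
// Pass back the same size that was requested
cpu_alloc->deallocate(ptr, 1024);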

// Create a CUDA memory pool allocator
Stream stream = Stream::create();
Allocator pool_alloc(stream, /*use_default_pool=*/true);
void* gpu_ptr = pool_alloc->allocate(4096);
pool_alloc->deallocate(gpu_ptr, 4096);

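// Stack allocator for arena-style allocation.
// pool_alloc_impl is a placeholder for the underlying allocator; it is not
// defined in this snippet, and the constructor's parameter type is not shown above.
auto stack = std::make_shared<StackAllocatorImpl>(pool_alloc_impl);
void* fast_ptr = stack->allocate(256);
stack->deallocate(fast_ptr, 256);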
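Since deallocate takes the size again, every call site in the examples above has to carry the pointer and its size together. A scoped owner can do that bookkeeping. The sketch below is a hypothetical helper, not part of the documented header: `ScopedBlock` and the byte-counting `HostAllocator` are invented for illustration, and `std::ptrdiff_t` stands in for `ssize_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

namespace sketch {  // hypothetical namespace

// Stand-in host allocator that also counts live bytes for the demo.
struct HostAllocator {
    std::ptrdiff_t live_bytes = 0;

    void* allocate(std::ptrdiff_t size) {
        assert(size >= 0);
        live_bytes += size;
        return ::operator new(static_cast<std::size_t>(size));
    }
    void deallocate(void* p, std::ptrdiff_t size) {
        live_bytes -= size;
        ::operator delete(p);
    }
};

// Owns one block and remembers its size so the matching deallocate
// call always receives the size that was originally requested.
template <class Alloc>
class ScopedBlock {
public:
    ScopedBlock(Alloc& alloc, std::ptrdiff_t size)
        : alloc_(&alloc), size_(size), ptr_(alloc.allocate(size)) {}
    ~ScopedBlock() { reset(); }

    ScopedBlock(ScopedBlock&& other) noexcept
        : alloc_(other.alloc_), size_(other.size_), ptr_(std::exchange(other.ptr_, nullptr)) {}
    ScopedBlock(const ScopedBlock&) = delete;
    ScopedBlock& operator=(const ScopedBlock&) = delete;
    ScopedBlock& operator=(ScopedBlock&&) = delete;

    void*          get() const noexcept { return ptr_; }
    std::ptrdiff_t size() const noexcept { return size_; }

    // Release the block early; safe to call more than once.
    void reset() {
        if (ptr_) {
            alloc_->deallocate(ptr_, size_);
            ptr_ = nullptr;
        }
    }

private:
    Alloc*         alloc_;
    std::ptrdiff_t size_;
    void*          ptr_;
};

}  // namespace sketch
```

The allocator passed to `ScopedBlock` must outlive the block, because the destructor calls back into it.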
