Implementation:InternLM Lmdeploy Allocator
| Knowledge Sources | |
|---|---|
| Domains | Memory_Management, GPU_Computing |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Provides a polymorphic memory allocation framework supporting CPU, pinned CPU, synchronous CUDA, and CUDA memory pool allocators, plus a stack-based arena allocator for reducing allocation overhead.
Description
The AllocatorImpl abstract base class defines the allocation interface with allocate(ssize_t size), deallocate(void*, ssize_t), device(), stream(), and trim() methods. The Allocator wrapper holds a shared_ptr<AllocatorImpl> providing smart-pointer semantics via operator->. Concrete implementations include:
- HostAllocator -- standard CPU allocation via ::operator new
- CudaHostAllocator -- pinned memory via cudaHostAlloc
- CudaAllocator -- synchronous GPU memory via cudaMalloc/cudaFree
- CudaMemPoolAllocator -- asynchronous GPU memory pool via cudaMallocFromPoolAsync/cudaFreeAsync, optionally using the default device pool or creating a custom pool
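The interface-plus-handle pattern described above can be illustrated with a minimal, host-only sketch. It is not the TurboMind code: the names ToyAllocatorImpl, ToyHostAllocator, ToyAllocator, and live_bytes() are hypothetical, std::ptrdiff_t stands in for ssize_t, and the device(), stream(), and trim() members are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for the abstract interface: allocate/deallocate only.
class ToyAllocatorImpl {
public:
    virtual ~ToyAllocatorImpl() = default;
    virtual void* allocate(std::ptrdiff_t size) = 0;
    virtual void  deallocate(void* p, std::ptrdiff_t size) = 0;
};

// Host implementation backed by ::operator new / ::operator delete.
// The byte counter exists only so the sketch can be checked.
class ToyHostAllocator : public ToyAllocatorImpl {
public:
    void* allocate(std::ptrdiff_t size) override
    {
        assert(size >= 0);
        void* p = ::operator new(static_cast<std::size_t>(size));
        live_bytes_ += size;
        return p;
    }
    void deallocate(void* p, std::ptrdiff_t size) override
    {
        ::operator delete(p);
        live_bytes_ -= size;
    }
    std::ptrdiff_t live_bytes() const noexcept { return live_bytes_; }

private:
    std::ptrdiff_t live_bytes_ = 0;
};

// Value-type handle: copies share one implementation through shared_ptr,
// and operator-> forwards calls to it.
class ToyAllocator {
public:
    ToyAllocator() = default;
    explicit ToyAllocator(std::shared_ptr<ToyAllocatorImpl> impl) : impl_(std::move(impl)) {}
    ToyAllocatorImpl* operator->() const { return impl_.get(); }
    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

private:
    std::shared_ptr<ToyAllocatorImpl> impl_;
};
```

Because the handle is cheap to copy and keeps the implementation alive, it can be stored inside buffers without separate lifetime tracking.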
The header also defines StackAllocatorImpl, a bump/stack allocator that delegates to an underlying allocator for large allocations and maintains a cached region for fast sequential allocate/deallocate patterns with 256-byte alignment. SimpleAllocator wraps arbitrary alloc/dealloc function objects for custom allocator integration. The Device struct pairs a DeviceType (kCPU, kCPUpinned, kDEVICE) with a device ID.
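As a rough illustration of the bump/stack idea, the following host-only C++17 sketch rounds every request up to 256 bytes, serves it from a pre-reserved region when it fits, and falls back to the general-purpose allocator otherwise. The class ToyStackAllocator, its fixed capacity, the strict LIFO check, and the choice of ::operator new as the fallback are assumptions made for the example; the real StackAllocatorImpl may size, grow, and release its cached region differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

constexpr std::size_t kAlignment = 256;

// Round n up to the next multiple of a.
constexpr std::size_t round_up(std::size_t n, std::size_t a)
{
    return (n + a - 1) / a * a;
}

// Hypothetical bump allocator over one fixed, 256-byte-aligned region.
class ToyStackAllocator {
public:
    explicit ToyStackAllocator(std::size_t capacity)
        : capacity_(round_up(capacity, kAlignment)),
          base_(static_cast<char*>(::operator new(capacity_, std::align_val_t{kAlignment})))
    {
    }
    ~ToyStackAllocator() { ::operator delete(base_, std::align_val_t{kAlignment}); }
    ToyStackAllocator(const ToyStackAllocator&) = delete;
    ToyStackAllocator& operator=(const ToyStackAllocator&) = delete;

    void* allocate(std::size_t size)
    {
        const std::size_t n = padded(size);
        if (n > capacity_ - offset_) {
            // Does not fit in the region: delegate to the fallback allocator.
            return ::operator new(n, std::align_val_t{kAlignment});
        }
        void* p = base_ + offset_;
        offset_ += n;  // bump the top of the stack
        return p;
    }

    void deallocate(void* p, std::size_t size)
    {
        if (!owns(p)) {
            ::operator delete(p, std::align_val_t{kAlignment});
            return;
        }
        const std::size_t n = padded(size);
        // Region blocks must be released in reverse order of allocation.
        assert(static_cast<char*>(p) + n == base_ + offset_ && "non-LIFO deallocate");
        offset_ -= n;
    }

    bool owns(const void* p) const noexcept
    {
        const auto addr  = reinterpret_cast<std::uintptr_t>(p);
        const auto begin = reinterpret_cast<std::uintptr_t>(base_);
        return addr >= begin && addr < begin + capacity_;
    }
    std::size_t used() const noexcept { return offset_; }

private:
    // Zero-byte requests still consume one aligned slot so every block is distinct.
    static std::size_t padded(std::size_t size) { return round_up(size == 0 ? 1 : size, kAlignment); }

    std::size_t capacity_;
    char*       base_;
    std::size_t offset_ = 0;
};
```

In this sketch both allocate and deallocate on the fast path reduce to an addition or subtraction on one offset, which is where the saving over a general-purpose allocator comes from.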
Usage
TurboMind uses these allocators for all memory allocation. Buffers and tensors accept an Allocator& parameter. The Context system manages allocator stacks so that code can implicitly use the current allocator for each device type.
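One way to picture the allocator-stack idea is a per-device-type stack with an RAII guard that pushes on construction and pops on destruction. The sketch below is an assumption-laden illustration, not the actual Context API: ToyContext, ToyContextGuard, and NamedAllocator are hypothetical names, and making the stacks thread_local is a design choice made for this example only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

// Hypothetical stand-in for an allocator handle; it only carries a label.
struct NamedAllocator {
    std::string name;
};

// Hypothetical context: one stack of allocators per device type.
class ToyContext {
public:
    static void push(DeviceType t, NamedAllocator a) { stack(t).push_back(std::move(a)); }
    static void pop(DeviceType t)
    {
        assert(!stack(t).empty());
        stack(t).pop_back();
    }
    // The "current" allocator is whatever sits on top of the stack.
    static const NamedAllocator& current(DeviceType t)
    {
        assert(!stack(t).empty());
        return stack(t).back();
    }

private:
    static std::vector<NamedAllocator>& stack(DeviceType t)
    {
        // Assumption for this sketch: each thread has its own stacks.
        thread_local std::array<std::vector<NamedAllocator>, 3> stacks;
        return stacks[static_cast<std::size_t>(t)];
    }
};

// RAII guard: installs an allocator for the enclosing scope.
class ToyContextGuard {
public:
    ToyContextGuard(DeviceType t, NamedAllocator a) : type_(t) { ToyContext::push(t, std::move(a)); }
    ~ToyContextGuard() { ToyContext::pop(type_); }
    ToyContextGuard(const ToyContextGuard&) = delete;
    ToyContextGuard& operator=(const ToyContextGuard&) = delete;

private:
    DeviceType type_;
};
```

With this shape, a function that needs scratch memory can ask for the current allocator of a device type instead of taking one as a parameter, and a caller can temporarily substitute an arena by opening a guarded scope.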
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File (header): src/turbomind/core/allocator.h
- File (impl): src/turbomind/core/allocator.cc
- Lines: allocator.h 1-247, allocator.cc 1-159
Signature
namespace turbomind::core {

enum class DeviceType : int { kCPU, kCPUpinned, kDEVICE };

struct Device {
    DeviceType type;
    int id;
    Device();
    Device(DeviceType type_);
    Device(DeviceType type_, int device_);
};

class AllocatorImpl {
public:
    virtual ~AllocatorImpl();
    virtual void* allocate(ssize_t size) = 0;
    virtual void deallocate(void* p, ssize_t size) = 0;
    virtual Stream stream() const noexcept;
    virtual Device device() const noexcept = 0;
    virtual void trim(size_t bytes_to_keep);
};

class Allocator {
public:
    Allocator() = default;
    explicit Allocator(DeviceType type);
    Allocator(Stream stream, bool use_default_pool);
    Allocator(shared_ptr<AllocatorImpl> impl);
    AllocatorImpl* operator->() const;
    explicit operator bool() const noexcept;
};

class StackAllocatorImpl : public AllocatorImpl { /* ... */ };
class SimpleAllocator : public AllocatorImpl { /* ... */ };

} // namespace turbomind::core
Import
#include "src/turbomind/core/allocator.h"
I/O Contract
Inputs
| Name | Type | Used by | Description |
|---|---|---|---|
| type | DeviceType | Allocator(DeviceType) | Selects CPU, pinned, or device allocator |
| stream | Stream | Allocator(Stream, bool) | CUDA stream for async pool allocator |
| use_default_pool | bool | Allocator(Stream, bool) | Whether to use the default CUDA memory pool |
| size | ssize_t | allocate() | Number of bytes to allocate |
Outputs
| Name | Type | Description |
|---|---|---|
| ptr | void* | Pointer to allocated memory |
| device() | Device | The device type and ID associated with this allocator |
| stream() | Stream | The CUDA stream associated with pool allocators (invalid for sync allocators) |
Usage Examples
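#include <memory>  // std::make_shared, used by the stack allocator example

#include "src/turbomind/core/allocator.h"
using namespace turbomind::core;
// Create a CPU allocator (DeviceType is a scoped enum, so the enumerator is qualified)
Allocator cpu_alloc(DeviceType::kCPU);
void* ptr = cpu_alloc->allocate(1024);
cpu_alloc->deallocate(ptr, 1024);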
// Create a CUDA memory pool allocator
Stream stream = Stream::create();
Allocator pool_alloc(stream, /*use_default_pool=*/true);
void* gpu_ptr = pool_alloc->allocate(4096);
pool_alloc->deallocate(gpu_ptr, 4096);
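// Stack allocator for arena-style allocation.
// Assumption: StackAllocatorImpl is constructed from the underlying allocator
// (pool_alloc here); check allocator.h for the exact constructor signature.
auto stack = std::make_shared<StackAllocatorImpl>(pool_alloc);
void* fast_ptr = stack->allocate(256);
stack->deallocate(fast_ptr, 256);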