Implementation:InternLM Lmdeploy Context
| Knowledge Sources | |
|---|---|
| Domains | Resource_Management, Core_Infrastructure |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Implements a thread-local context stack that provides implicit access to the current CUDA stream and memory allocators (host, device, pinned) for the calling thread.
Description
The Context class provides static methods to access the current thread's CUDA stream and allocators without explicitly passing them through call chains. Internally, a thread_local ContextStorage instance maintains separate stacks for streams, host allocators, device allocators, and pinned allocators, along with a mask stack tracking which resources were pushed at each level. The push() and pop() operations are private and accessed through the RAII ContextGuard class, which accepts a variadic list of streams and/or allocators, pushes each one, and pops them all on destruction. Static accessors Context::stream(), Context::host_alloc(), Context::device_alloc(), Context::pinned_alloc(), and Context::alloc(Device) return references to the top of each respective stack. A default host allocator (kCPU) is pushed during ContextStorage construction.
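The stack-plus-mask mechanism described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the TurboMind implementation: `MiniStream`, `MiniAllocator`, `Kind`, `MaskBit`, `Storage`, `storage()`, and `Guard` are hypothetical stand-ins, and the choice to record one mask entry per pushed resource is one plausible reading of the description rather than a confirmed detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-ins for the real Stream and Allocator handles.
struct MiniStream { int id; };
enum class Kind { kHost, kDevice, kPinned };
struct MiniAllocator { Kind kind; std::string name; };

// Bits stored on the mask stack: which resource stack a push touched.
enum MaskBit : unsigned { kStreamBit = 1u, kHostBit = 2u, kDeviceBit = 4u, kPinnedBit = 8u };

struct Storage {
    std::vector<MiniStream>    streams;
    std::vector<MiniAllocator> host, device, pinned;
    std::vector<unsigned>      masks;

    // Mirrors the documented behavior: a default host allocator is
    // available as soon as the storage exists.
    Storage() { push(MiniAllocator{Kind::kHost, "default-host"}); }

    void push(const MiniStream& s) {
        streams.push_back(s);
        masks.push_back(kStreamBit);
    }

    void push(const MiniAllocator& a) {
        switch (a.kind) {
            case Kind::kHost:   host.push_back(a);   masks.push_back(kHostBit);   break;
            case Kind::kDevice: device.push_back(a); masks.push_back(kDeviceBit); break;
            case Kind::kPinned: pinned.push_back(a); masks.push_back(kPinnedBit); break;
        }
    }

    // Undo the most recent push by consulting the mask recorded for it.
    void pop() {
        assert(!masks.empty());
        const unsigned m = masks.back();
        masks.pop_back();
        if (m & kStreamBit) streams.pop_back();
        if (m & kHostBit)   host.pop_back();
        if (m & kDeviceBit) device.pop_back();
        if (m & kPinnedBit) pinned.pop_back();
    }
};

// One Storage per thread, created lazily on first use.
inline Storage& storage() {
    thread_local Storage s;
    return s;
}

// RAII guard: pushes every argument, then pops the same number on destruction.
class Guard {
public:
    template<class... Args>
    explicit Guard(Args&&... args): n_(sizeof...(Args)) {
        (storage().push(std::forward<Args>(args)), ...);  // C++17 fold expression
    }
    ~Guard() {
        for (std::size_t i = 0; i < n_; ++i) storage().pop();
    }
    Guard(const Guard&)            = delete;
    Guard& operator=(const Guard&) = delete;

private:
    std::size_t n_;
};

}  // namespace sketch
```

Because every push is balanced by exactly one pop in the guard's destructor, nested guards restore the enclosing scope's stream and allocators automatically, and the default host allocator pushed at construction is never removed.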
Usage
Used to establish the execution environment for TurboMind operations. Layers and kernels call Context::stream() and Context::alloc(device) to obtain the active stream and allocator. The ContextGuard is typically created at the beginning of an inference iteration.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File (header): src/turbomind/core/context.h
- File (impl): src/turbomind/core/context.cc
- Lines: context.h 1-43, context.cc 1-144
Signature
namespace turbomind::core {
class Context {
public:
static Stream& stream();
static Allocator& host_alloc();
static Allocator& device_alloc();
static Allocator& pinned_alloc();
static Allocator& alloc(Device device);
private:
friend class ContextGuard;
static void push(const Stream& stream);
static void push(const Allocator& alloc);
static void pop();
};
class ContextGuard {
public:
template<class... Args>
explicit ContextGuard(Args&&... args);
~ContextGuard();
};
} // namespace turbomind::core
Import
#include "src/turbomind/core/context.h"
I/O Contract
Inputs
| Name | Type | Used By | Description |
|---|---|---|---|
| args | Stream and/or Allocator | ContextGuard ctor | Resources to push onto the context stacks |
| device | Device | alloc(Device) | Device type to look up the appropriate allocator |
Outputs
| Name | Type | Description |
|---|---|---|
| stream() | Stream& | Reference to the current thread's active CUDA stream |
| host_alloc() | Allocator& | Reference to the current host memory allocator |
| device_alloc() | Allocator& | Reference to the current CUDA device allocator |
| pinned_alloc() | Allocator& | Reference to the current pinned memory allocator |
| alloc(device) | Allocator& | Reference to the current allocator for the given device type |
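How `Context::alloc(Device)` selects among the three allocator stacks can be pictured as a switch on the device type. The sketch below is illustrative only: `DeviceType`, `Device`, `Alloc`, `Stacks`, `top`, and `alloc_for` are hypothetical names, and throwing when a stack is empty is an assumption, since the source does not state how the real accessor behaves in that case.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

namespace dispatch_sketch {

// Hypothetical device tags; the real enumerators may be named differently.
enum class DeviceType { kHost, kPinned, kDevice };
struct Device { DeviceType type; };

// Hypothetical allocator handle, identified only by a label.
struct Alloc { std::string label; };

struct Stacks {
    std::vector<Alloc> host{Alloc{"host"}};  // default host allocator is always present
    std::vector<Alloc> device;
    std::vector<Alloc> pinned;
};

// Return the top of a stack, or fail loudly if nothing was pushed (assumed behavior).
inline Alloc& top(std::vector<Alloc>& stack, const std::string& what) {
    if (stack.empty()) {
        throw std::runtime_error("no " + what + " allocator in the current context");
    }
    return stack.back();
}

// Map a device type to the matching allocator stack.
inline Alloc& alloc_for(Stacks& s, Device d) {
    switch (d.type) {
        case DeviceType::kHost:   return top(s.host, "host");
        case DeviceType::kPinned: return top(s.pinned, "pinned");
        case DeviceType::kDevice: return top(s.device, "device");
    }
    throw std::logic_error("unknown device type");
}

}  // namespace dispatch_sketch
```

A caller that only knows the target device at run time can use this single entry point instead of choosing between the three named accessors.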
Usage Examples
#include "src/turbomind/core/context.h"
using namespace turbomind::core;
// Set up context with a stream and device allocator
Stream stream = Stream::create();
Allocator device_alloc(stream, /*use_default_pool=*/true);
{
ContextGuard guard(stream, device_alloc);
// Within this scope, Context::stream() returns `stream`
// and Context::device_alloc() returns `device_alloc`
Buffer buf(1024, kFloat32, kDEVICE); // uses Context::device_alloc()
}
// Resources popped when guard goes out of scope