Implementation:InternLM Lmdeploy Context

Knowledge Sources	InternLM_Lmdeploy
Domains	Resource_Management, Core_Infrastructure
Last Updated	2026-02-07 15:00 GMT

Overview

Implements a thread-local context stack that provides implicit access to the current CUDA stream and memory allocators (host, device, pinned) for the calling thread.

Description

The Context class provides static methods to access the current thread's CUDA stream and allocators without explicitly passing them through call chains. Internally, a thread_local ContextStorage instance maintains separate stacks for streams, host allocators, device allocators, and pinned allocators, along with a mask stack tracking which resources were pushed at each level. The push() and pop() operations are private and accessed through the RAII ContextGuard class, which accepts a variadic list of streams and/or allocators, pushes each one, and pops them all on destruction. Static accessors Context::stream(), Context::host_alloc(), Context::device_alloc(), Context::pinned_alloc(), and Context::alloc(Device) return references to the top of each respective stack. A default host allocator (kCPU) is pushed during ContextStorage construction.
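
The mechanism described above can be illustrated with a minimal, self-contained sketch. It is not the TurboMind source: Stream and Allocator are hypothetical stand-in structs, only the stream and device-allocator stacks are modeled, and the mask bits, the depth() helper, and run_demo() are names invented for this illustration. It assumes one mask entry is recorded per pushed resource and that the guard pops once per constructor argument.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <string>

namespace sketch {

// Hypothetical stand-ins for turbomind::core::Stream and Allocator.
struct Stream {
    std::string name;
};
struct Allocator {
    std::string name;
};

class ContextGuard;

class Context {
public:
    static const Stream&    stream() { return storage().streams.top(); }
    static const Allocator& device_alloc() { return storage().device_allocs.top(); }

    // Hypothetical helper for the demo: number of pushes still outstanding.
    static std::size_t depth() { return storage().masks.size(); }

private:
    friend class ContextGuard;

    enum Mask { kStreamBit = 1, kDeviceAllocBit = 2 };  // assumed bit layout

    struct Storage {
        std::stack<Stream>    streams;
        std::stack<Allocator> device_allocs;
        std::stack<int>       masks;  // one entry per push, recording what was pushed
    };

    static Storage& storage()
    {
        thread_local Storage s;  // each thread gets its own independent stacks
        return s;
    }

    static void push(const Stream& s)
    {
        storage().streams.push(s);
        storage().masks.push(kStreamBit);
    }

    static void push(const Allocator& a)
    {
        storage().device_allocs.push(a);
        storage().masks.push(kDeviceAllocBit);
    }

    static void pop()
    {
        Storage&  s    = storage();
        const int mask = s.masks.top();
        s.masks.pop();
        // The mask tells pop() which resource stack this level touched.
        if (mask & kStreamBit) s.streams.pop();
        if (mask & kDeviceAllocBit) s.device_allocs.pop();
    }
};

class ContextGuard {
public:
    template<class... Args>
    explicit ContextGuard(Args&&... args): n_(sizeof...(Args))
    {
        // Push every argument in order; overload resolution picks the stack.
        int expand[] = {0, (Context::push(args), 0)...};
        (void)expand;
    }

    ~ContextGuard()
    {
        for (std::size_t i = 0; i < n_; ++i) Context::pop();
    }

    ContextGuard(const ContextGuard&)            = delete;
    ContextGuard& operator=(const ContextGuard&) = delete;

private:
    std::size_t n_;  // number of pushes to undo
};

// Records what the accessors return as guards are nested and released.
inline std::string run_demo()
{
    std::string trace;
    Stream      outer{"outer"};
    Allocator   pool{"pool"};
    {
        ContextGuard g(outer, pool);
        trace += Context::stream().name;
        {
            Stream       inner{"inner"};
            ContextGuard g2(inner);  // overrides only the stream
            trace += "|" + Context::stream().name + "+" + Context::device_alloc().name;
        }
        trace += "|" + Context::stream().name;  // outer stream is visible again
    }
    return trace;
}

}  // namespace sketch
```

In this sketch, a nested guard that pushes only a stream leaves the allocator stack untouched, and destroying the guard restores the previous stream. Whether the real ContextStorage records one mask per resource or one combined mask per guard is not stated here, so treat that detail as an assumption.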

Usage

Context establishes the execution environment for TurboMind operations. Layers and kernels call Context::stream() and Context::alloc(device) to obtain the active stream and allocator instead of receiving them as parameters. A ContextGuard is typically created at the beginning of an inference iteration.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File (header): src/turbomind/core/context.h
File (impl): src/turbomind/core/context.cc
Lines: context.h 1-43, context.cc 1-144

Signature

namespace turbomind::core {

class Context {
public:
    static Stream&    stream();
    static Allocator& host_alloc();
    static Allocator& device_alloc();
    static Allocator& pinned_alloc();
    static Allocator& alloc(Device device);

private:
    friend class ContextGuard;
    static void push(const Stream& stream);
    static void push(const Allocator& alloc);
    static void pop();
};

class ContextGuard {
public:
    template<class... Args>
    explicit ContextGuard(Args&&... args);
    ~ContextGuard();
};

}  // namespace turbomind::core

Import

#include "src/turbomind/core/context.h"

I/O Contract

Inputs

Name	Type	Used By	Description
args	Stream and/or Allocator	ContextGuard ctor	Resources to push onto the context stacks
device	Device	alloc(Device)	Device type to look up the appropriate allocator
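
One plausible way for alloc(Device) to select an allocator is a switch over the device type. The sketch below is an assumption about the dispatch, not the actual implementation: the Device and Allocator structs, the Allocators bundle, and the kCPUpinned enumerator are hypothetical stand-ins (only kCPU and kDEVICE appear on this page), and the free function takes the allocators as a parameter so the example stays self-contained.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace dispatch_sketch {

// Hypothetical device kinds; kCPUpinned is an assumed name for pinned host memory.
enum DeviceType { kCPU, kCPUpinned, kDEVICE };

struct Device {
    DeviceType type;
};

struct Allocator {
    std::string name;
};

// Hypothetical bundle standing in for the tops of the three allocator stacks.
struct Allocators {
    Allocator host;
    Allocator pinned;
    Allocator device;
};

// Return the allocator that matches the requested device type.
inline Allocator& alloc(Allocators& ctx, Device device)
{
    switch (device.type) {
        case kCPU:
            return ctx.host;
        case kCPUpinned:
            return ctx.pinned;
        case kDEVICE:
            return ctx.device;
    }
    throw std::invalid_argument("unknown device type");
}

}  // namespace dispatch_sketch
```

With this shape, callers that only know the target device of a buffer do not need to choose among host_alloc(), device_alloc(), and pinned_alloc() themselves.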

Outputs

Name	Type	Description
stream()	Stream&	Reference to the current thread's active CUDA stream
host_alloc()	Allocator&	Reference to the current host memory allocator
device_alloc()	Allocator&	Reference to the current CUDA device allocator
pinned_alloc()	Allocator&	Reference to the current pinned memory allocator
alloc(device)	Allocator&	Reference to the current allocator for the given device type

Usage Examples

#include "src/turbomind/core/context.h"

using namespace turbomind::core;

// Set up context with a stream and device allocator
Stream stream = Stream::create();
Allocator device_alloc(stream, /*use_default_pool=*/true);

{
    ContextGuard guard(stream, device_alloc);

    // Within this scope, Context::stream() returns `stream`
    // and Context::device_alloc() returns `device_alloc`
    Buffer buf(1024, kFloat32, kDEVICE);  // uses Context::device_alloc()
}
// Resources popped when guard goes out of scope

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
