Implementation:InternLM Lmdeploy Context

From Leeroopedia


Knowledge Sources
Domains Resource_Management, Core_Infrastructure
Last Updated 2026-02-07 15:00 GMT

Overview

Implements a thread-local context stack that provides implicit access to the current CUDA stream and memory allocators (host, device, pinned) for the calling thread.

Description

The Context class provides static methods to access the current thread's CUDA stream and allocators without explicitly passing them through call chains. Internally, a thread_local ContextStorage instance maintains separate stacks for streams, host allocators, device allocators, and pinned allocators, along with a mask stack tracking which resources were pushed at each level. The push() and pop() operations are private and accessed through the RAII ContextGuard class, which accepts a variadic list of streams and/or allocators, pushes each one, and pops them all on destruction. Static accessors Context::stream(), Context::host_alloc(), Context::device_alloc(), Context::pinned_alloc(), and Context::alloc(Device) return references to the top of each respective stack. A default host allocator (kCPU) is pushed during ContextStorage construction.
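The mechanism described above can be approximated with a small, CUDA-free sketch. Everything in it is an assumption made for illustration: FakeStream, FakeAllocator, SketchContext, SketchGuard, and the mask bit values are hypothetical stand-ins, not the TurboMind types, and only a stream stack and a device allocator stack are modelled. Here each push records its own mask entry; the real ContextStorage may group pushes per level differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real Stream and Allocator types.
struct FakeStream    { int id; };
struct FakeAllocator { std::string name; };

class SketchContext {
public:
    // Accessors return the top of the calling thread's stacks.
    static FakeStream&    stream()       { return storage().streams.back(); }
    static FakeAllocator& device_alloc() { return storage().device_allocs.back(); }
    static std::size_t    depth()        { return storage().masks.size(); }

private:
    friend class SketchGuard;

    enum Bit : unsigned { kStreamBit = 1u, kDeviceAllocBit = 2u };

    struct Storage {
        std::vector<FakeStream>    streams;
        std::vector<FakeAllocator> device_allocs;
        std::vector<unsigned>      masks;  // which stack each level pushed to
    };

    static Storage& storage() {
        thread_local Storage s;  // one independent instance per thread
        return s;
    }

    static void push(const FakeStream& s) {
        storage().streams.push_back(s);
        storage().masks.push_back(kStreamBit);
    }
    static void push(const FakeAllocator& a) {
        storage().device_allocs.push_back(a);
        storage().masks.push_back(kDeviceAllocBit);
    }
    static void pop() {
        Storage& st = storage();
        const unsigned mask = st.masks.back();
        st.masks.pop_back();
        // The mask says which stack this level touched, so only that one shrinks.
        if (mask & kStreamBit)      st.streams.pop_back();
        if (mask & kDeviceAllocBit) st.device_allocs.pop_back();
    }
};

class SketchGuard {
public:
    template<class... Args>
    explicit SketchGuard(Args&&... args): count_(sizeof...(Args)) {
        // Push every argument in order (pack expansion that also works in C++11).
        int expand[] = {0, (SketchContext::push(std::forward<Args>(args)), 0)...};
        (void)expand;
    }
    ~SketchGuard() {
        for (std::size_t i = 0; i < count_; ++i) SketchContext::pop();
    }
    SketchGuard(const SketchGuard&)            = delete;
    SketchGuard& operator=(const SketchGuard&) = delete;

private:
    std::size_t count_;  // number of levels this guard must pop
};

// Records the stream id visible before, inside, and after a nested guard.
inline std::vector<int> nested_guard_demo() {
    std::vector<int> seen;
    SketchGuard outer(FakeStream{1}, FakeAllocator{"pool-a"});
    seen.push_back(SketchContext::stream().id);
    {
        SketchGuard inner(FakeStream{2});  // overrides only the stream
        seen.push_back(SketchContext::stream().id);
    }
    seen.push_back(SketchContext::stream().id);
    return seen;
}
```

Because the inner guard pushes only a stream, the device allocator seen inside the inner scope is still the one installed by the outer guard; the mask stack is what lets pop() leave the allocator stack untouched.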

Usage

Used to establish the execution environment for TurboMind operations. Layers and kernels call Context::stream() and Context::alloc(device) to obtain the active stream and allocator. The ContextGuard is typically created at the beginning of an inference iteration.
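The calling pattern in this section, one guard per iteration and layers that read the ambient stream, can be illustrated without CUDA. The sketch below is hypothetical throughout: an int stands in for a stream handle, and stream_stack, StreamGuard, layer_forward, run_iteration, and run_on_threads are invented names. It aims to show that each thread sees only the stream installed by its own guard.

```cpp
#include <cassert>
#include <thread>
#include <vector>

namespace sketch {

// Per-thread stack of stream ids (hypothetical stand-in for Stream objects).
inline std::vector<int>& stream_stack() {
    thread_local std::vector<int> stack;
    return stack;
}

// RAII guard: installs a stream for the current scope on the current thread.
struct StreamGuard {
    explicit StreamGuard(int s) { stream_stack().push_back(s); }
    ~StreamGuard() { stream_stack().pop_back(); }
    StreamGuard(const StreamGuard&)            = delete;
    StreamGuard& operator=(const StreamGuard&) = delete;
};

// A "layer" that takes no stream parameter and reads the ambient one instead.
inline int layer_forward() { return stream_stack().back(); }

// One iteration: install the guard once at the top, then call into layers.
inline int run_iteration(int stream_id) {
    StreamGuard guard(stream_id);
    return layer_forward();
}

// Two worker threads run iterations with different streams concurrently.
inline std::vector<int> run_on_threads() {
    std::vector<int> results(2, -1);
    std::thread t0([&results] { results[0] = run_iteration(10); });
    std::thread t1([&results] { results[1] = run_iteration(20); });
    t0.join();
    t1.join();
    return results;
}

}  // namespace sketch
```

Each worker writes to its own element of results, so the threads share no mutable state; the thread that calls run_on_threads() never has a stream installed by either worker.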

Code Reference

Source Location

Signature

namespace turbomind::core {

class Context {
public:
    static Stream&    stream();
    static Allocator& host_alloc();
    static Allocator& device_alloc();
    static Allocator& pinned_alloc();
    static Allocator& alloc(Device device);

private:
    friend class ContextGuard;
    static void push(const Stream& stream);
    static void push(const Allocator& alloc);
    static void pop();
};

class ContextGuard {
public:
    template<class... Args>
    explicit ContextGuard(Args&&... args);
    ~ContextGuard();
};

}  // namespace turbomind::core

Import

#include "src/turbomind/core/context.h"

I/O Contract

Inputs

Name | Type | Required by | Description
args | Stream and/or Allocator | ContextGuard ctor | Resources to push onto the context stacks
device | Device | alloc(Device) | Device type used to look up the appropriate allocator

Outputs

Name | Type | Description
stream() | Stream& | Reference to the current thread's active CUDA stream
host_alloc() | Allocator& | Reference to the current host memory allocator
device_alloc() | Allocator& | Reference to the current CUDA device allocator
pinned_alloc() | Allocator& | Reference to the current pinned memory allocator
alloc(Device) | Allocator& | Reference to the current allocator for the given device type
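The alloc(Device) accessor in the signature selects one of the three allocators by device type. A minimal sketch of such a dispatch follows, assuming a three-valued device enum; DeviceKind, Alloc, Tops, and select_alloc are hypothetical names, and the real Device type and its lookup logic may differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical device categories mirroring host, pinned, and device memory.
enum class DeviceKind { kCpu, kCpuPinned, kDevice };

// Hypothetical stand-in for an allocator.
struct Alloc { std::string label; };

// The allocators currently on top of each per-thread stack.
struct Tops {
    Alloc host;
    Alloc pinned;
    Alloc device;
};

// Returns a reference to the allocator that matches the requested device kind.
inline Alloc& select_alloc(Tops& tops, DeviceKind kind) {
    switch (kind) {
        case DeviceKind::kCpu:       return tops.host;
        case DeviceKind::kCpuPinned: return tops.pinned;
        case DeviceKind::kDevice:    return tops.device;
    }
    throw std::invalid_argument("unknown device kind");
}
```

Returning a reference rather than a copy means a caller always works with the allocator object that is on top of the stack at the time of the call.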

Usage Examples

#include "src/turbomind/core/context.h"

using namespace turbomind::core;

// Set up context with a stream and device allocator
Stream stream = Stream::create();
Allocator device_alloc(stream, /*use_default_pool=*/true);

{
    ContextGuard guard(stream, device_alloc);

    // Within this scope, Context::stream() returns `stream`
    // and Context::device_alloc() returns `device_alloc`
    Buffer buf(1024, kFloat32, kDEVICE);  // uses Context::device_alloc()
}
// Resources popped when guard goes out of scope

Related Pages
