

Implementation:InternLM Lmdeploy Buffer

From Leeroopedia
Revision as of 15:13, 16 February 2026 by Admin (Auto-imported from implementations/InternLM_Lmdeploy_Buffer.md)


Knowledge Sources
Domains: Memory_Management, Core_Infrastructure
Last Updated: 2026-02-07 15:00 GMT

Overview

Provides a reference-counted, type-aware flat memory buffer abstraction with support for shared ownership, slicing, type-view conversion, and CUDA memory copy/clear operations.

Description

The Buffer class is a 1D memory container that pairs a shared_ptr<void> data pointer with metadata: element count (size_), base offset (base_), data type (dtype_), and device location (device_). It supports multiple construction modes: empty, typed-reference (non-owning), shared-ownership, and allocator-backed. The view(dtype) method reinterprets the buffer as a different data type, adjusting element count and base offset accordingly. The slice(base, size) method creates a sub-buffer sharing the same underlying memory. borrow() creates a non-owning reference. The typed subclass Buffer_<T> provides compile-time type safety with begin()/end() iterators, operator[], and bounds-checked at(). Free functions Copy() and Clear() perform cudaMemcpyAsync and cudaMemsetAsync operations. Serialization support is provided through save/load template functions.
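
The offset and element-count bookkeeping described above can be illustrated with a small host-only sketch. Everything in it (MiniBuffer, DType, elem_bytes, make_buffer) is a hypothetical stand-in written for this page, not TurboMind code, and the assumption that view() rescales both size and base by the ratio of element widths is inferred from the description rather than taken from the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-in for the real DataType enum.
enum class DType { F32, F16, U8 };

inline std::ptrdiff_t elem_bytes(DType t) {
    switch (t) {
        case DType::F32: return 4;
        case DType::F16: return 2;
        case DType::U8:  return 1;
    }
    return 0;
}

// Hypothetical miniature of a flat buffer: shared data pointer plus metadata.
struct MiniBuffer {
    std::shared_ptr<void> data;           // shared ownership of the allocation
    std::ptrdiff_t        base  = 0;      // offset from data, in elements of dtype
    std::ptrdiff_t        size  = 0;      // element count
    DType                 dtype = DType::U8;

    std::ptrdiff_t byte_size() const { return size * elem_bytes(dtype); }

    void* raw_data() const {
        return static_cast<char*>(data.get()) + base * elem_bytes(dtype);
    }

    // Sub-range that shares the same allocation; off is relative to this buffer.
    MiniBuffer slice(std::ptrdiff_t off, std::ptrdiff_t n) const {
        assert(off >= 0 && n >= 0 && off + n <= size);
        MiniBuffer b = *this;             // copies the shared_ptr, not the bytes
        b.base = base + off;
        b.size = n;
        return b;
    }

    // Reinterpret the same bytes as another type, rescaling size and base.
    MiniBuffer view(DType t) const {
        const std::ptrdiff_t bytes      = byte_size();
        const std::ptrdiff_t base_bytes = base * elem_bytes(dtype);
        assert(bytes % elem_bytes(t) == 0 && base_bytes % elem_bytes(t) == 0);
        MiniBuffer b = *this;
        b.dtype = t;
        b.size  = bytes / elem_bytes(t);
        b.base  = base_bytes / elem_bytes(t);
        return b;
    }
};

// Allocate n elements of type t on the host with shared ownership.
inline MiniBuffer make_buffer(std::ptrdiff_t n, DType t) {
    MiniBuffer b;
    const std::size_t bytes = static_cast<std::size_t>(n * elem_bytes(t));
    b.data  = std::shared_ptr<void>(::operator new(bytes),
                                    [](void* p) { ::operator delete(p); });
    b.size  = n;
    b.dtype = t;
    return b;
}
```

In this sketch, slicing elements 256 to 511 of a 1024-element F32 buffer and then viewing the slice as F16 yields 512 elements at a base of 512, and both the slice and the view resolve to the same starting address, 1024 bytes into the allocation.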

Usage

Used as the primary flat memory abstraction in TurboMind. Buffers are the storage backing for Tensor objects and are used directly for 1D data such as token IDs, attention masks, and intermediate results.

Code Reference

Source Location

Signature

namespace turbomind::core {

class Buffer {
public:
    Buffer();
    explicit Buffer(DataType dtype);
    template<class T> Buffer(T* data, ssize_t size, Device device);
    Buffer(void* data, ssize_t size, DataType dtype, Device device);
    Buffer(shared_ptr<void> data, ssize_t size, DataType dtype, Device device);
    Buffer(ssize_t size, DataType dtype, Allocator& alloc);
    Buffer(ssize_t size, DataType dtype, Device device);

    template<class T> T* data();
    template<class T> const T* data() const;
    void* raw_data(ssize_t offset = 0);
    DataType dtype() const;
    Device device() const;
    ssize_t size() const;
    ssize_t byte_size() const;
    explicit operator bool() const noexcept;

    Buffer view(DataType dtype) const;
    Buffer slice(ssize_t base, ssize_t size) const;
    Buffer borrow() const;
};

template<class T>
struct Buffer_ : public Buffer { /* typed wrapper with iterators */ };

void Copy(const Buffer& a, ssize_t n, Ref<Buffer> b_, const Stream& stream);
void Copy(const Buffer& a, Ref<Buffer> b_);
void Clear(Ref<Buffer> b_, const Stream& stream);
void Clear(Ref<Buffer> b_);

Buffer empty_like(const Buffer& buffer);
Buffer empty_like(const Buffer& buffer, Device device);
Buffer empty_like(const Buffer& buffer, DataType dtype);

}  // namespace turbomind::core
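
The signature above elides the body of Buffer_<T>. As a rough illustration of what a typed wrapper with iterators, operator[], and a bounds-checked at() might look like, here is a host-only sketch. RawBuffer and TypedBuffer are hypothetical names, and throwing std::out_of_range from at() is an assumption: the source does not show how the real class reports an out-of-range index.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical untyped base: a host-only stand-in for Buffer.
class RawBuffer {
public:
    RawBuffer(std::ptrdiff_t size, std::size_t elem_size)
        : data_(::operator new(static_cast<std::size_t>(size) * elem_size),
                [](void* p) { ::operator delete(p); }),
          size_(size) {
        assert(size >= 0);
    }

    std::ptrdiff_t size() const { return size_; }
    void*          raw_data() { return data_.get(); }
    const void*    raw_data() const { return data_.get(); }

private:
    std::shared_ptr<void> data_;
    std::ptrdiff_t        size_;
};

// Hypothetical typed wrapper: fixes the element type at compile time.
template<class T>
class TypedBuffer : public RawBuffer {
public:
    explicit TypedBuffer(std::ptrdiff_t size) : RawBuffer(size, sizeof(T)) {}

    T*       data() { return static_cast<T*>(raw_data()); }
    const T* data() const { return static_cast<const T*>(raw_data()); }

    // Raw pointers serve as iterators over contiguous storage.
    T* begin() { return data(); }
    T* end() { return data() + size(); }

    // Unchecked access, like indexing a plain array.
    T& operator[](std::ptrdiff_t i) { return data()[i]; }

    // Checked access; the exception type is an assumption of this sketch.
    T& at(std::ptrdiff_t i) {
        if (i < 0 || i >= size()) {
            throw std::out_of_range("TypedBuffer::at: index out of range");
        }
        return data()[i];
    }
};
```

Because begin() and end() are provided, a TypedBuffer<int> can be filled with a range-based for loop. Element access of this kind only makes sense for host-accessible memory; a pointer into device memory cannot be dereferenced on the CPU.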

Import

#include "src/turbomind/core/buffer.h"

I/O Contract

Inputs

Name | Type | Required | Description
size | ssize_t | Yes | Number of elements in the buffer
dtype | DataType | Yes | Element data type
device | Device | Conditional | Device location (CPU, pinned, CUDA)
alloc | Allocator& | Conditional | Allocator to use for memory; alternative to device
data | void* or T* | Conditional | Pre-existing data pointer for reference-mode construction
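
The difference between reference-mode (non-owning) and shared-ownership construction can be sketched with std::shared_ptr<void> alone. The helpers below, wrap_reference and allocate_owned, are hypothetical and exist only for illustration; using a no-op deleter for the non-owning case is one common technique and is assumed here, not confirmed by the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Non-owning: wrap a caller-managed pointer with a deleter that does nothing,
// so destroying the last shared_ptr leaves the memory untouched.
inline std::shared_ptr<void> wrap_reference(void* p) {
    return std::shared_ptr<void>(p, [](void*) {});
}

// Owning: the allocation is released when the last holder goes away.
// freed_flag (may be null) lets the caller observe when that happens.
inline std::shared_ptr<void> allocate_owned(std::size_t bytes, bool* freed_flag) {
    void* p = std::malloc(bytes);
    return std::shared_ptr<void>(p, [freed_flag](void* q) {
        std::free(q);
        if (freed_flag) {
            *freed_flag = true;
        }
    });
}
```

With the non-owning wrapper, the caller must keep the underlying memory alive for as long as any buffer refers to it. With the owning variant, copies of the pointer (for example, those held by slices or views) extend the lifetime of the allocation until the last one is destroyed.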

Outputs

Name | Type | Description
data() | T* or void* | Pointer to the buffer's data, optionally typed
size() | ssize_t | Number of elements
byte_size() | ssize_t | Total bytes occupied
view() | Buffer | A reinterpreted-type view of the same memory
slice() | Buffer | A sub-range of the same memory

Usage Examples

#include "src/turbomind/core/buffer.h"

using namespace turbomind::core;

// Allocate a device buffer of 1024 float32 elements
Buffer buf(1024, kFloat32, kDEVICE);

// Access raw data pointer
void* ptr = buf.raw_data();

// Create a typed buffer
Buffer_<float> typed_buf(1024, kDEVICE);
float* fptr = typed_buf.data();

// Slice the first 256 elements
Buffer sub = buf.slice(0, 256);

// Copy between buffers
Buffer dst(1024, kFloat32, kDEVICE);
Copy(buf, dst);

// View the same bytes as half precision: the 1024 float32 elements
// (4096 bytes) are reinterpreted as 2048 float16 elements
Buffer half_view = buf.view(kFloat16);
