Implementation:InternLM Lmdeploy State

From Leeroopedia
Revision as of 15:16, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_State.md)


Knowledge Sources
Domains: Tensor_Operations, KV_Cache
Last Updated: 2026-02-07 15:00 GMT

Overview

Provides a double-buffered tensor state container and permutation-based warp/append functions for efficient KV-cache and sequence state management during inference.

Description

The State struct holds two tensors (data_[2]) that act as a double buffer for ping-pong state management. front() and back() return the two buffers, and Swap() exchanges their roles. An update reads from the front buffer, writes the new state into the back buffer, and then calls Swap() to publish it, so switching to the new state takes constant time.
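The read-front, write-back, swap cycle can be sketched on the host without any TurboMind types. This is a minimal illustration, not the library's implementation: DoubleBuffer and Step are hypothetical names, and std::vector<float> stands in for Tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for State; std::vector<float> replaces Tensor.
struct DoubleBuffer {
    std::vector<float> data_[2];

    explicit DoubleBuffer(std::size_t n)
    {
        data_[0].assign(n, 0.0f);
        data_[1].assign(n, 0.0f);
    }

    std::vector<float>& front() { return data_[0]; }
    std::vector<float>& back() { return data_[1]; }

    // Swapping two vectors exchanges their storage pointers; no element is copied.
    void Swap() { std::swap(data_[0], data_[1]); }
};

// One update step: read from front, write into back, then publish with Swap().
inline void Step(DoubleBuffer& s)
{
    const std::vector<float>& cur  = s.front();
    std::vector<float>&       next = s.back();
    assert(cur.size() == next.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        next[i] = cur[i] + 1.0f;  // placeholder for the real state update
    }
    s.Swap();
}
```

After Step returns, front() refers to the storage that was written as the back buffer, and the previous front is available for the next write.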

The file also provides three Warp function template overloads and an Append function template. Each rearranges data according to a permutation and performs the actual copying through a caller-supplied copy functor:

  • Warp(a0, size0, perm, b1, copy) -- copies rows from source tensor a0 to destination b1 according to permutation perm
  • Warp(a0, b1, size0, perm, c1, copy) -- fills destination c1, choosing between sources a0 and b1 for each index i based on whether perm[i] < size0
  • Warp(src0, offset0, size0, src1, offset1, perm0, dst, offsetd, copy) -- handles variable-length rows, using offset arrays to locate each row
  • Append -- merges existing state with new tokens, handling variable-size rows with stride-based layout
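The two-source selection rule can be illustrated on the host with plain vectors. The sketch below is hypothetical and only mirrors the shape of the second Warp overload: WarpRows, Row, and RowCopy are invented names, and the choice to index the second source at perm[i] - size0 is an assumption that the description does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<int>;

// Copy functor assumed for this sketch: copies one row immediately.
struct RowCopy {
    void operator()(const Row& src, Row& dst) const { dst = src; }
};

// Output row i comes from a0[perm[i]] when perm[i] < size0,
// otherwise from b1[perm[i] - size0] (assumed indexing).
template<class Copy>
void WarpRows(const std::vector<Row>& a0, const std::vector<Row>& b1, int size0,
              const std::vector<int>& perm, std::vector<Row>& c1, Copy& copy)
{
    c1.resize(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        const int j = perm[i];
        assert(j >= 0);
        if (j < size0) {
            assert(static_cast<std::size_t>(j) < a0.size());
            copy(a0[j], c1[i]);  // row carried over from the old state
        }
        else {
            assert(static_cast<std::size_t>(j - size0) < b1.size());
            copy(b1[j - size0], c1[i]);  // row taken from the new source
        }
    }
}
```

With a0 holding three old rows, b1 holding one new row, size0 = 3, and perm = {2, 0, 3}, the destination receives old row 2, old row 0, and then the new row.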

These functions are designed to keep the number of cudaMemcpy calls and kernel launches low and to run on a single stream.
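One way a copy functor can keep the number of launches low is to record copy requests and execute them together. The following host-only sketch shows that idea under stated assumptions: DeferredCopy is a hypothetical class, it uses std::memcpy in place of any device copy, and it is not a description of how BatchCopy is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical batching copy functor: calls only record work, Run() performs it.
class DeferredCopy {
    struct Job {
        const int*  src;
        int*        dst;
        std::size_t count;
    };
    std::vector<Job> jobs_;
    int              runs_ = 0;

public:
    // Record one copy of `count` ints; nothing is copied yet.
    void operator()(const int* src, std::size_t count, int* dst)
    {
        assert(src != nullptr && dst != nullptr);
        jobs_.push_back(Job{src, dst, count});
    }

    // Execute every recorded copy in a single pass and clear the queue.
    void Run()
    {
        for (const Job& j : jobs_) {
            std::memcpy(j.dst, j.src, j.count * sizeof(int));
        }
        jobs_.clear();
        ++runs_;
    }

    std::size_t pending() const { return jobs_.size(); }
    int         runs() const { return runs_; }
};
```

In a device setting, the single pass in Run() would correspond to one batched launch instead of one launch per row, which is the property the sketch is meant to convey.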

Usage

State and the Warp/Append functions are used to manage per-sequence KV-cache state during continuous batching. When sequences are reordered, added, or removed between iterations, Warp and Append rearrange the state tensors according to the new permutation without redundant copies.
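To make the role of the permutation concrete, the sketch below builds one from the sequence IDs of two consecutive batches. It is an illustration only: BuildPerm is a hypothetical helper, and the convention that newly added sequences receive indices starting at size0 is an assumption chosen to match the perm[i] < size0 test described above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: perm[i] is the old slot of next_ids[i] if that sequence
// was already in the batch, otherwise size0 + k for the k-th newly added one.
// Sequences missing from next_ids are dropped simply by never being referenced.
inline std::vector<int> BuildPerm(const std::vector<int>& old_ids,
                                  const std::vector<int>& next_ids)
{
    std::unordered_map<int, int> slot;
    for (std::size_t i = 0; i < old_ids.size(); ++i) {
        slot[old_ids[i]] = static_cast<int>(i);
    }
    assert(slot.size() == old_ids.size());  // sequence IDs must be unique

    const int        size0 = static_cast<int>(old_ids.size());
    int              fresh = 0;
    std::vector<int> perm;
    perm.reserve(next_ids.size());
    for (int id : next_ids) {
        const auto it = slot.find(id);
        perm.push_back(it != slot.end() ? it->second : size0 + fresh++);
    }
    return perm;
}
```

For an old batch with IDs {10, 11, 12} and a next batch {12, 10, 13}, sequence 11 is removed, 12 and 10 are reordered, and 13 is new, so the result refers to old slots 2 and 0 followed by the first new entry at index 3.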

Code Reference

Source Location

Signature

namespace turbomind {

struct State {
    Tensor data_[2];

    State() = default;
    State(const Layout& layout, DataType dtype, const core::Device& device);

    Tensor& front();
    Tensor& back();
    void Swap();
};

template<class Copy>
void Warp(const Tensor& a0, int size0, const Buffer_<int>& perm,
          Tensor b1, Copy& copy);

template<class Copy>
void Warp(const Tensor& a0, const Tensor& b1, int size0,
          const Buffer_<int>& perm, Tensor c1, Copy& copy);

template<class Copy>
void Warp(const Tensor& src0, const Buffer_<int>& offset0, int size0,
          const Tensor& src1, const Buffer_<int>& offset1,
          const Buffer_<int>& perm0, Tensor dst, Buffer_<int> offsetd,
          Copy& copy);

template<class Copy>
void Append(const Tensor& a0, const Buffer_<int>& a0_size,
            const Tensor& b0, const Tensor& c1,
            const Buffer_<int>& c1_offset, const Buffer_<int>& perm,
            int size0, Tensor d1, Buffer_<int> d1_size, Copy& copy);

}  // namespace turbomind
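The Append signature pairs tensors with per-row size and offset buffers. The sketch below shows, on the host, what appending variable-length data into fixed-stride rows can look like. It is not the real Append: AppendRows is a hypothetical function, the flat-vector layout and the prefix-sum offset array are assumptions, and the permutation step is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout for this sketch:
//   rows   : batch * stride ints; row i owns rows[i * stride, (i + 1) * stride)
//   sizes  : sizes[i] leading entries of row i are valid
//   packed : new tokens stored back to back
//   offset : row i's new tokens are packed[offset[i], offset[i + 1])
inline void AppendRows(std::vector<int>& rows, std::size_t stride, std::vector<int>& sizes,
                       const std::vector<int>& packed, const std::vector<int>& offset)
{
    assert(offset.size() == sizes.size() + 1);
    assert(rows.size() == sizes.size() * stride);
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const int count = offset[i + 1] - offset[i];
        assert(count >= 0);
        assert(static_cast<std::size_t>(sizes[i] + count) <= stride);  // row must not overflow
        for (int k = 0; k < count; ++k) {
            rows[i * stride + sizes[i] + k] = packed[offset[i] + k];
        }
        sizes[i] += count;  // the row now holds the old and the new tokens
    }
}
```

With stride 4, rows {1, 2, 0, 0, 5, 0, 0, 0}, sizes {2, 1}, packed {3, 6, 7}, and offset {0, 1, 3}, the first row gains one token and the second gains two, leaving sizes {3, 3}.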

Import

#include "src/turbomind/core/state.h"

I/O Contract

Inputs

Name | Type | Required by | Description
layout | const Layout& | State ctor | Shape descriptor for both buffers
dtype | DataType | State ctor | Element data type
device | const core::Device& | State ctor | Device placement
perm | const Buffer_<int>& | Warp/Append | Permutation indices mapping output positions to input positions
size0 | int | Warp/Append | Size of the "old" source, used to distinguish old vs new data
copy | Copy& | Warp/Append | Copy functor (e.g., BatchCopy)

Outputs

Name | Type | Description
front() | Tensor& | The current front buffer
back() | Tensor& | The current back buffer
(side effect) | Tensor | Destination tensor populated by Warp/Append

Usage Examples

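#include "src/turbomind/core/state.h"

using namespace turbomind;

// Assumed to be provided by the caller (not defined in this snippet):
//   int          old_size;  // number of rows in the current ("old") state
//   Buffer_<int> perm;      // perm[i] = source row for output row i

// Create double-buffered state for 32 sequences, 128 hidden dim
State kv_state(Layout({32, 128}), kFloat16, core::Device(kDEVICE));

// front() is the buffer being read; back() receives the rearranged rows
Tensor& current = kv_state.front();
Tensor& next    = kv_state.back();

// Rearrange state according to permutation, then run the copy functor
BatchCopy copy;
Warp(current, old_size, perm, next, copy);
copy.Run();

// Make the freshly written buffer the new front
kv_state.Swap();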
