Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:InternLM Lmdeploy AttentionIterator

From Leeroopedia


Knowledge Sources
Domains GPU_Kernels, Attention
Last Updated 2026-02-07 15:00 GMT

Overview

Base iterator classes for loading KV cache tiles from global memory to shared memory, including thread-map-based offset computation and a combiner for pairing data and quantization parameter iterators.

Description

BaseGmemIterator establishes the common pattern for global-to-shared-memory KV tile loading. It uses a Map type to compute per-thread (offset_s, offset_c) coordinates and provides SetSmem to bind a shared memory target and ClearSmem to zero-initialize shared memory tiles. BaseSmemIterator wraps a shared memory pointer with a layout accessor. CombinedIterator composes two iterators (typically data + quantization parameters) into a single unit with unified Prefetch, Load, Save, and ClearSmem interfaces.

Usage

BaseGmemIterator is the base for architecture-specific iterators (Sm70GmemIterator, Sm80GmemIterator). CombinedIterator is used when KV quantization is enabled to pair data and parameter loading.

Code Reference

Source Location

Signature

template<class T, class Map, class SmemLayout>
struct BaseGmemIterator {
    using ElementType = T;
    using AccessType  = Array<T, Map::kAccessC>;
    static constexpr int kIterCount = Map::kIterS * Map::kIterC;
    using Fragment = Array<T, Map::kAccessC>[Map::kIterS][Map::kIterC];

    __device__ BaseGmemIterator();
    __device__ void SetSmem(Pointer smem);
    __device__ void ClearSmem(int pipe_iter = 0);
};

template<class T, class Layout>
struct BaseSmemIterator {
    __device__ explicit BaseSmemIterator(T* smem);
};

template<class Iterator0, class Iterator1>
struct CombinedIterator {
    struct Fragment { ... };
    template<typename... Args>
    __device__ void Prefetch(Args... args);
    template<bool is_residue, class CacheIter>
    __device__ void Load(const CacheIter&, Fragment&, int max_s);
    __device__ void Save(const Fragment&);
    __device__ void ClearSmem(int pipe_iter = 0);
    template<class P0, class P1>
    __device__ void SetSmem(P0, P1);
};

Import

#include "src/turbomind/kernels/attention/iterator.h"

I/O Contract

Inputs

Name Type Required Description
T typename Yes Element type for the iterator
Map typename Yes Thread mapping type (e.g., RakedThreadMap)
SmemLayout typename Yes Shared memory layout with swizzle

Outputs

Name Type Description
Fragment Array<T,...>[IterS][IterC] Loaded KV tile data in registers
smem_ Pointer Shared memory region written during Save/ClearSmem

Usage Examples

// Composed into architecture-specific iterators:
using GmemIterK = Sm80GmemIterator<half, ThreadMapKV, SmemLayoutK, 0>;
using GmemIterV = Sm80GmemIterator<half, ThreadMapKV, SmemLayoutV, 1>;
// With quantization:
using CombinedK = CombinedIterator<GmemIterK, Sm80GmemIterator<half, ThreadMapKVp, SmemLayoutKVp, 2>>;

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment