Implementation:InternLM Lmdeploy AttentionIterator
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, Attention |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Base iterator classes for loading KV cache tiles from global memory to shared memory, including thread-map-based offset computation and a combiner for pairing data and quantization parameter iterators.
Description
BaseGmemIterator establishes the common pattern for global-to-shared-memory KV tile loading. It uses a Map type to compute per-thread (offset_s, offset_c) coordinates and provides SetSmem to bind a shared memory target and ClearSmem to zero-initialize shared memory tiles. BaseSmemIterator wraps a shared memory pointer with a layout accessor. CombinedIterator composes two iterators (typically data + quantization parameters) into a single unit with unified Prefetch, Load, Save, and ClearSmem interfaces.
Usage
BaseGmemIterator is the base for architecture-specific iterators (Sm70GmemIterator, Sm80GmemIterator). CombinedIterator is used when KV quantization is enabled to pair data and parameter loading.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/attention/iterator.h
- Lines: 1-127
Signature
template<class T, class Map, class SmemLayout>
struct BaseGmemIterator {
using ElementType = T;
using AccessType = Array<T, Map::kAccessC>;
static constexpr int kIterCount = Map::kIterS * Map::kIterC;
using Fragment = Array<T, Map::kAccessC>[Map::kIterS][Map::kIterC];
__device__ BaseGmemIterator();
__device__ void SetSmem(Pointer smem);
__device__ void ClearSmem(int pipe_iter = 0);
};
template<class T, class Layout>
struct BaseSmemIterator {
__device__ explicit BaseSmemIterator(T* smem);
};
template<class Iterator0, class Iterator1>
struct CombinedIterator {
struct Fragment { ... };
template<typename... Args>
__device__ void Prefetch(Args... args);
template<bool is_residue, class CacheIter>
__device__ void Load(const CacheIter&, Fragment&, int max_s);
__device__ void Save(const Fragment&);
__device__ void ClearSmem(int pipe_iter = 0);
template<class P0, class P1>
__device__ void SetSmem(P0, P1);
};
Import
#include "src/turbomind/kernels/attention/iterator.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| T | typename | Yes | Element type for the iterator |
| Map | typename | Yes | Thread mapping type (e.g., RakedThreadMap) |
| SmemLayout | typename | Yes | Shared memory layout with swizzle |
Outputs
| Name | Type | Description |
|---|---|---|
| Fragment | Array<T,...>[IterS][IterC] | Loaded KV tile data in registers |
| smem_ | Pointer | Shared memory region written during Save/ClearSmem |
Usage Examples
// Composed into architecture-specific iterators:
using GmemIterK = Sm80GmemIterator<half, ThreadMapKV, SmemLayoutK, 0>;
using GmemIterV = Sm80GmemIterator<half, ThreadMapKV, SmemLayoutV, 1>;
// With quantization:
using CombinedK = CombinedIterator<GmemIterK, Sm80GmemIterator<half, ThreadMapKVp, SmemLayoutKVp, 2>>;