Implementation:InternLM Lmdeploy Core SmemLayout
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, Memory_Layout |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Shared memory layout abstractions with XOR-based swizzling to eliminate bank conflicts in CUDA kernels.
Description
This header defines the shared memory layout infrastructure for TurboMind. The Swizzle<Bits, Base, Shift> template implements XOR-based address swizzling that rearranges shared memory addresses to avoid bank conflicts during matrix tile operations. The Identity struct is a no-op swizzle for non-swizzled layouts. SmemLayoutV2<S, C, S0, C0, Swizzle> is the primary layout class that maps 2D (row, column) indices to linear shared memory offsets with optional tiling into (S0, C0) sub-tiles and swizzle application. SmemAccessor<T, Layout> wraps a pointer and a layout to provide convenient 2D indexing into shared memory. A Stride helper computes dot-product offsets for general 2D strided access patterns.
Usage
Use these layouts when allocating and indexing shared memory tiles in GEMM, attention, or other matrix-operation kernels to ensure conflict-free shared memory access patterns.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/core/layout.h
Signature
template<int Bits, int Base, int Shift>
struct Swizzle {
template<class Offset>
__host__ __device__ constexpr static auto apply(Offset offset);
};
struct Identity;
template<int S_, int C_, int S0_ = -1, int C0_ = -1, class Swizzle_ = Identity>
struct SmemLayoutV2 {
static constexpr int kSize = S * C;
__forceinline__ __device__ static int apply(int s, int c, int offset = 0);
};
template<class T, class Layout>
struct SmemAccessor {
__device__ T& operator()(int s, int c);
};
Import
#include "src/turbomind/kernels/core/layout.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| S, C | int | Yes | Shared memory tile dimensions (rows, columns) |
| S0, C0 | int | No | Sub-tile dimensions for tiled layout (defaults to S, C) |
| Swizzle_ | type | No | Swizzle policy (defaults to Identity) |
| s, c | int | Yes | Row and column indices for offset computation |
Outputs
| Name | Type | Description |
|---|---|---|
| apply return | int | Linear shared memory offset after swizzle |
| SmemAccessor::operator() | T& | Reference to the element at (s, c) |
Usage Examples
using namespace turbomind;
// Define a 64x64 shared memory layout with 3-bit swizzle
using Layout = SmemLayoutV2<64, 64, 8, 64, Swizzle<3, 3, 3>>;
__shared__ half smem[Layout::kSize];
SmemAccessor<half, Layout> acc(smem);
// Access element at row 5, column 10
half val = acc(5, 10);