Implementation:InternLM Lmdeploy Core SmemLayout

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, Memory_Layout
Last Updated	2026-02-07 15:00 GMT

Overview

Shared memory layout abstractions with XOR-based swizzling to eliminate bank conflicts in CUDA kernels.

Description

This header defines the shared memory layout infrastructure for TurboMind. The Swizzle<Bits, Base, Shift> template implements XOR-based address swizzling that rearranges shared memory addresses to avoid bank conflicts during matrix tile operations. The Identity struct is a no-op swizzle for non-swizzled layouts. SmemLayoutV2<S, C, S0, C0, Swizzle> is the primary layout class that maps 2D (row, column) indices to linear shared memory offsets with optional tiling into (S0, C0) sub-tiles and swizzle application. SmemAccessor<T, Layout> wraps a pointer and a layout to provide convenient 2D indexing into shared memory. A Stride helper computes dot-product offsets for general 2D strided access patterns.

Usage

Use these layouts when allocating and indexing shared memory tiles in GEMM, attention, or other matrix-operation kernels to ensure conflict-free shared memory access patterns.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/core/layout.h

Signature

template<int Bits, int Base, int Shift>
struct Swizzle {
    template<class Offset>
    __host__ __device__ constexpr static auto apply(Offset offset);
};

struct Identity;

template<int S_, int C_, int S0_ = -1, int C0_ = -1, class Swizzle_ = Identity>
struct SmemLayoutV2 {
    static constexpr int kSize = S * C;
    __forceinline__ __device__ static int apply(int s, int c, int offset = 0);
};

template<class T, class Layout>
struct SmemAccessor {
    __device__ T& operator()(int s, int c);
};

Import

#include "src/turbomind/kernels/core/layout.h"

I/O Contract

Inputs

Name	Type	Required	Description
S, C	int	Yes	Shared memory tile dimensions (rows, columns)
S0, C0	int	No	Sub-tile dimensions for tiled layout (defaults to S, C)
Swizzle_	type	No	Swizzle policy (defaults to Identity)
s, c	int	Yes	Row and column indices for offset computation

Outputs

Name	Type	Description
apply return	int	Linear shared memory offset after swizzle
SmemAccessor::operator()	T&	Reference to the element at (s, c)

Usage Examples

using namespace turbomind;

// Define a 64x64 shared memory layout with 3-bit swizzle
using Layout = SmemLayoutV2<64, 64, 8, 64, Swizzle<3, 3, 3>>;

__shared__ half smem[Layout::kSize];
SmemAccessor<half, Layout> acc(smem);

// Access element at row 5, column 10
half val = acc(5, 10);

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment