Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:InternLM Lmdeploy Core SmemLayout

From Leeroopedia


Knowledge Sources
Domains GPU_Kernels, Memory_Layout
Last Updated 2026-02-07 15:00 GMT

Overview

Shared memory layout abstractions with XOR-based swizzling to eliminate bank conflicts in CUDA kernels.

Description

This header defines the shared memory layout infrastructure for TurboMind. The Swizzle<Bits, Base, Shift> template implements XOR-based address swizzling that rearranges shared memory addresses to avoid bank conflicts during matrix tile operations. The Identity struct is a no-op swizzle for non-swizzled layouts. SmemLayoutV2<S, C, S0, C0, Swizzle> is the primary layout class that maps 2D (row, column) indices to linear shared memory offsets with optional tiling into (S0, C0) sub-tiles and swizzle application. SmemAccessor<T, Layout> wraps a pointer and a layout to provide convenient 2D indexing into shared memory. A Stride helper computes dot-product offsets for general 2D strided access patterns.

Usage

Use these layouts when allocating and indexing shared memory tiles in GEMM, attention, or other matrix-operation kernels to ensure conflict-free shared memory access patterns.

Code Reference

Source Location

Signature

template<int Bits, int Base, int Shift>
struct Swizzle {
    template<class Offset>
    __host__ __device__ constexpr static auto apply(Offset offset);
};

struct Identity;

template<int S_, int C_, int S0_ = -1, int C0_ = -1, class Swizzle_ = Identity>
struct SmemLayoutV2 {
    static constexpr int kSize = S * C;
    __forceinline__ __device__ static int apply(int s, int c, int offset = 0);
};

template<class T, class Layout>
struct SmemAccessor {
    __device__ T& operator()(int s, int c);
};

Import

#include "src/turbomind/kernels/core/layout.h"

I/O Contract

Inputs

Name Type Required Description
S, C int Yes Shared memory tile dimensions (rows, columns)
S0, C0 int No Sub-tile dimensions for tiled layout (defaults to S, C)
Swizzle_ type No Swizzle policy (defaults to Identity)
s, c int Yes Row and column indices for offset computation

Outputs

Name Type Description
apply return int Linear shared memory offset after swizzle
SmemAccessor::operator() T& Reference to the element at (s, c)

Usage Examples

using namespace turbomind;

// Define a 64x64 shared memory layout with 3-bit swizzle
using Layout = SmemLayoutV2<64, 64, 8, 64, Swizzle<3, 3, 3>>;

__shared__ half smem[Layout::kSize];
SmemAccessor<half, Layout> acc(smem);

// Access element at row 5, column 10
half val = acc(5, 10);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment