Implementation:InternLM Lmdeploy Gemm SmemCopy
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Provides shared memory accessor abstractions for reading from and writing to swizzled shared memory layouts, supporting both row-major and column-major access patterns.
Description
This header defines the shared memory copy primitives used by the GEMM epilogue and MMA operand loading stages:
VoidSmemCopyAtom: A no-op copy atom placeholder (1x1 dimension, empty copy method) used when no shared memory copy is needed for a given operand.
SmemAccessorV2<T, Layout, order>: A templated shared memory accessor that wraps the baseSmemAccessorwith order-aware indexing. ForkRowMajor, it directly inherits fromSmemAccessor. ForkColMajor, it transposes the (m, k) indices, callingbase_(k, m)to access column-major data stored in a row-major layout.
Additional specializations (lines 60+) handle architecture-specific shared memory copy atoms for HMMA 16816, HMMA 884, and SIMT operations, providing the data movement patterns required by each MMA instruction format.
Usage
Used by the epilogue to rearrange MMA accumulator data in shared memory and by mainloop iterators to load MMA operand fragments from shared memory.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/smem_copy.h
Signature
struct VoidSmemCopyAtom {
static constexpr int M = 1, K = 1, kFragNum = 1;
using Frag = Array<int, 1>;
__device__ static void copy(S, D, bool);
};
template<class T, class Layout, Order order>
struct SmemAccessorV2;
// Specialization for kColMajor transposes m,k indexing
template<class T, class Layout>
struct SmemAccessorV2<T, Layout, kColMajor> {
__device__ T& operator()(int m, int k);
};
Import
#include "src/turbomind/kernels/gemm/smem_copy.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ptr | pointer | Yes | Base shared memory pointer |
| m, k | int | Yes | Row and column indices into the tile |
Outputs
| Name | Type | Description |
|---|---|---|
| T& | reference | Reference to the element at the swizzled shared memory location |
Usage Examples
SmemAccessorV2<half, SmemLayout, kRowMajor> smem_C{storage.data()};
auto& val = smem_C(row, col); // access swizzled shared memory