Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:InternLM Lmdeploy Gemm SmemCopy

From Leeroopedia


Knowledge Sources
Domains GPU_Kernels, GEMM
Last Updated 2026-02-07 15:00 GMT

Overview

Provides shared memory accessor abstractions for reading from and writing to swizzled shared memory layouts, supporting both row-major and column-major access patterns.

Description

This header defines the shared memory copy primitives used by the GEMM epilogue and MMA operand loading stages:

  • VoidSmemCopyAtom: A no-op copy atom placeholder (1x1 dimension, empty copy method) used when no shared memory copy is needed for a given operand.
  • SmemAccessorV2<T, Layout, order>: A templated shared memory accessor that wraps the base SmemAccessor with order-aware indexing. For kRowMajor, it directly inherits from SmemAccessor. For kColMajor, it transposes the (m, k) indices, calling base_(k, m) to access column-major data stored in a row-major layout.

Additional specializations (lines 60+) handle architecture-specific shared memory copy atoms for HMMA 16816, HMMA 884, and SIMT operations, providing the data movement patterns required by each MMA instruction format.

Usage

Used by the epilogue to rearrange MMA accumulator data in shared memory and by mainloop iterators to load MMA operand fragments from shared memory.

Code Reference

Source Location

Signature

struct VoidSmemCopyAtom {
    static constexpr int M = 1, K = 1, kFragNum = 1;
    using Frag = Array<int, 1>;
    __device__ static void copy(S, D, bool);
};

template<class T, class Layout, Order order>
struct SmemAccessorV2;

// Specialization for kColMajor transposes m,k indexing
template<class T, class Layout>
struct SmemAccessorV2<T, Layout, kColMajor> {
    __device__ T& operator()(int m, int k);
};

Import

#include "src/turbomind/kernels/gemm/smem_copy.h"

I/O Contract

Inputs

Name Type Required Description
ptr pointer Yes Base shared memory pointer
m, k int Yes Row and column indices into the tile

Outputs

Name Type Description
T& reference Reference to the element at the swizzled shared memory location

Usage Examples

SmemAccessorV2<half, SmemLayout, kRowMajor> smem_C{storage.data()};
auto& val = smem_C(row, col);  // access swizzled shared memory

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment