Implementation:InternLM Lmdeploy Gemm Utils


Knowledge Sources
Domains GPU_Kernels, GEMM
Last Updated 2026-02-07 15:00 GMT

Overview

Provides GEMM-specific utility functions including matrix order transposition, coordinate transformations between (m,k) and (contiguous,strided) spaces, index linearization, packing transformations, and dot product helpers.

Description

This header contains device/host utility functions for GEMM coordinate arithmetic:

  • transpose(Order) / transpose(MatrixLayout): Flips between row-major and column-major order; the MatrixLayout overload also swaps the layout's row and column dimensions
  • mk2cs<order>(m, k): Converts (M, K) coordinates to (contiguous, strided) based on matrix order
  • cs2mk<order>(c, s): Inverse of mk2cs
  • cs2idx(cs, ld): Linearizes (C, S) coordinates to a flat index: ld * cs.y + cs.x
  • dot(int2, int2) / dot(int2, long2): 2D dot product for offset calculation
  • Packing_v2<pack, order>: Applies packing transformations for different MMA instruction formats (HMMA_16816, HMMA_884, HMMA_SIMT) to convert logical (M,K) dimensions into packed layout dimensions

All functions are marked __host__ __device__ constexpr for use in both host-side setup and device-side computation.
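The coordinate helpers listed above can be sketched in plain C++ as follows. This is a minimal illustration, not the header's actual code: `Int2` and the `Order` enum defined here are stand-ins for CUDA's `int2` and the library's own types, the `__host__ __device__` qualifiers are dropped so the sketch builds with a host-only C++17 compiler, and the row-major branch (k contiguous) is inferred from the column-major case described on this page.

```cpp
#include <cassert>

// Stand-in types (assumptions): the real header uses CUDA's int2 and
// TurboMind's own Order enum.
struct Int2 { int x; int y; };
enum class Order { kColMajor, kRowMajor };
inline constexpr Order kColMajor = Order::kColMajor;
inline constexpr Order kRowMajor = Order::kRowMajor;

// (m, k) -> (contiguous, strided).
// Column-major keeps m contiguous; row-major (assumed) keeps k contiguous.
template<Order order>
constexpr Int2 mk2cs(int m, int k)
{
    if constexpr (order == Order::kRowMajor) {
        return {k, m};
    } else {
        return {m, k};
    }
}

// Inverse mapping: (contiguous, strided) -> (m, k).
template<Order order>
constexpr Int2 cs2mk(int c, int s)
{
    if constexpr (order == Order::kRowMajor) {
        return {s, c};
    } else {
        return {c, s};
    }
}

// Flat index of a (c, s) pair given the leading dimension ld.
template<class Index>
constexpr Index cs2idx(Int2 cs, Index ld)
{
    return ld * cs.y + cs.x;
}

// Everything is constexpr, so the arithmetic can be checked at compile time.
static_assert(cs2idx(mk2cs<kColMajor>(3, 5), 8) == 8 * 5 + 3);
static_assert(cs2idx(mk2cs<kRowMajor>(3, 5), 8) == 8 * 3 + 5);
```

Because each helper is a pure `constexpr` function, the same expression can serve as a compile-time tile-shape computation or as per-thread address arithmetic inside a kernel.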

Usage

These utilities are used throughout the GEMM kernel infrastructure for coordinate transformations and memory offset calculations.
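As an illustration of the offset calculations mentioned above, the sketch below shows how a 2D dot product against a stride pair yields the same result as `cs2idx`. The `Int2` and `Long2` structs are stand-ins for CUDA's `int2` and `long2`, the explicit return types are an assumption (the header declares `auto`), and pairing the coordinates with a `{1, ld}` stride vector is one plausible use rather than a documented one.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins (assumptions) for CUDA's int2 / long2 vector types.
struct Int2  { int x; int y; };
struct Long2 { std::int64_t x; std::int64_t y; };

// 2D dot products; the mixed overload widens to 64 bits so that large
// strides do not overflow a 32-bit int.
constexpr int dot(Int2 a, Int2 b)
{
    return a.x * b.x + a.y * b.y;
}

constexpr std::int64_t dot(Int2 a, Long2 b)
{
    return a.x * b.x + a.y * b.y;
}

// Same linearization as cs2idx: ld * cs.y + cs.x.
constexpr std::int64_t cs2idx(Int2 cs, std::int64_t ld)
{
    return ld * cs.y + cs.x;
}

// With stride = {1, ld}, the dot product equals the linearized index.
static_assert(dot(Int2{3, 5}, Int2{1, 8}) == 43);
static_assert(dot(Int2{3, 5}, Long2{1, 8}) == cs2idx(Int2{3, 5}, 8));

// A strided step of 2^31 elements only fits in the 64-bit overload.
static_assert(dot(Int2{0, 3}, Long2{1, std::int64_t{1} << 31})
              == 3 * (std::int64_t{1} << 31));
```

Expressing the offset as a dot product keeps the address arithmetic uniform when the stride pair is not simply `{1, ld}`.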

Code Reference

Source Location

Signature

__host__ __device__ constexpr Order transpose(Order order);
__host__ __device__ constexpr MatrixLayout transpose(MatrixLayout x);

template<Order order>
__host__ __device__ constexpr int2 mk2cs(int m, int k);

template<Order order>
__host__ __device__ constexpr int2 cs2mk(int c, int s);

template<class Index>
__host__ __device__ constexpr Index cs2idx(int2 cs, Index ld);

__host__ __device__ constexpr auto dot(int2 a, int2 b);
__host__ __device__ constexpr auto dot(int2 a, long2 b);

template<Pack pack, Order order>
struct Packing_v2 { __host__ __device__ static constexpr int2 apply(int2 mk); };

Import

#include "src/turbomind/kernels/gemm/utils.h"

I/O Contract

Inputs

Name    Type              Required      Description
m, k    int               Yes           Logical matrix coordinates
order   Order (template)  Yes           Row-major or column-major
pack    Pack (template)   For packing   MMA+operand+num encoding

Outputs

Name    Type             Description
int2    coordinate pair  Transformed coordinates in target space
Index   scalar           Linearized memory offset

Usage Examples

auto cs = mk2cs<kColMajor>(m, k);  // {m, k} for col-major: m is contiguous, k is strided
auto idx = cs2idx(cs, ld);          // ld * k + m
auto packed = Packing_v2<pack, kRowMajor>::apply({M, K});  // packed-layout dimensions for logical (M, K)
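The transposition helpers can be illustrated in the same spirit. The sketch below is host-only C++ under stated assumptions: `Layout` is a hypothetical, reduced stand-in for `MatrixLayout` (the real struct may carry more fields), the leading dimension is assumed to stay unchanged under transposition, and `offset` is a helper introduced here purely for the check.

```cpp
#include <cassert>

enum class Order { kColMajor, kRowMajor };

// Hypothetical, reduced stand-in for MatrixLayout (assumption).
struct Layout {
    Order order;
    int   rows;
    int   cols;
    int   ld;
};

// Flip between column-major and row-major.
constexpr Order transpose(Order order)
{
    return order == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Flip the order and swap the dimensions; ld is assumed to be unchanged.
constexpr Layout transpose(Layout x)
{
    return {transpose(x.order), x.cols, x.rows, x.ld};
}

// Illustrative helper (not part of the header): flat offset of element (r, c).
constexpr int offset(Layout x, int r, int c)
{
    return x.order == Order::kColMajor ? x.ld * c + r : x.ld * r + c;
}

constexpr Layout kA  = {Order::kColMajor, 4, 6, 4};  // 4 x 6, column-major
constexpr Layout kAt = transpose(kA);                // 6 x 4, row-major

static_assert(kAt.order == Order::kRowMajor && kAt.rows == 6 && kAt.cols == 4);

// Element (r, c) of the original and element (c, r) of the transposed
// layout share one memory location, so no data has to move.
static_assert(offset(kA, 1, 2) == offset(kAt, 2, 1));
```

Under these assumptions, transposition is pure bookkeeping: a column-major M x K matrix and a row-major K x M matrix describe the same bytes.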
