Implementation:InternLM Lmdeploy Gemm Utils

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Provides GEMM-specific utility functions including matrix order transposition, coordinate transformations between (m,k) and (contiguous,strided) spaces, index linearization, packing transformations, and dot product helpers.

Description

This header contains device/host utility functions for GEMM coordinate arithmetic:

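transpose(Order) / transpose(MatrixLayout): Flips row-major to column-major and vice versa; the MatrixLayout overload also swaps the row and column dimensions
mk2cs<order>(m, k): Converts (M, K) coordinates to (contiguous, strided) coordinates according to the matrix order: (m, k) for column-major, (k, m) for row-major
cs2mk<order>(c, s): Inverse of mk2cs, mapping (contiguous, strided) coordinates back to (M, K)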
cs2idx(cs, ld): Linearizes (C, S) coordinates to a flat index: ld * cs.y + cs.x
dot(int2, int2) / dot(int2, long2): 2D dot product (a.x * b.x + a.y * b.y) used for offset calculation; the long2 overload accepts 64-bit strides
Packing_v2<pack, order>: Applies packing transformations for different MMA instruction formats (HMMA_16816, HMMA_884, HMMA_SIMT) to convert logical (M,K) dimensions into packed layout dimensions
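A minimal host-side sketch of mk2cs, cs2mk and cs2idx follows. It assumes column-major maps (m, k) to (c, s) = (m, k), matching the `{m, k}` comment in the usage examples, and row-major maps it to (k, m). `int2` and `Order` are stand-ins for the CUDA vector type and the lmdeploy enum, and `__host__ __device__` is dropped so the sketch compiles as plain C++17.

```cpp
// Host-only sketch; int2 and Order are stand-ins, not lmdeploy's definitions.
struct int2 {
    int x;  // contiguous component (or m)
    int y;  // strided component (or k)
};

enum class Order { kColMajor, kRowMajor };

// (m, k) -> (contiguous, strided).
// Column-major: m runs along the contiguous axis. Row-major: k does.
template <Order order>
constexpr int2 mk2cs(int m, int k) {
    if constexpr (order == Order::kRowMajor) {
        return {k, m};
    } else {
        return {m, k};
    }
}

// (contiguous, strided) -> (m, k); the inverse of mk2cs.
template <Order order>
constexpr int2 cs2mk(int c, int s) {
    if constexpr (order == Order::kRowMajor) {
        return {s, c};
    } else {
        return {c, s};
    }
}

// Flat index: consecutive strided positions are `ld` elements apart.
template <class Index>
constexpr Index cs2idx(int2 cs, Index ld) {
    return ld * cs.y + cs.x;
}
```

For example, element (m, k) = (3, 5) of a column-major matrix with ld = 8 lies at 8 * 5 + 3 = 43, while the same element of a row-major matrix with ld = 16 lies at 16 * 3 + 5 = 53.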

All functions, including Packing_v2::apply, are marked __host__ __device__ constexpr, so they can be used in host-side setup, in device-side computation, and in compile-time constant expressions.

Usage

Used throughout the GEMM kernel infrastructure for coordinate transformations and memory offset calculations.
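One way a 2D dot product serves offset calculation: pairing a (contiguous, strided) coordinate with the stride vector {1, ld} gives ld * cs.y + cs.x, the same value cs2idx computes. The sketch below assumes both dot helpers are plain `a.x * b.x + a.y * b.y`; `int2` and `long2` are host stand-ins for the CUDA vector types, not lmdeploy code.

```cpp
// Host-only sketch; int2 and long2 stand in for CUDA's built-in vector types.
struct int2 {
    int x;
    int y;
};

struct long2 {
    long x;
    long y;
};

// int * int summed as int: a 32-bit result.
constexpr auto dot(int2 a, int2 b) {
    return a.x * b.x + a.y * b.y;
}

// int * long promotes to long, so the result is 64-bit on LP64 hosts
// (e.g. Linux), which keeps large offsets from overflowing.
constexpr auto dot(int2 a, long2 b) {
    return a.x * b.x + a.y * b.y;
}
```

The mixed overload matters once offsets leave the 32-bit range: with a stride of 100000 along s, position s = 30000 starts at element 3,000,000,000, which does not fit in an `int`.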

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/utils.h

Signature

__host__ __device__ constexpr Order transpose(Order order);
__host__ __device__ constexpr MatrixLayout transpose(MatrixLayout x);

template<Order order>
__host__ __device__ constexpr int2 mk2cs(int m, int k);

template<Order order>
__host__ __device__ constexpr int2 cs2mk(int c, int s);

template<class Index>
__host__ __device__ constexpr Index cs2idx(int2 cs, Index ld);

__host__ __device__ constexpr auto dot(int2 a, int2 b);
__host__ __device__ constexpr auto dot(int2 a, long2 b);

template<Pack pack, Order order>
struct Packing_v2 { __host__ __device__ static constexpr int2 apply(int2 mk); };
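The two transpose overloads can be sketched as below. `MatrixLayout` is reduced to hypothetical `order`, `rows` and `cols` fields (the real struct may hold more, such as a leading dimension), so treat this as an illustration of the described behavior rather than the lmdeploy source.

```cpp
// Host-only sketch; Order and MatrixLayout are simplified stand-ins.
enum class Order { kColMajor, kRowMajor };

// Hypothetical field set for illustration only.
struct MatrixLayout {
    Order order;
    int rows;
    int cols;
};

// Flip the storage order.
constexpr Order transpose(Order order) {
    return order == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Swap the extents and flip the order. A manual swap is used because
// std::swap is only constexpr from C++20 onward.
constexpr MatrixLayout transpose(MatrixLayout x) {
    int tmp = x.rows;
    x.rows = x.cols;
    x.cols = tmp;
    x.order = transpose(x.order);
    return x;
}
```

Swapping the extents while flipping the order describes the same memory: a 4 x 8 row-major matrix is laid out exactly like an 8 x 4 column-major matrix, so no data has to move.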

Import

#include "src/turbomind/kernels/gemm/utils.h"

I/O Contract

Inputs

Name	Type	Required	Description
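m, k	int	For mk2cs	Logical matrix coordinates
c, s	int	For cs2mk	Contiguous and strided coordinates
cs	int2	For cs2idx	(contiguous, strided) coordinate pair
ld	Index (template)	For cs2idx	Leading dimension: distance in elements between consecutive strided positions
a, b	int2 / long2	For dot	Vectors whose 2D dot product is returned
order	Order (template)	Yes	Row-major or column-major
pack	Pack (template)	For packing	MMA+operand+num encoding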

Outputs

Name	Type	Description
Coordinate pair	int2	Transformed coordinates in the target space (mk2cs, cs2mk, Packing_v2::apply)
Linear offset	Index	Flat memory offset returned by cs2idx

Usage Examples

auto cs = mk2cs<kColMajor>(m, k);  // {m, k} for col-major
auto idx = cs2idx(cs, ld);          // ld * k + m
auto packed = Packing_v2<pack, kRowMajor>::apply({M, K});  // logical {M, K} -> packed layout dims
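The compile-time dispatch behind a `Packing_v2<pack, order>::apply` call can be illustrated with the hypothetical sketch below. Everything in it is an assumption for illustration: `PackingSketch`, the `Pack` enumerators, `kFragRows = 16` and the grouping rule are made up and are NOT the lmdeploy packing formulas for HMMA_16816, HMMA_884 or HMMA_SIMT. It only shows the pattern of a default identity mapping plus per-format specializations that rewrite logical (M, K) into packed dimensions.

```cpp
// Hypothetical sketch; names and the packing rule are illustrative only.
struct int2 {
    int x;  // first dimension (M, or packed rows)
    int y;  // second dimension (K, or packed cols)
};

enum class Order { kColMajor, kRowMajor };

// Hypothetical pack tags standing in for the real MMA/operand encoding.
enum class Pack { kNone, kTiledHypothetical };

// Primary template: no packing, logical (M, K) is used unchanged.
template <Pack pack, Order order>
struct PackingSketch {
    static constexpr int2 apply(int2 mk) { return mk; }
};

// Hypothetical specialization: every kFragRows consecutive rows of M are
// stored as one packed row, giving (M / kFragRows, K * kFragRows).
// The element count M * K is preserved.
template <Order order>
struct PackingSketch<Pack::kTiledHypothetical, order> {
    static constexpr int kFragRows = 16;
    static constexpr int2 apply(int2 mk) {
        return {mk.x / kFragRows, mk.y * kFragRows};
    }
};
```

Because the choice happens through template specialization, the packed shape is a compile-time constant whenever M and K are, and unsupported formats fall back to the identity mapping.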

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
