Implementation:InternLM Lmdeploy Gemm Utils
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Provides GEMM-specific utility functions including matrix order transposition, coordinate transformations between (m, k) and (contiguous, strided) spaces, index linearization, packing transformations, and dot product helpers.
Description
This header contains device/host utility functions for GEMM coordinate arithmetic:
- transpose(Order) / transpose(MatrixLayout): Swaps row-major and column-major order; the MatrixLayout overload also swaps the row and column dimensions.
- mk2cs<order>(m, k): Converts (M, K) coordinates to (contiguous, strided) coordinates according to the matrix order.
- cs2mk<order>(c, s): Inverse of mk2cs.
- cs2idx(cs, ld): Linearizes (C, S) coordinates to a flat index: ld * cs.y + cs.x.
- dot(int2, int2) / dot(int2, long2): 2D dot product used for offset calculations.
- Packing_v2<pack, order>: Applies packing transformations for different MMA instruction formats (HMMA_16816, HMMA_884, HMMA_SIMT) to convert logical (M, K) dimensions into packed layout dimensions.
All functions are marked __host__ __device__ constexpr for use in both host-side setup and device-side computation.
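The following host-only C++ sketch mirrors the coordinate helpers listed above so their behavior can be checked at compile time. It assumes that row-major storage maps (m, k) to (k, m), the mirror image of the documented column-major case. Int2 and the Order enumerators are hypothetical stand-ins for CUDA's int2 and the header's kRowMajor/kColMajor, and the __host__ __device__ qualifiers are omitted so it builds with an ordinary C++17 compiler.

```cpp
// Host-only sketch of the (m, k) <-> (contiguous, strided) helpers.
// Assumption: Int2 and Order are stand-ins for CUDA's int2 and the header's
// Order type; the real functions are also __host__ __device__.
#include <cstdint>

struct Int2 {
    int x;
    int y;
};

enum class Order { kRowMajor, kColMajor };

// (m, k) -> (c, s). Row-major: k is contiguous (assumed); column-major: m is.
template <Order order>
constexpr Int2 mk2cs(int m, int k) {
    if constexpr (order == Order::kRowMajor) {
        return {k, m};
    } else {
        return {m, k};
    }
}

// Inverse of mk2cs: (c, s) -> (m, k).
template <Order order>
constexpr Int2 cs2mk(int c, int s) {
    if constexpr (order == Order::kRowMajor) {
        return {s, c};
    } else {
        return {c, s};
    }
}

// Flat offset ld * s + c, returned as Index.
template <class Index>
constexpr Index cs2idx(Int2 cs, Index ld) {
    return ld * cs.y + cs.x;
}

// Element (m = 2, k = 3) with ld = 8:
static_assert(cs2idx(mk2cs<Order::kColMajor>(2, 3), 8) == 26);  // 8 * 3 + 2
static_assert(cs2idx(mk2cs<Order::kRowMajor>(2, 3), 8) == 19);  // 8 * 2 + 3
// A 64-bit leading dimension yields a 64-bit offset.
static_assert(cs2idx(mk2cs<Order::kColMajor>(2, 3), std::int64_t{8}) == 26);
```

Because cs2idx is templated on Index, passing a 64-bit ld is enough to carry the whole offset computation in 64 bits.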
Usage
Used throughout the GEMM kernel infrastructure for coordinate transformations and memory offset calculations.
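Offsets can also be phrased as a dot product of a coordinate with a stride vector. The sketch below uses hypothetical Int2/Long2 stand-ins for CUDA's int2/long2 (with 64-bit lanes) and checks that dot(cs, {1, ld}) matches cs2idx's ld * cs.y + cs.x. One plausible reason for a long2 overload, assumed here rather than taken from the source, is that it promotes the arithmetic to 64 bits for offsets that would overflow int.

```cpp
// Sketch: memory offsets as 2D dot products of coordinates and strides.
// Assumption: Int2 / Long2 stand in for CUDA's int2 / long2 (64-bit lanes here).
#include <cstdint>

struct Int2 {
    int x;
    int y;
};

struct Long2 {
    std::int64_t x;
    std::int64_t y;
};

// 2D dot products; the return type follows the usual arithmetic conversions.
constexpr auto dot(Int2 a, Int2 b) { return a.x * b.x + a.y * b.y; }   // int
constexpr auto dot(Int2 a, Long2 b) { return a.x * b.x + a.y * b.y; }  // int64_t

// With strides {1, ld}, dot(cs, strides) equals ld * cs.y + cs.x.
constexpr Int2 cs{2, 3};
constexpr int ld = 8;
static_assert(dot(cs, Int2{1, ld}) == ld * cs.y + cs.x);

// A 64-bit stride vector keeps 4096 * 2^20 = 2^32 from overflowing int.
constexpr Long2 strides64{1, std::int64_t{1} << 20};
static_assert(dot(Int2{0, 4096}, strides64) == (std::int64_t{1} << 32));
```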
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/utils.h
Signature
```cpp
__host__ __device__ constexpr Order transpose(Order order);
__host__ __device__ constexpr MatrixLayout transpose(MatrixLayout x);

template<Order order>
__host__ __device__ constexpr int2 mk2cs(int m, int k);

template<Order order>
__host__ __device__ constexpr int2 cs2mk(int c, int s);

template<class Index>
__host__ __device__ constexpr Index cs2idx(int2 cs, Index ld);

__host__ __device__ constexpr auto dot(int2 a, int2 b);
__host__ __device__ constexpr auto dot(int2 a, long2 b);

template<Pack pack, Order order>
struct Packing_v2 {
    __host__ __device__ static constexpr int2 apply(int2 mk);
};
```
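To illustrate transpose(Order) and transpose(MatrixLayout), here is a sketch over a reduced, hypothetical MatrixLayout holding only order, rows, cols and ld (the real struct may hold more fields). It assumes the leading dimension is left untouched, so a row-major R x C layout and its transposed column-major C x R layout address the same elements, which makes transposition a zero-copy change of view.

```cpp
// Sketch of Order / MatrixLayout transposition on a reduced, hypothetical
// MatrixLayout (the real struct may carry additional fields).
enum class Order { kRowMajor, kColMajor };

constexpr Order transpose(Order order) {
    return order == Order::kRowMajor ? Order::kColMajor : Order::kRowMajor;
}

struct MatrixLayout {
    Order order;
    int rows;
    int cols;
    int ld;  // leading dimension; assumed unchanged by transpose
};

// Flip the order and swap rows/cols; ld stays the same.
constexpr MatrixLayout transpose(MatrixLayout x) {
    x.order = transpose(x.order);
    const int tmp = x.rows;
    x.rows = x.cols;
    x.cols = tmp;
    return x;
}

// Offset of element (i, j) under this reduced layout.
constexpr int offset(MatrixLayout l, int i, int j) {
    return l.order == Order::kRowMajor ? l.ld * i + j : l.ld * j + i;
}

constexpr MatrixLayout a{Order::kRowMajor, 4, 6, 6};  // 4 x 6, row-major
constexpr MatrixLayout at = transpose(a);            // 6 x 4, column-major
static_assert(at.order == Order::kColMajor && at.rows == 6 && at.cols == 4);
// A(1, 5) and A^T(5, 1) refer to the same memory location.
static_assert(offset(a, 1, 5) == offset(at, 5, 1));
```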
Import
```cpp
#include "src/turbomind/kernels/gemm/utils.h"
```
I/O Contract
Inputs
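| Name | Type | Required | Description |
|---|---|---|---|
| m, k | int | For mk2cs | Logical (M, K) matrix coordinates |
| c, s | int | For cs2mk | Contiguous and strided coordinates |
| cs | int2 | For cs2idx | (contiguous, strided) coordinate pair |
| ld | Index (template) | For cs2idx | Leading dimension: elements between consecutive strided positions |
| mk | int2 | For Packing_v2::apply | Logical (M, K) extents to be packed |
| order | Order (template) | For mk2cs, cs2mk, Packing_v2 | Row-major (kRowMajor) or column-major (kColMajor) |
| pack | Pack (template) | For Packing_v2 | Encoding of MMA instruction, operand, and num |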
Outputs
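| Name | Type | Description |
|---|---|---|
| Result of mk2cs, cs2mk, Packing_v2::apply | int2 | Transformed coordinate pair in the target space |
| Result of cs2idx | Index | Linearized memory offset |
| Result of dot | deduced (int for int2 · int2, long for int2 · long2) | 2D dot product, e.g. coordinates times strides |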
Usage Examples
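```cpp
// m, k, ld, M, K are assumed to be defined by the caller; pack must be a
// compile-time Pack constant.
auto cs     = mk2cs<kColMajor>(m, k);                      // {m, k} for column-major
auto idx    = cs2idx(cs, ld);                              // ld * k + m
auto packed = Packing_v2<pack, kRowMajor>::apply({M, K});  // (M, K) extents in the packed layout
```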
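Packing_v2 selects its transformation at compile time from the Pack value. The sketch below shows only that dispatch pattern: in this sketch, the primary template passes extents through unchanged and one partial specialization applies a rule. The Pack encoding, kNoPack, kHypotheticalPackA and the rule of folding num fragments from M into K are invented for illustration and do not reproduce the real HMMA_16816, HMMA_884 or HMMA_SIMT transformations.

```cpp
// Hypothetical illustration of compile-time packing dispatch. The encoding,
// kHypotheticalPackA and its rule are invented; they are NOT the real
// HMMA_16816 / HMMA_884 / HMMA_SIMT transformations.
#include <cstdint>

struct Int2 {
    int x;  // M extent
    int y;  // K extent
};

enum class Order { kRowMajor, kColMajor };
using Pack = std::uint32_t;

constexpr Pack kNoPack = 0;
constexpr Pack kHypotheticalPackA = 0x100u | 2u;  // "instruction" bits | num = 2

// Primary template in this sketch: no packing, extents pass through unchanged.
template <Pack pack, Order order>
struct Packing_v2 {
    static constexpr Int2 apply(Int2 mk) { return mk; }
};

// Hypothetical rule: fold num fragments from M into K. The order parameter is
// accepted but unused by this toy rule.
template <Order order>
struct Packing_v2<kHypotheticalPackA, order> {
    static constexpr int num = static_cast<int>(kHypotheticalPackA & 0xffu);
    static constexpr Int2 apply(Int2 mk) { return {mk.x / num, mk.y * num}; }
};

// Usage: unpacked and hypothetically packed extents of a 64 x 32 tile.
static_assert(Packing_v2<kNoPack, Order::kRowMajor>::apply({64, 32}).x == 64);
static_assert(Packing_v2<kHypotheticalPackA, Order::kRowMajor>::apply({64, 32}).y == 64);
```

Keeping each rule in a specialization means every kernel instantiation resolves its layout arithmetic at compile time, with no runtime branching on the pack format.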