Implementation:InternLM Lmdeploy Gemm Desc
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Defines the descriptor aggregates (GemmDesc, KernelDesc, KernelInfo, LaunchSpec) that uniquely identify GEMM problems and kernel configurations for dispatch and tuning.
Description
This header establishes the metadata model for the GEMM subsystem:
- GemmDesc: A trivially-copyable aggregate capturing everything needed to identify a GEMM problem -- architecture, data types for A/B/C, matrix orders, striding modes, packing descriptors, quantization descriptors, epilogue type, batch dimension, and M/N/K/num dimensions. A transpose function swaps the A/B descriptors for transposed dispatch.
- KernelDesc: Extends the problem description with kernel-specific attributes -- operation class (SIMT, MMA s884, MMA s16816, GMMA s64n16), CTA tile size, MMA tile size, cluster shape, alignment requirements, pipeline stages, split-K capability, and cache eviction policies.
- KernelInfo: Runtime attributes including dynamic shared memory size, max active CTAs, chunk size for K, kernel name, and CUDA function attributes.
- LaunchSpec: Associates a Kernel* with swizzle factor, split count, measured latency, and estimated cost for dispatch selection.
- OpClass enum: kSIMT, kMMA_s884, kMMA_s16816, kGMMA_s64n16.
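The transposed-dispatch idea in the list above can be sketched as follows. This is a minimal illustration, not the code in desc.h: `MiniGemmDesc`, `flip`, `transpose_sketch`, and the two reduced enums are hypothetical stand-ins that carry only a few of the listed fields. The sketch assumes that transposing relies on the identity C = A·B ⇔ Cᵀ = Bᵀ·Aᵀ, so the A/B attributes are swapped, M and N are exchanged, and every matrix order is flipped.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical, reduced enums (assumption: the real ones have more values).
enum class DataType { kF16, kU4 };
enum class Order { kColMajor, kRowMajor };

// Hypothetical reduced descriptor: a plain aggregate with no constructors.
struct MiniGemmDesc {
    DataType type_a, type_b, type_c;
    Order    order_a, order_b, order_c;
    int      m, n, k;
};

// A descriptor used as a lookup key should stay trivially copyable.
static_assert(std::is_trivially_copyable<MiniGemmDesc>::value,
              "descriptor must stay trivially copyable");

constexpr Order flip(Order o)
{
    return o == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Describe the same problem as C^T = B^T * A^T (assumed semantics).
inline MiniGemmDesc transpose_sketch(MiniGemmDesc d)
{
    std::swap(d.type_a, d.type_b);    // operand roles are exchanged
    std::swap(d.order_a, d.order_b);
    d.order_a = flip(d.order_a);      // each operand is viewed transposed
    d.order_b = flip(d.order_b);
    d.order_c = flip(d.order_c);
    std::swap(d.m, d.n);              // output shape becomes N x M
    return d;
}
```

Applying `transpose_sketch` twice returns the original descriptor, which is a convenient sanity check for any function of this kind.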
Usage
Used throughout the GEMM framework for kernel registration, feasibility filtering, dispatch caching, and performance measurement.
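A rough sketch of how feasibility filtering and dispatch caching could fit together is shown below. All names here (`ProblemKey`, `ToyKernel`, `ToySpec`, `ToyDispatcher`) are hypothetical and heavily simplified; the feasibility rule (architecture and K alignment) and the "lowest measured latency wins" policy are assumptions made for illustration, not the framework's actual logic.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical reduced problem descriptor used as a cache key.
struct ProblemKey {
    int arch;
    int m, n, k;
};

inline bool operator<(const ProblemKey& a, const ProblemKey& b)
{
    return std::tie(a.arch, a.m, a.n, a.k) < std::tie(b.arch, b.m, b.n, b.k);
}

// Hypothetical kernel with a toy feasibility rule (an assumption).
struct ToyKernel {
    int min_arch;  // lowest architecture the kernel was built for
    int align_k;   // required divisibility of K

    bool is_feasible(const ProblemKey& p) const
    {
        return p.arch >= min_arch && p.k % align_k == 0;
    }
};

// Mirrors the shape of a launch spec: kernel plus launch parameters and timing.
struct ToySpec {
    const ToyKernel* kernel;
    int              swizzle;
    int              splits;
    float            measured;  // latency from a previous measurement
};

class ToyDispatcher {
public:
    // Registration: remember a kernel together with its measured latency.
    void add(const ToyKernel* kernel, float measured)
    {
        candidates_.push_back({kernel, 0, 1, measured});
    }

    // Returns the cached choice, or filters and ranks candidates on a miss.
    const ToySpec* select(const ProblemKey& p)
    {
        auto it = cache_.find(p);
        if (it != cache_.end()) {
            return &it->second;
        }
        const ToySpec* best = nullptr;
        for (const auto& s : candidates_) {
            if (s.kernel->is_feasible(p) && (!best || s.measured < best->measured)) {
                best = &s;
            }
        }
        if (!best) {
            return nullptr;  // nothing feasible; do not cache
        }
        return &cache_.emplace(p, *best).first->second;
    }

    std::size_t cache_size() const { return cache_.size(); }

private:
    std::vector<ToySpec>          candidates_;
    std::map<ProblemKey, ToySpec> cache_;
};
```

Keying the cache on an ordered comparison of named fields, rather than on the raw bytes of the struct, avoids depending on padding bytes, which is one reason a descriptor meant for lookup benefits from being a simple aggregate.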
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/desc.h
Signature
struct GemmDesc {
    int arch; DataType type_a, type_b, type_c;
    Order order_a, order_b, order_c;
    Striding striding_a, striding_b, striding_c;
    Pack pack_a, pack_b, pack_u, pack_v;
    QuantDesc quant_a, quant_b;
    Epilogue epilogue; int batch_dim, group_axis;
    int m, n, k, num;
};
struct KernelDesc {
    int arch; OpClass op_class;
    /* types, orders, striding, packing, quant same as GemmDesc */
    int3 cta_tile, mma_tile; int2 cluster_shape;
    int3 align; int2 c_tile;
    int stages; bool split_k; int group_axis;
};
struct LaunchSpec { Kernel* kernel; int swizzle, splits; float measured; };
Import
#include "src/turbomind/kernels/gemm/desc.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (aggregate fields) | various | Yes | Problem and kernel configuration parameters, set through aggregate initialization |
Outputs
| Name | Type | Description |
|---|---|---|
| GemmDesc | struct | Uniquely identifies a GEMM problem for dispatch lookup |
| KernelDesc | struct | Uniquely identifies a kernel variant |
| LaunchSpec | struct | Kernel pointer with launch configuration and performance data |
Usage Examples
// Check if a kernel can handle the described problem
GemmDesc desc{...};
if (kernel->is_feasible(desc)) {
    LaunchSpec spec{kernel, swizzle, splits, 0.f, {}};
}
// Transpose a GemmDesc for B-major dispatch
auto desc_t = transpose(desc);