Implementation:InternLM Lmdeploy Gemm Desc

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Defines the descriptor aggregates (GemmDesc, KernelDesc, KernelInfo, LaunchSpec) that uniquely identify GEMM problems and kernel configurations for dispatch and tuning.

Description

This header establishes the metadata model for the GEMM subsystem:

GemmDesc: A trivially-copyable aggregate capturing everything needed to identify a GEMM problem -- architecture, data types for A/B/C, matrix orders, striding modes, packing descriptors, quantization descriptors, epilogue type, batch dimension, and M/N/K/num dimensions. A transpose function swaps A/B descriptors for transposed dispatch.

KernelDesc: Extends the problem description with kernel-specific attributes -- operation class (SIMT, MMA s884, MMA s16816, GMMA s64n16), CTA tile size, MMA tile size, cluster shape, alignment requirements, pipeline stages, split-K capability, and cache eviction policies.

KernelInfo: Runtime attributes including dynamic shared memory size, max active CTAs, chunk size for K, kernel name, and CUDA function attributes.

LaunchSpec: Associates a Kernel* with swizzle factor, split count, measured latency, and estimated cost for dispatch selection.

OpClass enum: kSIMT, kMMA_s884, kMMA_s16816, kGMMA_s64n16.
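The idea behind the transpose function can be illustrated with a reduced model. This is a minimal sketch, not the code in desc.h: `MiniDesc`, `MiniOrder`, `flip`, and `transpose_sketch` are hypothetical names, and only a few of the fields listed above are modeled. It relies on the identity C = A * B if and only if C^T = B^T * A^T, so swapping the A and B slots, flipping each storage order, and exchanging m and n describes the same computation from the transposed side.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical, reduced stand-ins for the real descriptor types.
enum class MiniOrder : int { kColMajor = 0, kRowMajor = 1 };

struct MiniDesc {
    int       type_a, type_b, type_c;  // data type tags (plain ints in this sketch)
    MiniOrder order_a, order_b, order_c;
    int       m, n, k;
};

// A descriptor that is copied around and used as a lookup key is kept a plain aggregate.
static_assert(std::is_trivially_copyable<MiniDesc>::value, "MiniDesc must be trivially copyable");

constexpr MiniOrder flip(MiniOrder o)
{
    return o == MiniOrder::kRowMajor ? MiniOrder::kColMajor : MiniOrder::kRowMajor;
}

// Describe C^T = B^T * A^T: swap the A/B slots, flip every order, exchange m and n.
inline MiniDesc transpose_sketch(MiniDesc d)
{
    std::swap(d.type_a, d.type_b);
    std::swap(d.order_a, d.order_b);
    d.order_a = flip(d.order_a);
    d.order_b = flip(d.order_b);
    d.order_c = flip(d.order_c);
    std::swap(d.m, d.n);
    return d;
}

// Member-wise comparison, used here only to check that transposing twice is the identity.
inline bool same(const MiniDesc& x, const MiniDesc& y)
{
    return x.type_a == y.type_a && x.type_b == y.type_b && x.type_c == y.type_c
        && x.order_a == y.order_a && x.order_b == y.order_b && x.order_c == y.order_c
        && x.m == y.m && x.n == y.n && x.k == y.k;
}
```

Applying `transpose_sketch` twice returns the original descriptor, which is a convenient sanity check for any implementation of this kind. The real function also has to handle the striding, packing, and quantization descriptors, which this sketch leaves out.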

Usage

Used throughout the GEMM framework for kernel registration, feasibility filtering, dispatch caching, and performance measurement.
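Feasibility filtering followed by selection on measured latency could look roughly like the following. This is a sketch under stated assumptions: `ProblemSketch`, `KernelSketch`, `SpecSketch`, `is_feasible_sketch`, and `select_best` are hypothetical names, and the feasibility rule used here (architecture check plus per-dimension alignment) is an illustrative guess, not the rule implemented by the real kernels.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduced problem description.
struct ProblemSketch {
    int arch;
    int m, n, k;
};

// Hypothetical reduced kernel description.
struct KernelSketch {
    int arch;                       // minimum architecture the kernel targets (assumption)
    int align_m, align_n, align_k;  // required divisibility of each dimension (assumption)
};

// Mirrors the shape of a launch spec: kernel pointer, launch parameters, measured latency.
struct SpecSketch {
    const KernelSketch* kernel;
    int                 swizzle;
    int                 splits;
    float               measured;  // measured latency, smaller is better
};

inline bool is_feasible_sketch(const KernelSketch& k, const ProblemSketch& p)
{
    return p.arch >= k.arch && p.m % k.align_m == 0 && p.n % k.align_n == 0 && p.k % k.align_k == 0;
}

// Returns the feasible spec with the lowest measured latency, or nullptr if none qualifies.
inline const SpecSketch* select_best(const std::vector<SpecSketch>& specs, const ProblemSketch& p)
{
    const SpecSketch* best = nullptr;
    for (const auto& s : specs) {
        if (!is_feasible_sketch(*s.kernel, p)) {
            continue;  // drop kernels that cannot run this problem
        }
        if (best == nullptr || s.measured < best->measured) {
            best = &s;
        }
    }
    return best;
}
```

Filtering before ranking keeps a fast but inapplicable kernel from being chosen, and returning a null pointer leaves the fallback policy to the caller.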

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/desc.h

Signature

struct GemmDesc {
    int arch; DataType type_a, type_b, type_c;
    Order order_a, order_b, order_c;
    Striding striding_a, striding_b, striding_c;
    Pack pack_a, pack_b, pack_u, pack_v;
    QuantDesc quant_a, quant_b;
    Epilogue epilogue; int batch_dim, group_axis;
    int m, n, k, num;
};

struct KernelDesc {
    int arch; OpClass op_class;
    /* types, orders, striding, packing, quant same as GemmDesc */
    int3 cta_tile, mma_tile; int2 cluster_shape;
    int3 align; int2 c_tile;
    int stages; bool split_k; int group_axis;
};

struct LaunchSpec {
    Kernel* kernel;
    int swizzle, splits;
    float measured;
    /* abridged: the estimated-cost member mentioned in the Description is not shown */
};

Import

#include "src/turbomind/kernels/gemm/desc.h"

I/O Contract

Inputs

Name	Type	Required	Description
(aggregate members)	various	Yes	Problem and kernel configuration parameters, supplied through aggregate initialization

Outputs

Name	Type	Description
GemmDesc	struct	Uniquely identifies a GEMM problem for dispatch lookup
KernelDesc	struct	Uniquely identifies a kernel variant
LaunchSpec	struct	Kernel pointer with launch configuration and performance data
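Because a descriptor uniquely identifies a problem, it can serve as the key of a dispatch cache. The sketch below shows one way to do that with the standard library. `KeySketch`, `KeyHash`, and `CacheSketch` are hypothetical names, the key models only a few fields, and the cached value is just an integer index standing in for a launch spec. Nothing here is taken from the actual cache implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <unordered_map>

// Hypothetical reduced descriptor used as a cache key.
struct KeySketch {
    int arch;
    int type_a, type_b, type_c;
    int m, n, k;
};

// Member-wise equality avoids depending on padding bytes, unlike a raw memcmp.
inline bool operator==(const KeySketch& x, const KeySketch& y)
{
    return x.arch == y.arch && x.type_a == y.type_a && x.type_b == y.type_b && x.type_c == y.type_c
        && x.m == y.m && x.n == y.n && x.k == y.k;
}

// Combines the per-field hashes with a common hash-combine mixing step.
struct KeyHash {
    std::size_t operator()(const KeySketch& d) const noexcept
    {
        std::size_t h = 0;
        for (int v : {d.arch, d.type_a, d.type_b, d.type_c, d.m, d.n, d.k}) {
            h ^= std::hash<int>{}(v) + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Maps a problem description to the index of a previously selected launch spec.
using CacheSketch = std::unordered_map<KeySketch, int, KeyHash>;
```

Two descriptors that differ in any modeled field, for example only in m, map to separate cache entries, so a choice tuned for one shape is not reused for another.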

Usage Examples

// Check if a kernel can handle the described problem
GemmDesc desc{...};
if (kernel->is_feasible(desc)) {
    LaunchSpec spec{kernel, swizzle, splits, 0.f, {}};
}

// Transpose a GemmDesc (swaps the A/B descriptors) for transposed dispatch
auto desc_t = transpose(desc);

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
