Implementation:InternLM Lmdeploy Gemm Desc

From Leeroopedia


Knowledge Sources
Domains: GPU_Kernels, GEMM
Last Updated: 2026-02-07 15:00 GMT

Overview

Defines the descriptor aggregates (GemmDesc, KernelDesc, KernelInfo, LaunchSpec) that uniquely identify GEMM problems and kernel configurations for dispatch and tuning.

Description

This header establishes the metadata model for the GEMM subsystem:

  • GemmDesc: A trivially-copyable aggregate capturing everything needed to identify a GEMM problem -- architecture, data types for A/B/C, matrix orders, striding modes, packing descriptors, quantization descriptors, epilogue type, batch dimension, and M/N/K/num dimensions. A transpose function swaps A/B descriptors for transposed dispatch.
  • KernelDesc: Extends the problem description with kernel-specific attributes -- operation class (SIMT, MMA s884, MMA s16816, GMMA s64n16), CTA tile size, MMA tile size, cluster shape, alignment requirements, pipeline stages, split-K capability, and cache eviction policies.
  • KernelInfo: Runtime attributes including dynamic shared memory size, max active CTAs, chunk size for K, kernel name, and CUDA function attributes.
  • LaunchSpec: Associates a Kernel* with swizzle factor, split count, measured latency, and estimated cost for dispatch selection.
  • OpClass enum: kSIMT, kMMA_s884, kMMA_s16816, kGMMA_s64n16.
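The transpose step mentioned for GemmDesc can be illustrated with a minimal sketch. It relies on the identity C = A * B <=> C^T = B^T * A^T, so the A and B roles swap, M and N swap, and each layout flips. MiniGemmDesc, Order, flip, and transpose_sketch are hypothetical names for this illustration only; the real GemmDesc carries more fields (striding, packing, quantization) that the real transpose would presumably handle as well.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, simplified stand-ins; not the definitions from desc.h.
enum class Order { kColMajor, kRowMajor };

struct MiniGemmDesc {
    int   type_a, type_b;             // placeholder type tags for A and B
    Order order_a, order_b, order_c;  // storage layouts of A, B, C
    int   m, n, k;                    // problem dimensions
};

// A row-major matrix read as column-major is its transpose, so
// transposing an operand amounts to flipping its layout tag.
constexpr Order flip(Order o)
{
    return o == Order::kRowMajor ? Order::kColMajor : Order::kRowMajor;
}

// C = A * B  <=>  C^T = B^T * A^T:
// swap the A/B descriptors, swap M and N, and flip every layout.
inline MiniGemmDesc transpose_sketch(MiniGemmDesc d)
{
    std::swap(d.type_a, d.type_b);
    std::swap(d.order_a, d.order_b);
    d.order_a = flip(d.order_a);
    d.order_b = flip(d.order_b);
    d.order_c = flip(d.order_c);
    std::swap(d.m, d.n);
    return d;
}
```

Under these assumptions, applying transpose_sketch twice returns the original descriptor, which is a useful sanity check for any real implementation of the same idea.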

Usage

Used throughout the GEMM framework for kernel registration, feasibility filtering, dispatch caching, and performance measurement.
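As an illustration of the dispatch-caching use, the following sketch keys a lookup table on a reduced problem descriptor. ProblemKey, CachedLaunch, and DispatchCacheSketch are hypothetical names and do not come from the lmdeploy source; how the real framework stores and compares descriptors is not shown on this page.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <tuple>

// Hypothetical subset of the fields a GemmDesc would contribute to a cache key.
struct ProblemKey {
    int arch, m, n, k;
};

// Lexicographic ordering so the key can be used in std::map.
inline bool operator<(const ProblemKey& a, const ProblemKey& b)
{
    return std::tie(a.arch, a.m, a.n, a.k) < std::tie(b.arch, b.m, b.n, b.k);
}

// Hypothetical stand-in for LaunchSpec, with an integer id instead of Kernel*.
struct CachedLaunch {
    int kernel_id, swizzle, splits;
};

class DispatchCacheSketch {
public:
    // Returns the cached launch configuration, or std::nullopt on a miss.
    std::optional<CachedLaunch> find(const ProblemKey& key) const
    {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Stores (or overwrites) the launch configuration chosen for a problem.
    void insert(const ProblemKey& key, const CachedLaunch& spec)
    {
        cache_.insert_or_assign(key, spec);
    }

private:
    std::map<ProblemKey, CachedLaunch> cache_;
};
```

A miss would typically trigger kernel selection or tuning, after which the chosen configuration is inserted so later calls with the same problem skip that work.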

Code Reference

Source Location

Signature

struct GemmDesc {
    int arch; DataType type_a, type_b, type_c;
    Order order_a, order_b, order_c;
    Striding striding_a, striding_b, striding_c;
    Pack pack_a, pack_b, pack_u, pack_v;
    QuantDesc quant_a, quant_b;
    Epilogue epilogue; int batch_dim, group_axis;
    int m, n, k, num;
};

struct KernelDesc {
    int arch; OpClass op_class;
    /* types, orders, striding, packing, quant same as GemmDesc */
    int3 cta_tile, mma_tile; int2 cluster_shape;
    int3 align; int2 c_tile;
    int stages; bool split_k; int group_axis;
};

struct LaunchSpec {
    Kernel* kernel; int swizzle, splits;
    float measured;  /* measured latency; the estimated-cost field described above is not shown */
};
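Because each LaunchSpec records a measured latency, one plausible use is to pick the fastest candidate after profiling. The sketch below assumes that lower measured values are better; SpecSketch and pick_fastest are hypothetical names, and the actual selection logic in the framework may differ (for example, by also weighing the estimated cost).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for LaunchSpec, with an integer id instead of Kernel*.
struct SpecSketch {
    int   kernel_id;
    int   swizzle;
    int   splits;
    float measured;  // measured latency; lower is assumed to be better
};

// Returns a pointer to the candidate with the lowest measured latency,
// or nullptr when there are no candidates. Ties keep the earliest entry.
inline const SpecSketch* pick_fastest(const std::vector<SpecSketch>& specs)
{
    const SpecSketch* best = nullptr;
    for (const auto& s : specs) {
        if (best == nullptr || s.measured < best->measured) {
            best = &s;
        }
    }
    return best;
}
```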

Import

#include "src/turbomind/kernels/gemm/desc.h"

I/O Contract

Inputs

Name Type Required Description
(aggregate fields) various Yes Problem and kernel configuration parameters, set by aggregate initialization

Outputs

Name Type Description
GemmDesc struct Uniquely identifies a GEMM problem for dispatch lookup
KernelDesc struct Uniquely identifies a kernel variant
LaunchSpec struct Kernel pointer with launch configuration and performance data

Usage Examples

// Check if a kernel can handle the described problem
GemmDesc desc{...};
if (kernel->is_feasible(desc)) {
    LaunchSpec spec{kernel, swizzle, splits, 0.f, {}};
}

// Transpose a GemmDesc for B-major dispatch
auto desc_t = transpose(desc);
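The feasibility check in the example above can be pictured with a small sketch. It assumes that a kernel is feasible when the architecture matches, each problem dimension is a multiple of the kernel's alignment, and split-K is only requested from kernels that support it. ProblemSketch, KernelSketch, and is_feasible_sketch are hypothetical; the real is_feasible likely checks more (data types, orders, packing, quantization).

```cpp
#include <cassert>

// Hypothetical reduced problem description.
struct ProblemSketch {
    int arch, m, n, k;
};

// Hypothetical reduced kernel description; alignments must be positive.
struct KernelSketch {
    int  arch;
    int  align_m, align_n, align_k;
    bool split_k;
};

inline bool is_feasible_sketch(const KernelSketch& kern, const ProblemSketch& p, int splits = 1)
{
    assert(kern.align_m > 0 && kern.align_n > 0 && kern.align_k > 0);
    // Assumption: the kernel must be built for exactly this architecture.
    if (kern.arch != p.arch) {
        return false;
    }
    // Each problem dimension must be a multiple of the kernel's alignment.
    if (p.m % kern.align_m != 0 || p.n % kern.align_n != 0 || p.k % kern.align_k != 0) {
        return false;
    }
    // Splitting along K is only allowed for kernels that support split-K.
    if (splits > 1 && !kern.split_k) {
        return false;
    }
    return true;
}
```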

Related Pages
