Implementation:InternLM Lmdeploy Gemm Types

From Leeroopedia


Knowledge Sources
Domains GPU_Kernels, GEMM
Last Updated 2026-02-07 15:00 GMT

Overview

Core type definitions for the GEMM subsystem, including matrix order, MMA instruction tags, operand packing, striding modes, quantization descriptors, epilogue types, dispatch policies, and matrix layout structures.

Description

This header defines the foundational vocabulary types used throughout the GEMM framework:

  • Order: kColMajor / kRowMajor with complement operator ~
  • MMA_Tag: Encoded instruction classes -- HMMA_16816 (SM80+), HMMA_1688 (SM75), HMMA_884 (SM70), HMMA_SIMT (SM75-)
  • Op_Tag: Operand identifiers -- OPERAND_A through OPERAND_D
  • Pack: A uint32_t encoding MMA tag, operand tag, and pack number (extracted via get_mma_tag, get_operand_tag, get_pack_num)
  • Striding: Memory access modes -- kFlat (uniform), kRagged (variable lengths), kIndexed (indirect), kBlocked (contiguous per batch)
  • QuantType: Quantization axis -- kNone, kK, kM, kB
  • QuantDesc: Quantization type + group size pair
  • Epilogue: Post-MMA operations -- kNone, kChannelCombination, kGatedSilu
  • DispatchPolicy: kDefault, kMeasure, kReuse, kAppend
  • MatrixLayout: Full matrix description (DataType, Order, rows, cols, ld, pack, num, offsets, idxs)
  • Workspace: Barriers, partials, tensormaps, and flags buffers
  • Tape: Dynamic scheduler metadata (CTA counts, shapes, offsets, ranges, tile IDs)
  • Operation: Dispatch/epilogue/quantization configuration bundle
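The Order complement and the Pack encoding described in the list above can be illustrated with a small stand-alone sketch. It does not reproduce the header: the bit masks, the numeric tag values, and the make_pack helper are assumptions chosen for illustration. Only the names Order, Pack, get_mma_tag, get_operand_tag, and get_pack_num come from the description.

```cpp
#include <cstdint>

namespace sketch {

enum class Order : int { kColMajor = 0, kRowMajor = 1 };

// Complement flips the storage order.
constexpr Order operator~(Order a) {
    return a == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

using Pack = std::uint32_t;

// ASSUMED bit layout (not taken from the header):
// bits 8-11 hold the MMA tag, bits 4-7 the operand tag, bits 0-3 the pack number.
constexpr Pack kMmaMask     = 0xF00;
constexpr Pack kOperandMask = 0x0F0;
constexpr Pack kNumMask     = 0x00F;

// Hypothetical tag values that fit the assumed masks.
constexpr Pack HMMA_16816 = 0x100;
constexpr Pack HMMA_884   = 0x300;
constexpr Pack OPERAND_A  = 0x010;
constexpr Pack OPERAND_B  = 0x020;

// Hypothetical helper: combine the three fields into one Pack value.
constexpr Pack make_pack(Pack mma, Pack operand, Pack num) {
    return (mma & kMmaMask) | (operand & kOperandMask) | (num & kNumMask);
}

// Extractors mask out a single field each.
constexpr Pack get_mma_tag(Pack p) { return p & kMmaMask; }
constexpr Pack get_operand_tag(Pack p) { return p & kOperandMask; }
constexpr Pack get_pack_num(Pack p) { return p & kNumMask; }

// Compile-time checks of the round trip.
static_assert(~Order::kColMajor == Order::kRowMajor, "complement flips order");
static_assert(get_mma_tag(make_pack(HMMA_16816, OPERAND_B, 2)) == HMMA_16816, "mma tag");
static_assert(get_operand_tag(make_pack(HMMA_16816, OPERAND_B, 2)) == OPERAND_B, "operand tag");
static_assert(get_pack_num(make_pack(HMMA_16816, OPERAND_B, 2)) == 2, "pack number");

}  // namespace sketch
```

Packing the three fields into one integer keeps the value usable as a template parameter or switch key, which is one plausible reason for such an encoding; the real header may order or size the fields differently.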

Usage

This header is included by virtually every file in the GEMM subsystem, since it supplies the type vocabulary the rest of the code builds on.

Code Reference

Source Location

Signature

enum class Order : int { kColMajor = 0, kRowMajor = 1 };
enum class Striding : int { kFlat, kRagged, kIndexed, kBlocked };
enum class QuantType : int { kNone, kK, kM, kB };
enum class Epilogue : int { kNone, kChannelCombination, kGatedSilu };
enum class DispatchPolicy : int { kDefault, kMeasure, kReuse, kAppend };

struct MatrixLayout { DataType type; Order order; int rows, cols, ld; Pack pack; int num; int* offsets; int* idxs; };
struct Workspace { void* barriers; size_t barriers_size; void* partials; size_t partials_size; void* tensormaps; size_t tensormaps_size; int* flags; };
struct Tape { int ctas; int max_num; int max_ctas; char* buffer; int4* gemm_shapes; int4* tiled_shapes; int4* tile_offsets; int2* iter_k_ranges; int* tile_ids; };
struct Operation { DispatchPolicy dispatch; Epilogue epilogue; QuantDesc quant_a; QuantDesc quant_b; int batch_dim; };
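To make the roles of order, rows, cols, and ld in MatrixLayout concrete, the following sketch computes element offsets for a reduced stand-in struct. It assumes ld is a conventional BLAS-style leading dimension (the stride between consecutive columns in column-major storage, or between consecutive rows in row-major storage) and ignores pack, num, offsets, and idxs. The Layout struct and the element_offset and transpose helpers are hypothetical and are not part of the header.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

enum class Order : int { kColMajor = 0, kRowMajor = 1 };

constexpr Order operator~(Order a) {
    return a == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Reduced stand-in for MatrixLayout: only the fields needed for addressing.
struct Layout {
    Order order;
    int   rows, cols, ld;
};

// Linear offset of element (r, c), ASSUMING ld is the leading dimension.
inline std::size_t element_offset(const Layout& m, int r, int c) {
    assert(0 <= r && r < m.rows && 0 <= c && c < m.cols);
    return m.order == Order::kColMajor
               ? static_cast<std::size_t>(c) * m.ld + r   // columns are contiguous
               : static_cast<std::size_t>(r) * m.ld + c;  // rows are contiguous
}

// Transposed view of the same memory: swap the extents and flip the order.
inline Layout transpose(const Layout& m) {
    return {~m.order, m.cols, m.rows, m.ld};
}

}  // namespace sketch
```

Under this assumption, element (r, c) of a layout and element (c, r) of its transposed view resolve to the same offset, which is why a complement operator on Order is convenient: a transpose can be expressed without moving data.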

Import

#include "src/turbomind/kernels/gemm/types.h"

I/O Contract

Inputs

Name | Type | Required | Description
(type definitions) | enums/structs | N/A | Foundational types, no runtime inputs

Outputs

Name | Type | Description
(type definitions) | enums/structs | Type vocabulary for the GEMM subsystem

Usage Examples

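// One M x K half-precision matrix, column-major, leading dimension lda (M, K, lda, and pack are supplied by the caller).
MatrixLayout desc{DataType::kHalf, Order::kColMajor, M, K, lda, pack, 1, nullptr, nullptr};
// Default dispatch, no epilogue; A quantized with QuantType::kK and group size 128, B left value-initialized (kNone).
Operation op{DispatchPolicy::kDefault, Epilogue::kNone, {QuantType::kK, 128}, {}, 0};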
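The {QuantType::kK, 128} initializer pairs a quantization type with a group size. One plausible reading, assumed here and not confirmed by the page, is that kK means scales are grouped along the K dimension, so a matrix with inner dimension K carries ceil(K / group_size) scale groups. The sketch below works through that arithmetic; the QuantDesc field names and the num_k_groups helper are hypothetical.

```cpp
#include <cassert>

namespace sketch {

enum class QuantType : int { kNone, kK, kM, kB };

// Field names are assumed; the page only says "quantization type + group size pair".
struct QuantDesc {
    QuantType type;
    int       group_size;
};

// Number of scale groups along K, ASSUMING kK denotes group-wise scales along K.
// Any other type yields 0 here because this sketch does not model it.
inline int num_k_groups(const QuantDesc& q, int k) {
    if (q.type != QuantType::kK) {
        return 0;
    }
    assert(q.group_size > 0);
    return (k + q.group_size - 1) / q.group_size;  // ceiling division
}

}  // namespace sketch
```

For example, with K = 4096 and a group size of 128 this gives 32 groups, while a value-initialized descriptor (type kNone) gives 0.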
