Implementation:InternLM Lmdeploy Gemm Types
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Core type definitions for the GEMM subsystem, including matrix order, MMA instruction tags, operand packing, striding modes, quantization descriptors, epilogue types, dispatch policies, and matrix layout structures.
Description
This header defines the foundational vocabulary types used throughout the GEMM framework:
- Order: kColMajor / kRowMajor, with complement operator ~
- MMA_Tag: Encoded instruction classes -- HMMA_16816 (SM80+), HMMA_1688 (SM75), HMMA_884 (SM70), HMMA_SIMT (SM75-)
- Op_Tag: Operand identifiers -- OPERAND_A through OPERAND_D
- Pack: A uint32_t encoding MMA tag, operand tag, and pack number (extracted via get_mma_tag, get_operand_tag, get_pack_num)
- Striding: Memory access modes -- kFlat (uniform), kRagged (variable lengths), kIndexed (indirect), kBlocked (contiguous per batch)
- QuantType: Quantization axis -- kNone, kK, kM, kB
- QuantDesc: Quantization type + group size pair
- Epilogue: Post-MMA operations -- kNone, kChannelCombination, kGatedSilu
- DispatchPolicy: kDefault, kMeasure, kReuse, kAppend
- MatrixLayout: Full matrix description (DataType, Order, rows, cols, ld, pack, num, offsets, idxs)
- Workspace: Barriers, partials, tensormaps, and flags buffers
- Tape: Dynamic scheduler metadata (CTA counts, shapes, offsets, ranges, tile IDs)
- Operation: Dispatch/epilogue/quantization configuration bundle
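The Order complement and the Pack encoding can be illustrated with a minimal sketch. The numeric tag values, the bit positions (one hex digit per field), and the make_pack helper are assumptions made for this example and are not taken from types.h; only the names Order, MMA_Tag, Op_Tag, Pack, get_mma_tag, get_operand_tag, and get_pack_num come from the description above.

```cpp
#include <cassert>
#include <cstdint>

enum class Order : int { kColMajor = 0, kRowMajor = 1 };

// Complement flips the storage order.
constexpr Order operator~(Order a)
{
    return a == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

using Pack = std::uint32_t;

// ASSUMED layout: bits 8-11 hold the MMA tag, bits 4-7 the operand tag,
// bits 0-3 the pack number. The real header may use different values.
enum MMA_Tag : Pack { HMMA_16816 = 0x100, HMMA_1688 = 0x200, HMMA_884 = 0x300, HMMA_SIMT = 0x400 };
enum Op_Tag : Pack { OPERAND_A = 0x010, OPERAND_B = 0x020, OPERAND_C = 0x030, OPERAND_D = 0x040 };

// Hypothetical helper (not part of the documented API): combine the three fields.
constexpr Pack make_pack(MMA_Tag mma, Op_Tag op, int num)
{
    return static_cast<Pack>(mma) | static_cast<Pack>(op) | static_cast<Pack>(num & 0xf);
}

// Extractors mask out one field each, mirroring the names listed above.
constexpr MMA_Tag get_mma_tag(Pack p) { return static_cast<MMA_Tag>(p & 0xf00); }
constexpr Op_Tag get_operand_tag(Pack p) { return static_cast<Op_Tag>(p & 0x0f0); }
constexpr int get_pack_num(Pack p) { return static_cast<int>(p & 0x00f); }

// Round-trip check: every field survives packing and unpacking.
inline void pack_round_trip_example()
{
    const Pack p = make_pack(HMMA_16816, OPERAND_B, 2);
    assert(get_mma_tag(p) == HMMA_16816);
    assert(get_operand_tag(p) == OPERAND_B);
    assert(get_pack_num(p) == 2);
    assert(~Order::kColMajor == Order::kRowMajor);
}
```

Packing the three fields into one integer keeps the layout descriptor trivially copyable and cheap to compare, which is presumably why a plain uint32_t is used rather than a struct.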
Usage
Included by virtually every file in the GEMM subsystem as the foundational type vocabulary.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/types.h
Signature
enum class Order : int { kColMajor = 0, kRowMajor = 1 };
enum class Striding : int { kFlat, kRagged, kIndexed, kBlocked };
enum class QuantType : int { kNone, kK, kM, kB };
enum class Epilogue : int { kNone, kChannelCombination, kGatedSilu };
enum class DispatchPolicy : int { kDefault, kMeasure, kReuse, kAppend };
struct MatrixLayout { DataType type; Order order; int rows, cols, ld; Pack pack; int num; int* offsets; int* idxs; };
struct Workspace { void* barriers; size_t barriers_size; void* partials; size_t partials_size; void* tensormaps; size_t tensormaps_size; int* flags; };
struct Tape { int ctas; int max_num; int max_ctas; char* buffer; int4* gemm_shapes; int4* tiled_shapes; int4* tile_offsets; int2* iter_k_ranges; int* tile_ids; };
struct Operation { DispatchPolicy dispatch; Epilogue epilogue; QuantDesc quant_a; QuantDesc quant_b; int batch_dim; };
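How order and ld interact in MatrixLayout can be sketched as follows, assuming ld carries the conventional BLAS meaning of a leading dimension (the element stride between consecutive rows in row-major storage, or consecutive columns in column-major storage). LayoutSketch, element_offset, and transposed are hypothetical names introduced for this example; they are not declared in the signatures above.

```cpp
#include <cassert>
#include <cstddef>

enum class Order : int { kColMajor = 0, kRowMajor = 1 };

constexpr Order operator~(Order a)
{
    return a == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Reduced stand-in for MatrixLayout: only the fields this sketch needs.
struct LayoutSketch {
    Order order;
    int   rows, cols, ld;
};

// Linear offset (in elements) of entry (r, c), ASSUMING the usual
// leading-dimension convention.
constexpr std::ptrdiff_t element_offset(const LayoutSketch& m, int r, int c)
{
    return m.order == Order::kRowMajor ? static_cast<std::ptrdiff_t>(r) * m.ld + c
                                       : static_cast<std::ptrdiff_t>(c) * m.ld + r;
}

// Viewing the same memory as the transposed matrix: swap the extents and
// complement the order; ld is unchanged because the storage is untouched.
constexpr LayoutSketch transposed(const LayoutSketch& m)
{
    return LayoutSketch{~m.order, m.cols, m.rows, m.ld};
}

// Entry (r, c) of a matrix and entry (c, r) of its transposed view
// refer to the same memory location.
inline void transpose_view_example()
{
    const LayoutSketch a{Order::kRowMajor, 4, 8, 8};
    const LayoutSketch t = transposed(a);
    assert(element_offset(a, 2, 3) == element_offset(t, 3, 2));
}
```

Under this reading, the complement operator on Order is what lets a transposed operand be described without copying data.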
Import
#include "src/turbomind/kernels/gemm/types.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (type definitions) | enums/structs | N/A | Foundational types, no runtime inputs |
Outputs
| Name | Type | Description |
|---|---|---|
| (type definitions) | enums/structs | Type vocabulary for the GEMM subsystem |
Usage Examples
MatrixLayout desc{DataType::kHalf, kColMajor, M, K, lda, pack, 1, nullptr, nullptr};
Operation op{DispatchPolicy::kDefault, Epilogue::kNone, {QuantType::kK, 128}, {}, 0};
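The two lines above rely on surrounding declarations (M, K, lda, pack). A self-contained sketch that compiles on its own might look like the following. The struct and enum definitions are stand-ins reconstructed from the Signature section; the QuantDesc member names, the single-value DataType enum, the dense lda = M choice, the pack = 0 value, and the two helper functions are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in declarations mirroring the Signature section.
enum class DataType : int { kHalf };  // ASSUMED: the real enum has more values
enum class Order : int { kColMajor = 0, kRowMajor = 1 };
constexpr Order kColMajor = Order::kColMajor;  // unscoped alias, as used in the example above
enum class QuantType : int { kNone, kK, kM, kB };
enum class Epilogue : int { kNone, kChannelCombination, kGatedSilu };
enum class DispatchPolicy : int { kDefault, kMeasure, kReuse, kAppend };
using Pack = std::uint32_t;

// ASSUMED member names for the "type + group size" pair.
struct QuantDesc {
    QuantType type;
    int       group_size;
};

struct MatrixLayout {
    DataType type;
    Order    order;
    int      rows, cols, ld;
    Pack     pack;
    int      num;
    int*     offsets;
    int*     idxs;
};

struct Operation {
    DispatchPolicy dispatch;
    Epilogue       epilogue;
    QuantDesc      quant_a;
    QuantDesc      quant_b;
    int            batch_dim;
};

// Hypothetical helper: dense column-major M x K half-precision matrix.
inline MatrixLayout make_dense_colmajor(int M, int K)
{
    const int  lda  = M;  // dense column-major storage: one column spans M elements
    const Pack pack = 0;  // ASSUMED to mean "no operand packing"
    return MatrixLayout{DataType::kHalf, kColMajor, M, K, lda, pack, 1, nullptr, nullptr};
}

// Hypothetical helper: default dispatch, no epilogue, A quantized along K
// in groups of 128, B left unquantized (value-initialized QuantDesc).
inline Operation make_default_op()
{
    return Operation{DispatchPolicy::kDefault, Epilogue::kNone, {QuantType::kK, 128}, {}, 0};
}
```

The empty braces for quant_b value-initialize the descriptor, which yields QuantType::kNone because that enumerator has value 0 in the Signature listing.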