Implementation:InternLM Lmdeploy Gemm Types

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Core type definitions for the GEMM subsystem, including matrix order, MMA instruction tags, operand packing, striding modes, quantization descriptors, epilogue types, dispatch policies, and matrix layout structures.

Description

This header defines the foundational vocabulary types used throughout the GEMM framework:

Order: kColMajor / kRowMajor with complement operator ~
MMA_Tag: Encoded instruction classes -- HMMA_16816 (SM80+), HMMA_1688 (SM75), HMMA_884 (SM70), HMMA_SIMT (SM75 and older)
Op_Tag: Operand identifiers -- OPERAND_A through OPERAND_D
Pack: A uint32_t encoding MMA tag, operand tag, and pack number (extracted via get_mma_tag, get_operand_tag, get_pack_num)
Striding: Memory access modes -- kFlat (uniform), kRagged (variable lengths), kIndexed (indirect), kBlocked (contiguous per batch)
QuantType: Quantization axis -- kNone, kK, kM, kB
QuantDesc: Quantization type + group size pair
Epilogue: Post-MMA operations -- kNone, kChannelCombination, kGatedSilu
DispatchPolicy: kDefault, kMeasure, kReuse, kAppend
MatrixLayout: Full matrix description (DataType, Order, rows, cols, ld, pack, num, offsets, idxs)
Workspace: Barriers, partials, tensormaps, and flags buffers
Tape: Dynamic scheduler metadata (CTA counts, shapes, offsets, ranges, tile IDs)
Operation: Dispatch/epilogue/quantization configuration bundle
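
To make the Pack entry concrete, here is a minimal, self-contained sketch of how an MMA tag, an operand tag, and a pack number can share one uint32_t and be recovered by masking. The bit layout, the enum values, and the make_pack helper are assumptions made for illustration; only the names Pack, MMA_Tag, Op_Tag, get_mma_tag, get_operand_tag, and get_pack_num come from the description above, and the real header may encode the fields differently.

```cpp
#include <cstdint>

namespace pack_sketch {

// ASSUMED layout (illustration only): bits [15:8] MMA tag, [7:4] operand tag, [3:0] pack number.
enum class MMA_Tag : std::uint32_t { HMMA_16816 = 0x100, HMMA_1688 = 0x200, HMMA_884 = 0x300, HMMA_SIMT = 0x400 };
enum class Op_Tag : std::uint32_t { OPERAND_A = 0x010, OPERAND_B = 0x020, OPERAND_C = 0x030, OPERAND_D = 0x040 };

using Pack = std::uint32_t;

// Hypothetical helper: combine the three fields into one word.
constexpr Pack make_pack(MMA_Tag mma, Op_Tag op, int num) {
    return static_cast<Pack>(mma) | static_cast<Pack>(op) | (static_cast<Pack>(num) & 0xFu);
}

// Each accessor masks out its own field and leaves the others untouched.
constexpr MMA_Tag get_mma_tag(Pack p) { return static_cast<MMA_Tag>(p & 0xFF00u); }
constexpr Op_Tag get_operand_tag(Pack p) { return static_cast<Op_Tag>(p & 0x00F0u); }
constexpr int get_pack_num(Pack p) { return static_cast<int>(p & 0x000Fu); }

}  // namespace pack_sketch
```

Because every function is constexpr, a packed descriptor built this way can be used as a template argument or checked with static_assert, which suits compile-time kernel selection.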

Usage

Included by virtually every file in the GEMM subsystem as the foundational type vocabulary.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/types.h

Signature

enum class Order : int { kColMajor = 0, kRowMajor = 1 };
enum class Striding : int { kFlat, kRagged, kIndexed, kBlocked };
enum class QuantType : int { kNone, kK, kM, kB };
enum class Epilogue : int { kNone, kChannelCombination, kGatedSilu };
enum class DispatchPolicy : int { kDefault, kMeasure, kReuse, kAppend };

struct MatrixLayout { DataType type; Order order; int rows, cols, ld; Pack pack; int num; int* offsets; int* idxs; };
struct Workspace { void* barriers; size_t barriers_size; void* partials; size_t partials_size; void* tensormaps; size_t tensormaps_size; int* flags; };
struct Tape { int ctas; int max_num; int max_ctas; char* buffer; int4* gemm_shapes; int4* tiled_shapes; int4* tile_offsets; int2* iter_k_ranges; int* tile_ids; };
struct Operation { DispatchPolicy dispatch; Epilogue epilogue; QuantDesc quant_a; QuantDesc quant_b; int batch_dim; };
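
As a rough illustration of how order, rows, cols, and ld fit together, the sketch below computes a linear element offset and transposes a layout by swapping the dimensions and complementing the order. It uses a reduced stand-in struct (Layout) rather than the real MatrixLayout, and it assumes the conventional BLAS meaning of ld (the stride between consecutive rows in row-major storage, or between consecutive columns in column-major storage). The offset_of and transpose helpers are hypothetical names chosen for this example.

```cpp
#include <cstdint>

namespace layout_sketch {

enum class Order : int { kColMajor = 0, kRowMajor = 1 };

// Complement operator: flips column-major to row-major and back.
constexpr Order operator~(Order a) {
    return a == Order::kColMajor ? Order::kRowMajor : Order::kColMajor;
}

// Reduced stand-in for MatrixLayout: only the fields needed for addressing.
struct Layout {
    Order order;
    int rows, cols, ld;
};

// Hypothetical helper: linear offset of element (r, c), ASSUMING BLAS-style ld.
constexpr std::int64_t offset_of(const Layout& m, int r, int c) {
    return m.order == Order::kRowMajor ? std::int64_t(r) * m.ld + c
                                       : std::int64_t(c) * m.ld + r;
}

// Hypothetical helper: view the same memory as the transposed matrix.
constexpr Layout transpose(const Layout& m) {
    return Layout{~m.order, m.cols, m.rows, m.ld};
}

}  // namespace layout_sketch
```

Under these assumptions, element (r, c) of a layout and element (c, r) of its transpose map to the same offset, which is why a transpose can be expressed purely as a change of descriptor without moving data.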

Import

#include "src/turbomind/kernels/gemm/types.h"

I/O Contract

Inputs

Name	Type	Required	Description
(type definitions)	enums/structs	N/A	Foundational types, no runtime inputs

Outputs

Name	Type	Description
(type definitions)	enums/structs	Type vocabulary for the GEMM subsystem

Usage Examples

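// M, K, lda, and pack are supplied by the caller.
MatrixLayout desc{DataType::kHalf, kColMajor, M, K, lda, pack, 1, nullptr, nullptr};
// quant_a: QuantType::kK with group size 128; quant_b: left empty ({}).
Operation op{DispatchPolicy::kDefault, Epilogue::kNone, {QuantType::kK, 128}, {}, 0};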
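
The {QuantType::kK, 128} initializer above pairs a quantization type with a group size. One plausible reading, offered here as an assumption rather than a statement about the library, is that kK splits the K dimension into groups of group_size elements, each with its own scale. The sketch below counts those groups with a stand-in QuantDesc; the field names type and group_size and the num_groups_k helper are hypothetical.

```cpp
namespace quant_sketch {

enum class QuantType : int { kNone, kK, kM, kB };

// Stand-in for QuantDesc; field names are ASSUMED for this example.
struct QuantDesc {
    QuantType type;
    int group_size;
};

// Hypothetical helper: number of groups along K (ceiling division),
// or 0 when the descriptor does not quantize along K.
constexpr int num_groups_k(QuantDesc q, int k) {
    return (q.type == QuantType::kK && q.group_size > 0)
               ? (k + q.group_size - 1) / q.group_size
               : 0;
}

}  // namespace quant_sketch
```

With K = 4096 and a group size of 128, this yields 32 groups; a value-initialized descriptor (type kNone, group size 0) yields 0.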

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
