Implementation:InternLM Lmdeploy Gemm Kernel
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Abstract base class for all GEMM kernel implementations, defining the virtual interface for launching, feasibility checking, and querying kernel properties.
Description
The Kernel class is the polymorphic base for the GEMM dispatch system. It holds a KernelDesc (describing the kernel's capabilities) and a KernelInfo (runtime attributes like shared memory size and occupancy). Subclasses (KernelImpl, KernelImplSm90) implement the pure virtual methods.
Key virtual methods:
- Launch: Executes the GEMM kernel with the given operands, workspace, and stream
- is_feasible: Checks whether this kernel can handle a described GEMM problem (type, order, and alignment must match)
- GetMaxSwizzle: Returns the maximum swizzle factor for a given problem shape
- GetMaxSplits: Returns the maximum split-K factor given workspace constraints
Helper functions:
- Cluster: Groups LaunchSpecs by kernel properties for hierarchical tuning
- transpose: Creates a transposed wrapper kernel
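To make the idea behind GetMaxSplits more concrete, here is a minimal sketch of how a split-K upper bound could be derived from a workspace budget. It is not the lmdeploy implementation: `ProblemSketch`, `MaxSplitsSketch`, `chunk_k`, and the assumption that each split stores one m x n tile of float partial sums are all hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical problem shape; not the int4 shape used by the real interface.
struct ProblemSketch {
    int m;
    int n;
    int k;
};

// Illustrative bound on the split-K factor (assumption, not lmdeploy's logic):
//  - each split is assumed to write one m x n buffer of float partial sums,
//    so the partials buffer limits how many splits fit;
//  - each split needs at least one chunk of chunk_k elements along K.
// The result is clamped to at least 1 (no splitting).
inline int MaxSplitsSketch(const ProblemSketch& p, int chunk_k, std::size_t partials_bytes)
{
    const std::size_t bytes_per_split =
        static_cast<std::size_t>(p.m) * static_cast<std::size_t>(p.n) * sizeof(float);

    const std::size_t fit = bytes_per_split ? partials_bytes / bytes_per_split : 1;
    const int by_workspace = static_cast<int>(std::min<std::size_t>(fit, 1u << 16));

    const int by_k = (p.k + chunk_k - 1) / chunk_k;  // ceil(k / chunk_k)

    return std::max(1, std::min(by_workspace, by_k));
}
```

In this sketch the smaller of the two limits wins, so a generous workspace cannot push the split count past the number of K chunks, and an empty workspace falls back to a single split.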
Usage
All GEMM kernels in the registry inherit from this class. The dispatch system uses the base class interface to filter, select, and launch kernels.
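The filter, select, and launch flow can be sketched with a simplified stand-in for the base class. Everything below is hypothetical (`GemmDescSketch`, `KernelSketch`, `TiledKernelSketch`, `DispatchSketch`); the real `Kernel::Launch` takes the full operand list shown under Signature, and the real dispatcher presumably ranks candidates rather than taking the first feasible one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, heavily reduced problem description.
struct GemmDescSketch {
    int m;
    int n;
    int k;
};

// Simplified stand-in for the polymorphic base class.
class KernelSketch {
public:
    virtual ~KernelSketch() = default;
    virtual bool is_feasible(const GemmDescSketch& desc) const noexcept = 0;
    virtual int  Launch(const GemmDescSketch& desc) = 0;  // 0 on success
};

// Toy subclass: feasible only when K is a multiple of its tile size.
class TiledKernelSketch: public KernelSketch {
public:
    explicit TiledKernelSketch(int tile_k): tile_k_(tile_k) {}

    bool is_feasible(const GemmDescSketch& desc) const noexcept override
    {
        return desc.k % tile_k_ == 0;
    }

    int Launch(const GemmDescSketch&) override
    {
        ++launches;  // a real kernel would enqueue work on a CUDA stream here
        return 0;
    }

    int launches = 0;

private:
    int tile_k_;
};

// Walks the registry through the base-class interface only:
// filter with is_feasible, select the first match, then launch it.
// Returns -1 when no registered kernel can handle the problem.
inline int DispatchSketch(const std::vector<std::unique_ptr<KernelSketch>>& registry,
                          const GemmDescSketch&                             desc)
{
    for (const auto& kernel : registry) {
        if (kernel->is_feasible(desc)) {
            return kernel->Launch(desc);
        }
    }
    return -1;
}
```

Because the dispatcher only touches virtual methods, new kernel families can be registered without changing the selection code.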
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/kernel.h
Signature
class Kernel {
public:
    virtual int Launch(const Operation&, float alpha,
                       const void* A, const MatrixLayout& Adesc,
                       const void* U, const MatrixLayout& Udesc,
                       const void* B, const MatrixLayout& Bdesc,
                       const void* V, const MatrixLayout& Vdesc,
                       float beta, const void* C, const MatrixLayout& Cdesc,
                       void* D, const MatrixLayout& Ddesc,
                       int swizzle, int splits,
                       Workspace& workspace, cudaStream_t stream) = 0;

    virtual bool is_feasible(const GemmDesc& desc) const noexcept;
    virtual int GetMaxSwizzle(const int4& shape) const = 0;
    virtual int GetMaxSplits(const int4& shape, int swizzle, size_t bsize, size_t psize) const = 0;

    const KernelDesc& desc() const noexcept;
    const KernelInfo& info() const noexcept;
};
Import
#include "src/turbomind/kernels/gemm/kernel.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| A, B | const void* | Yes | Input matrix operand pointers |
| U, V | const void* | No | Quantization scale operand pointers |
| alpha, beta | float | Yes | Scaling factors for D = alpha*A@B + beta*C |
| workspace | Workspace& | Yes | Buffers for barriers, partial results, and tensor maps |
Outputs
| Name | Type | Description |
|---|---|---|
| D | void* | Output matrix |
| return | int | 0 on success |
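The contract D = alpha*A@B + beta*C can be checked against a plain CPU reference. The sketch below assumes dense row-major float matrices and ignores the quantization scale operands U and V; `ReferenceGemm` is a hypothetical helper for illustration, not part of lmdeploy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for D = alpha * (A @ B) + beta * C.
// Assumptions: row-major storage, A is m x k, B is k x n, C and D are m x n.
inline std::vector<float> ReferenceGemm(float                     alpha,
                                        const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        float                     beta,
                                        const std::vector<float>& C,
                                        int                       m,
                                        int                       n,
                                        int                       k)
{
    std::vector<float> D(static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.f;
            for (int p = 0; p < k; ++p) {
                acc += A[static_cast<std::size_t>(i) * k + p] * B[static_cast<std::size_t>(p) * n + j];
            }
            const std::size_t idx = static_cast<std::size_t>(i) * n + j;
            D[idx] = alpha * acc + beta * C[idx];
        }
    }
    return D;
}
```

A reference like this is mainly useful for validating a GPU kernel's output on small shapes, where an exact or tolerance-based comparison is cheap.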
Usage Examples
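// Launch only if the kernel supports this problem; Launch returns 0 on success.
if (kernel->is_feasible(gemm_desc)) {
    const int status = kernel->Launch(op, alpha, A, Adesc, U, Udesc, B, Bdesc, V, Vdesc,
                                      beta, C, Cdesc, D, Ddesc, swizzle, splits, workspace, stream);
    if (status != 0) {
        // Handle the launch failure, e.g. fall back to another kernel.
    }
}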