Implementation:InternLM Lmdeploy Gemm Kernel

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Abstract base class for all GEMM kernel implementations, defining the virtual interface for launching, feasibility checking, and querying kernel properties.

Description

The Kernel class is the polymorphic base for the GEMM dispatch system. It holds a KernelDesc (describing the kernel's capabilities) and a KernelInfo (runtime attributes like shared memory size and occupancy). Subclasses (KernelImpl, KernelImplSm90) implement the pure virtual methods.

Key virtual methods:

Launch: Executes the GEMM kernel with given operands, workspace, and stream
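is_feasible: Checks whether this kernel can handle the described GEMM problem (matching type, order, and alignment). Unlike the other three methods, it is declared virtual but not pure in the signature below.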
GetMaxSwizzle: Returns the maximum swizzle factor for a given problem shape
GetMaxSplits: Returns the maximum split-K factor given workspace constraints
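
To make the idea of a workspace-bounded split-K factor concrete, here is a minimal sketch. It is not the lmdeploy implementation: the function name ToyMaxSplits, the tile_k parameter, and the rule that each split stores one m x n float partial are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bound on the split-K factor (illustration only).
// Assumption 1: every split writes one m x n float partial into the workspace.
// Assumption 2: every split must cover at least one tile of tile_k along K.
inline int ToyMaxSplits(int m, int n, int k, int tile_k, std::size_t partials_bytes) {
    assert(m > 0 && n > 0 && k > 0 && tile_k > 0);
    const std::size_t bytes_per_split =
        static_cast<std::size_t>(m) * static_cast<std::size_t>(n) * sizeof(float);
    // Number of partial buffers that fit in the workspace.
    const std::size_t by_workspace = partials_bytes / bytes_per_split;
    // Number of K tiles available to distribute (ceiling division).
    const std::size_t by_k = static_cast<std::size_t>((k + tile_k - 1) / tile_k);
    // Never report fewer than one split: a single split is the plain GEMM.
    return static_cast<int>(std::max<std::size_t>(1, std::min(by_workspace, by_k)));
}
```

Under these assumptions the result is limited by whichever runs out first, workspace space or K tiles, which is the kind of constraint the GetMaxSplits description points to.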

Helper functions:

Cluster: Groups LaunchSpecs by kernel properties for hierarchical tuning
transpose: Creates a transposed wrapper kernel

Usage

All GEMM kernels in the registry inherit from this class. The dispatch system uses the base class interface to filter, select, and launch kernels.
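
The filter, select, and launch flow can be sketched with simplified stand-in types. ToyDesc, ToyKernel, AlignedKernel, and Dispatch are hypothetical names invented for this sketch. The real Kernel interface takes many more arguments, and the real dispatcher presumably ranks candidates rather than taking the first feasible one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified problem description (the real GemmDesc carries more state).
struct ToyDesc {
    int m, n, k;
};

// Stand-in for the polymorphic base: callers only see this interface.
class ToyKernel {
public:
    virtual ~ToyKernel() = default;
    // Plays the role of Kernel::is_feasible.
    virtual bool is_feasible(const ToyDesc& desc) const noexcept = 0;
    // Plays the role of Kernel::Launch; returns 0 on success.
    virtual int Launch(const ToyDesc& desc) = 0;
};

// A toy kernel that only accepts problems whose K is a multiple of its alignment.
class AlignedKernel : public ToyKernel {
public:
    explicit AlignedKernel(int align) : align_(align) { assert(align > 0); }
    bool is_feasible(const ToyDesc& desc) const noexcept override {
        return desc.k % align_ == 0;
    }
    int Launch(const ToyDesc&) override {
        ++launches;  // a real kernel would enqueue GPU work here
        return 0;
    }
    int launches = 0;

private:
    int align_;
};

// Filter by feasibility, select the first match, launch it.
// Returns the kernel's status code, or -1 if no kernel is feasible.
inline int Dispatch(const std::vector<std::unique_ptr<ToyKernel>>& registry,
                    const ToyDesc& desc) {
    for (const auto& kernel : registry) {
        if (kernel->is_feasible(desc)) {
            return kernel->Launch(desc);
        }
    }
    return -1;
}
```

Because Dispatch touches only the base-class interface, new kernel types can be added to the registry without changing the dispatch code.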

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/kernel.h

Signature

class Kernel {
public:
    virtual int Launch(const Operation&, float alpha,
                       const void* A, const MatrixLayout& Adesc,
                       const void* U, const MatrixLayout& Udesc,
                       const void* B, const MatrixLayout& Bdesc,
                       const void* V, const MatrixLayout& Vdesc,
                       float beta, const void* C, const MatrixLayout& Cdesc,
                       void* D, const MatrixLayout& Ddesc,
                       int swizzle, int splits,
                       Workspace& workspace, cudaStream_t stream) = 0;

    virtual bool is_feasible(const GemmDesc& desc) const noexcept;
    virtual int GetMaxSwizzle(const int4& shape) const = 0;
    virtual int GetMaxSplits(const int4& shape, int swizzle, size_t bsize, size_t psize) const = 0;

    const KernelDesc& desc() const noexcept;
    const KernelInfo& info() const noexcept;
};

Import

#include "src/turbomind/kernels/gemm/kernel.h"

I/O Contract

Inputs

Name	Type	Required	Description
A, B	const void*	Yes	Input matrix operand pointers
U, V	const void*	No	Quantization scale operand pointers
alpha, beta	float	Yes	Scaling factors for D = alpha * (A @ B) + beta * C
workspace	Workspace&	Yes	Buffers for barriers, partials, and tensormaps

Outputs

Name	Type	Description
D	void*	Output matrix
return	int	0 on success
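
As a reading aid for the contract above, the following CPU reference computes D = alpha * (A @ B) + beta * C for row-major float matrices. It is a sketch of the arithmetic only: ReferenceGemm is a hypothetical helper, it ignores the quantization scale operands U and V, and it does not reflect how the GPU kernels lay out or tile data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major CPU reference for D = alpha * (A @ B) + beta * C.
// A is m x k, B is k x n, C and the returned D are m x n.
inline std::vector<float> ReferenceGemm(float alpha,
                                        const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        float beta,
                                        const std::vector<float>& C,
                                        std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    std::vector<float> D(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            // Dot product of row i of A with column j of B.
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            D[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
    return D;
}
```

A reference like this is mainly useful for checking a kernel's output on small problems.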

Usage Examples

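// Launch only if this kernel supports the problem; Launch returns 0 on success.
if (kernel->is_feasible(gemm_desc)) {
    const int status = kernel->Launch(op, alpha, A, Adesc, U, Udesc, B, Bdesc, V, Vdesc,
                                      beta, C, Cdesc, D, Ddesc, swizzle, splits, workspace, stream);
    if (status != 0) {
        // Handle the launch failure here (for example, try another kernel).
    }
}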

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
