
Implementation:InternLM Lmdeploy Gemm Kernel

From Leeroopedia
Revision as of 15:14, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_Gemm_Kernel.md)


Knowledge Sources
Domains GPU_Kernels, GEMM
Last Updated 2026-02-07 15:00 GMT

Overview

Abstract base class for all GEMM kernel implementations, defining the virtual interface for launching, feasibility checking, and querying kernel properties.

Description

The Kernel class is the polymorphic base for the GEMM dispatch system. It holds a KernelDesc (describing the kernel's capabilities) and a KernelInfo (runtime attributes like shared memory size and occupancy). Subclasses (KernelImpl, KernelImplSm90) implement the pure virtual methods.
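The relationship described above can be sketched in a few lines. This is a minimal illustration, not the lmdeploy source: `DescSketch`, `InfoSketch`, `KernelBaseSketch`, and `KernelImplSketch` are hypothetical stand-ins, and the field names are assumptions chosen to mirror the "capabilities" and "runtime attributes" roles mentioned in the text.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins. The real KernelDesc and
// KernelInfo carry more fields than shown here.
struct DescSketch {
    std::string name;  // illustrative identifier
    int         arch;  // illustrative target architecture, e.g. 80 or 90
};

struct InfoSketch {
    int dynamic_smem_size;  // shared memory in bytes (assumed field name)
    int max_active_ctas;    // occupancy (assumed field name)
};

// Base class: owns the description and runtime info, exposes them through
// read-only accessors, and leaves the launch to subclasses.
class KernelBaseSketch {
public:
    KernelBaseSketch(DescSketch desc, InfoSketch info): desc_(std::move(desc)), info_(info)
    {
        assert(info_.dynamic_smem_size >= 0);
    }
    virtual ~KernelBaseSketch() = default;

    virtual int Launch() = 0;  // pure virtual: every subclass must provide it

    const DescSketch& desc() const noexcept { return desc_; }
    const InfoSketch& info() const noexcept { return info_; }

private:
    DescSketch desc_;
    InfoSketch info_;
};

// A concrete kernel only has to fill in the pure virtual methods.
class KernelImplSketch final: public KernelBaseSketch {
public:
    using KernelBaseSketch::KernelBaseSketch;
    int Launch() override { return 0; }  // 0 signals success
};
```

Callers that hold a `KernelBaseSketch&` can query `desc()` and `info()` and call `Launch()` without knowing the concrete type, which is the property the dispatch system relies on.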

Key virtual methods:

  • Launch: Executes the GEMM kernel with given operands, workspace, and stream
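  • is_feasible: Checks whether this kernel can handle a described GEMM problem (type, order, and alignment must match). Unlike the other three methods, it is declared virtual but not pure in the signature below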
  • GetMaxSwizzle: Returns the maximum swizzle factor for a given problem shape
  • GetMaxSplits: Returns the maximum split-K factor given workspace constraints
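To make "maximum split-K factor given workspace constraints" concrete, here is a rough sketch of one way such a bound could be computed. It is not the lmdeploy algorithm: `MaxSplitsSketch`, its parameters, and the assumption that every split stores one full m x n float partial result are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: bounds the split-K count by the K extent and by the
// size of a partials buffer.
int MaxSplitsSketch(int m, int n, int k, int tile_k, std::size_t partials_bytes)
{
    assert(m > 0 && n > 0 && k > 0 && tile_k > 0);

    // Each split needs at least one K-tile of work.
    const int by_k = (k + tile_k - 1) / tile_k;

    // Assumption: each split writes one m x n float partial result.
    const std::size_t bytes_per_split = static_cast<std::size_t>(m) * n * sizeof(float);
    const std::size_t by_mem          = partials_bytes / bytes_per_split;

    // Take the tighter of the two bounds, but never go below one split,
    // since running without split-K needs no partials at all.
    const std::size_t bounded = std::min<std::size_t>(by_k, by_mem);
    return static_cast<int>(std::max<std::size_t>(1, bounded));
}
```

With a 128 x 128 output, K = 4096, a K-tile of 32, and room for four partial results, the memory bound (4) is tighter than the K bound (128), so the sketch returns 4.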

Helper functions:

  • Cluster: Groups LaunchSpecs by kernel properties for hierarchical tuning
  • transpose: Creates a transposed wrapper kernel
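The grouping idea behind Cluster can be illustrated as follows. This is only a sketch under stated assumptions: `SpecSketch`, its fields, and the choice of the CTA tile shape as the grouping key are hypothetical, and the real LaunchSpec and clustering criteria may differ.

```cpp
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a launch specification.
struct SpecSketch {
    int cta_m;    // assumed kernel property: tile rows
    int cta_n;    // assumed kernel property: tile columns
    int swizzle;  // launch parameter
    int splits;   // launch parameter
};

// Groups specs that share the same kernel properties, so that a tuner can
// first pick a group and then pick a member inside it.
std::vector<std::vector<SpecSketch>> ClusterSketch(const std::vector<SpecSketch>& specs)
{
    std::map<std::tuple<int, int>, std::vector<SpecSketch>> groups;
    for (const auto& s : specs) {
        groups[std::make_tuple(s.cta_m, s.cta_n)].push_back(s);
    }

    // Flatten the map into a list of clusters, ordered by key.
    std::vector<std::vector<SpecSketch>> clusters;
    for (auto& entry : groups) {
        clusters.push_back(std::move(entry.second));
    }
    return clusters;
}
```

Tuning hierarchically over such clusters reduces the number of candidates measured at the top level, because specs that differ only in launch parameters are evaluated together.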

Usage

All GEMM kernels in the registry inherit from this class. The dispatch system uses the base class interface to filter, select, and launch kernels.
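The filter, select, and launch steps can be sketched as below. Everything here is illustrative: `ProblemSketch`, `KernelSketch`, `TiledKernelSketch`, `SelectSketch`, and the "largest feasible tile wins" rule are assumptions made for the example, not the selection logic used by the actual dispatch system.

```cpp
#include <memory>
#include <vector>

// Hypothetical, reduced problem description.
struct ProblemSketch {
    int m, n, k;
};

// Reduced base interface, modeled loosely on the one described above.
class KernelSketch {
public:
    virtual ~KernelSketch() = default;
    virtual bool is_feasible(const ProblemSketch& p) const noexcept = 0;
    virtual int  Launch(const ProblemSketch& p)                     = 0;
    virtual int  score() const                                      = 0;  // hypothetical ranking value
};

// Toy kernel that only supports K extents divisible by its tile size.
class TiledKernelSketch: public KernelSketch {
public:
    explicit TiledKernelSketch(int tile): tile_(tile) {}
    bool is_feasible(const ProblemSketch& p) const noexcept override { return p.k % tile_ == 0; }
    int  Launch(const ProblemSketch&) override { return 0; }  // 0 signals success
    int  score() const override { return tile_; }             // assumption: larger tile ranks higher

private:
    int tile_;
};

// Filter by feasibility, then keep the best-scoring candidate.
// Returns nullptr when no registered kernel can handle the problem.
KernelSketch* SelectSketch(const std::vector<std::unique_ptr<KernelSketch>>& registry,
                           const ProblemSketch&                              p)
{
    KernelSketch* best = nullptr;
    for (const auto& k : registry) {
        if (!k->is_feasible(p)) {
            continue;
        }
        if (best == nullptr || k->score() > best->score()) {
            best = k.get();
        }
    }
    return best;
}
```

Because the selector only touches the base interface, new kernel types can be added to the registry without changing the selection code.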

Code Reference

Source Location

Signature

class Kernel {
public:
    virtual int Launch(const Operation&, float alpha,
                       const void* A, const MatrixLayout& Adesc,
                       const void* U, const MatrixLayout& Udesc,
                       const void* B, const MatrixLayout& Bdesc,
                       const void* V, const MatrixLayout& Vdesc,
                       float beta, const void* C, const MatrixLayout& Cdesc,
                       void* D, const MatrixLayout& Ddesc,
                       int swizzle, int splits,
                       Workspace& workspace, cudaStream_t stream) = 0;

    virtual bool is_feasible(const GemmDesc& desc) const noexcept;
    virtual int GetMaxSwizzle(const int4& shape) const = 0;
    virtual int GetMaxSplits(const int4& shape, int swizzle, size_t bsize, size_t psize) const = 0;

    const KernelDesc& desc() const noexcept;
    const KernelInfo& info() const noexcept;
};

Import

#include "src/turbomind/kernels/gemm/kernel.h"

I/O Contract

Inputs

Name | Type | Required | Description
A, B | const void* | Yes | Input matrix operand pointers
U, V | const void* | No | Quantization scale operand pointers
alpha, beta | float | Yes | Scaling factors for D = alpha*A@B + beta*C
workspace | Workspace& | Yes | Barriers, partials, tensormaps buffers

Outputs

Name | Type | Description
D | void* | Output matrix
return | int | 0 on success
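For reference, the arithmetic in the contract, D = alpha*A@B + beta*C, can be written as a plain CPU loop. This sketch assumes dense row-major float matrices and ignores the quantization operands U and V. `ReferenceGemmSketch` is a hypothetical helper intended for checking results, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes D = alpha * (A @ B) + beta * C for row-major float matrices.
// A is m x k, B is k x n, C and D are m x n.
std::vector<float> ReferenceGemmSketch(int m, int n, int k,
                                       float alpha,
                                       const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       float beta,
                                       const std::vector<float>& C)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);

    std::vector<float> D(static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;  // dot product of row i of A and column j of B
            for (int p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            D[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
    return D;
}
```

A reference like this is mainly useful for comparing against a GPU result on small shapes, where the cubic cost of the loop is not a concern.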

Usage Examples

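// Launch only if the kernel reports that it can handle this problem.
if (kernel->is_feasible(gemm_desc)) {
    const int status = kernel->Launch(op, alpha, A, Adesc, U, Udesc, B, Bdesc, V, Vdesc,
                                      beta, C, Cdesc, D, Ddesc, swizzle, splits, workspace, stream);
    if (status != 0) {
        // Launch returns 0 on success; handle or report any other value here.
    }
}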
