Implementation:InternLM Lmdeploy Gemm Context
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Manages GEMM problem context including architecture detection, problem descriptor construction, kernel feasibility filtering, launch parameter population, and swizzle factor enumeration.
Description
The Context class serves as the bridge between a GEMM problem description (from matrix layouts and operation metadata) and the kernel dispatch system:
- Construction: Takes cudaDeviceProp to determine architecture and SM count
- Init: Converts Operation and MatrixLayout descriptors for all six operands (A, U, B, V, C, D) into GemmDesc and its transposed variant
- Filter: Takes a list of registered kernels and returns only those feasible for the current problem
- Populate: Generates LaunchSpecs by computing split-K factors and launch configurations for a given kernel
- Swizzle: Enumerates valid swizzle factors for a given launch spec
- get_desc: Returns the appropriate (possibly transposed) GemmDesc for a specific kernel
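The interplay between Init, get_desc, and Filter described above can be illustrated with a minimal, self-contained sketch. It does not use the real TurboMind types: ToyDesc, ToyKernel, ToyContext, and the fields min_arch, align_k, and wants_transposed are hypothetical stand-ins, and the feasibility rule (architecture and k-alignment) is an assumption chosen only to make the example concrete. The one fact relied on is that D = A * B is equivalent to D^T = B^T * A^T, so the transposed problem swaps m and n.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for GemmDesc; the real descriptor carries many more fields.
struct ToyDesc {
    int  arch;        // e.g. 80 for sm_80
    int  m, n, k;     // problem shape
    bool transposed;  // true for the transposed variant
};

// Hypothetical stand-in for Kernel; all three fields are assumptions for illustration.
struct ToyKernel {
    int  min_arch;          // lowest architecture the kernel supports
    int  align_k;           // k must be a multiple of this value
    bool wants_transposed;  // kernel consumes the transposed problem

    bool is_feasible(const ToyDesc& d) const {
        return d.arch >= min_arch && d.k % align_k == 0;
    }
};

// D = A * B is equivalent to D^T = B^T * A^T, so transposing swaps m and n.
inline ToyDesc transpose(ToyDesc d) {
    std::swap(d.m, d.n);
    d.transposed = !d.transposed;
    return d;
}

class ToyContext {
public:
    explicit ToyContext(int arch) : arch_(arch) {}

    // Builds both the plain and the transposed descriptor up front.
    void Init(int m, int n, int k) {
        assert(m > 0 && n > 0 && k > 0);
        desc_       = ToyDesc{arch_, m, n, k, false};
        desc_trans_ = transpose(desc_);
    }

    // Picks the descriptor variant that matches what the kernel expects.
    const ToyDesc& get_desc(const ToyKernel& kernel) const {
        return kernel.wants_transposed ? desc_trans_ : desc_;
    }

    // Keeps only kernels that are feasible for their matching descriptor.
    std::vector<const ToyKernel*> Filter(const std::vector<ToyKernel>& kernels) const {
        std::vector<const ToyKernel*> out;
        for (const auto& kernel : kernels) {
            if (kernel.is_feasible(get_desc(kernel))) {
                out.push_back(&kernel);
            }
        }
        return out;
    }

private:
    int     arch_;
    ToyDesc desc_{};
    ToyDesc desc_trans_{};
};
```

Storing both descriptor variants at Init time means the per-kernel lookup is a cheap branch, which is one plausible reason for keeping a transposed copy alongside the original.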
Usage
Created once per device, reinitialized per GEMM problem. Used by the Gemm class to prepare the dispatch context.
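The lifecycle above (construct once per device, call Init again for each problem) might look like the following sketch. Everything here is hypothetical: ToyProblem, DeviceContext, and ToyGemm are stand-ins, the map keyed by device id is an assumed caching scheme, and the source does not say how the real Gemm class stores its Context.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical problem shape; the real Init takes an Operation and six MatrixLayouts.
struct ToyProblem {
    int m, n, k;
};

// Hypothetical stand-in for Context: device state is fixed, problem state is replaced.
class DeviceContext {
public:
    explicit DeviceContext(int sm_count) : sm_count_(sm_count) {}

    // Re-initializes the per-problem state; rejects degenerate shapes (assumed rule).
    bool Init(const ToyProblem& p) {
        if (p.m <= 0 || p.n <= 0 || p.k <= 0) {
            return false;
        }
        problem_ = p;
        ++init_calls_;
        return true;
    }

    int sm_count() const { return sm_count_; }
    int init_calls() const { return init_calls_; }

private:
    int        sm_count_;
    ToyProblem problem_{};
    int        init_calls_ = 0;
};

// Hypothetical owner that keeps one context per device and reuses it across problems.
class ToyGemm {
public:
    DeviceContext& context_for(int device_id, int sm_count) {
        auto it = contexts_.find(device_id);
        if (it == contexts_.end()) {
            it = contexts_.emplace(device_id, DeviceContext(sm_count)).first;
        }
        return it->second;
    }

    bool Run(int device_id, int sm_count, const ToyProblem& p) {
        return context_for(device_id, sm_count).Init(p);
    }

    std::size_t context_count() const { return contexts_.size(); }

private:
    std::map<int, DeviceContext> contexts_;
};
```

Separating device-lifetime state from problem-lifetime state keeps the device query out of the per-call path.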
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/context.h
Signature
class Context {
public:
explicit Context(const cudaDeviceProp& prop);
bool Init(const Operation&, const MatrixLayout& Adesc, ..., const MatrixLayout& Ddesc);
std::vector<Kernel*> Filter(const std::vector<Kernel*>& kernels) const;
std::vector<LaunchSpec> Populate(const Kernel& kernel, const PopulateParam& param) const;
std::vector<LaunchSpec> Swizzle(const LaunchSpec& spec, const std::vector<int>& swizzle) const;
const GemmDesc& desc() const;
};
Import
#include "src/turbomind/kernels/gemm/context.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prop | cudaDeviceProp | Yes | Device properties for architecture detection |
| operation | Operation | Yes | Dispatch/epilogue/quantization configuration |
| Adesc..Ddesc | MatrixLayout | Yes | Layout descriptors for all operands |
Outputs
| Name | Type | Description |
|---|---|---|
| filtered kernels | vector<Kernel*> | Feasible kernels for the current problem |
| launch specs | vector<LaunchSpec> | Launch configurations with split-K and swizzle |
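To make the "launch specs" output more concrete, here is a rough sketch of how split-K and swizzle candidates could be enumerated. It is not the TurboMind implementation: ToySpec, ToyPopulateParam, max_splits, cta_k, and tiles_n are hypothetical names, and both validity rules (a split count no larger than the number of k-tiles, a swizzle group no wider than the tile count along n) are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for LaunchSpec.
struct ToySpec {
    int splits;   // split-K factor
    int swizzle;  // log2 of the rasterization group width (assumed meaning)
};

// Hypothetical stand-in for PopulateParam.
struct ToyPopulateParam {
    int max_splits;  // upper bound on the split-K factor
};

inline int ceil_div(int a, int b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Emits one spec per split-K factor, bounded by max_splits and by the
// number of k-tiles (splitting finer than one tile per split is pointless).
std::vector<ToySpec> Populate(int k, int cta_k, const ToyPopulateParam& param) {
    const int k_tiles = ceil_div(k, cta_k);
    std::vector<ToySpec> specs;
    for (int s = 1; s <= param.max_splits && s <= k_tiles; ++s) {
        specs.push_back(ToySpec{s, 0});
    }
    return specs;
}

// Expands one spec into a spec per valid swizzle candidate. A candidate is
// kept when its group width (1 << log_w) does not exceed tiles_n.
std::vector<ToySpec> Swizzle(const ToySpec& spec, int tiles_n, const std::vector<int>& candidates) {
    std::vector<ToySpec> out;
    for (int log_w : candidates) {
        if (log_w >= 0 && log_w < 31 && (1 << log_w) <= tiles_n) {
            ToySpec s = spec;
            s.swizzle = log_w;
            out.push_back(s);
        }
    }
    return out;
}
```

In this sketch the two stages multiply: each split-K spec from Populate can be expanded by Swizzle, so a tuner would see the cross product of the surviving candidates.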
Usage Examples
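Context ctx(device_prop);  // construct once per device
// Init returns bool; this sketch assumes false means the problem cannot be dispatched.
if (ctx.Init(operation, Adesc, Udesc, Bdesc, Vdesc, Cdesc, Ddesc)) {
    auto feasible = ctx.Filter(registry.kernels());
    if (!feasible.empty()) {  // guard before indexing feasible[0]
        auto specs = ctx.Populate(*feasible[0], populate_param);
    }
}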