Implementation:InternLM Lmdeploy Gemm DispatchCache

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Implements a cache that maps GEMM problem descriptors (GemmDesc) to previously-tuned kernel launch specifications (LaunchSpec), with serialization support for persisting tuning results.

Description

DispatchCache uses the PIMPL (pointer-to-implementation) idiom to hide a sorted mapping from GemmDesc to LaunchSpec behind its public interface. It exposes the following operations:

Find: Exact match lookup for a given GemmDesc
LowerBound: Finds the closest match with dimensions less than or equal to the query (used for interpolation when an exact size hasn't been tuned)
Insert: Adds a new tuning result to the cache
Export: Serializes the cache to an output stream for persistence
Import: Deserializes from an input stream, resolving kernel pointers from the provided kernel list

The cache is constructed with the available kernel list to enable pointer resolution during import.
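
The lookup behaviour described above can be illustrated with a minimal sketch built on std::map. ToyDesc, ToySpec, ToyDescLess and ToyCache are hypothetical stand-ins, not the real GemmDesc, LaunchSpec or DispatchCache, and the key ordering (n and k first, m last) is an assumption chosen so that entries differing only in m sit next to each other. The actual lmdeploy implementation may order and match entries differently.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <tuple>

// Hypothetical, simplified stand-in for GemmDesc: only the problem sizes.
struct ToyDesc {
    int m, n, k;
};

// Hypothetical stand-in for LaunchSpec: an id instead of a kernel pointer.
struct ToySpec {
    int kernel_id;
};

// Assumed ordering: n and k are compared first, m last.
struct ToyDescLess {
    bool operator()(const ToyDesc& a, const ToyDesc& b) const {
        return std::tie(a.n, a.k, a.m) < std::tie(b.n, b.k, b.m);
    }
};

class ToyCache {
public:
    // Returns false if an entry for this descriptor already exists.
    bool Insert(const ToyDesc& desc, const ToySpec& spec) {
        return map_.emplace(desc, spec).second;
    }

    // Exact-match lookup.
    std::optional<ToySpec> Find(const ToyDesc& desc) const {
        auto it = map_.find(desc);
        if (it == map_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Closest entry with the same n and k whose m is <= the queried m.
    std::optional<ToySpec> LowerBound(const ToyDesc& desc) const {
        auto it = map_.upper_bound(desc);  // first entry strictly greater than desc
        if (it == map_.begin()) {
            return std::nullopt;  // nothing at or below the query
        }
        --it;  // last entry that is <= desc
        if (it->first.n != desc.n || it->first.k != desc.k) {
            return std::nullopt;  // nearest entry belongs to a different (n, k)
        }
        return it->second;
    }

private:
    std::map<ToyDesc, ToySpec, ToyDescLess> map_;
};
```

With entries for m = 16 and m = 64 at the same n and k, a query for m = 32 misses in Find but falls back to the m = 16 entry in LowerBound.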

Usage

The Gemm class uses DispatchCache to avoid re-measuring kernels for problem sizes it has already seen. Tuning results persist across sessions via Export and Import.
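
One way to picture persistence with pointer resolution is the sketch below. It is not the lmdeploy serialization format: ToyKernel, ToyKey, ToyPersistentCache and the whitespace-separated "m n k kernel_name" line layout are all assumptions made for illustration. The point it shows is that a raw pointer cannot be written to disk, so the export stores a stable kernel name and the import maps that name back to a pointer from the kernel list given at construction.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical kernel record; names are assumed to contain no whitespace.
struct ToyKernel {
    std::string name;
};

// Hypothetical problem key.
struct ToyKey {
    int m, n, k;
    bool operator<(const ToyKey& o) const {
        return std::tie(n, k, m) < std::tie(o.n, o.k, o.m);
    }
};

class ToyPersistentCache {
public:
    explicit ToyPersistentCache(std::vector<const ToyKernel*> kernels)
        : kernels_(std::move(kernels)) {}

    bool Insert(const ToyKey& key, const ToyKernel* kernel) {
        return map_.emplace(key, kernel).second;
    }

    const ToyKernel* Find(const ToyKey& key) const {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : it->second;
    }

    // Writes one line per entry and returns the number of entries written.
    int Export(std::ostream& os) const {
        int count = 0;
        for (const auto& [key, kernel] : map_) {
            os << key.m << ' ' << key.n << ' ' << key.k << ' ' << kernel->name << '\n';
            ++count;
        }
        return count;
    }

    // Reads entries, resolving each name against the known kernels.
    // Entries naming an unknown kernel are skipped. Returns entries added.
    int Import(std::istream& is) {
        int count = 0;
        ToyKey key{};
        std::string name;
        while (is >> key.m >> key.n >> key.k >> name) {
            for (const ToyKernel* kernel : kernels_) {
                if (kernel->name == name) {
                    count += Insert(key, kernel) ? 1 : 0;
                    break;
                }
            }
        }
        return count;
    }

private:
    std::vector<const ToyKernel*> kernels_;
    std::map<ToyKey, const ToyKernel*> map_;
};
```

In this sketch, a cache file written on one build can be read by another build with a different kernel set; entries whose kernel is no longer available are simply dropped on import.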

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/dispatch_cache.h

Signature

class DispatchCache {
public:
    DispatchCache(std::vector<Kernel*> kernels);
    ~DispatchCache();

    std::optional<LaunchSpec> LowerBound(const GemmDesc& desc) const;
    std::optional<LaunchSpec> Find(const GemmDesc& desc) const;
    bool Insert(const GemmDesc& desc, const LaunchSpec& spec);
    int Export(std::ostream& os) const;
    int Import(std::istream& is);
};
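
The PIMPL idiom mentioned in the description can be sketched as follows. ToyDispatchCache and its int-to-int map are hypothetical; the real class stores GemmDesc to LaunchSpec entries and its private members are not shown on this page. The sketch only illustrates why the public class declares a destructor without exposing any container type: the storage lives in a nested Impl struct that is defined out of line.

```cpp
#include <cassert>
#include <map>
#include <memory>

// --- Part that would live in a header ---
class ToyDispatchCache {
public:
    ToyDispatchCache();
    ~ToyDispatchCache();  // declared here, defined where Impl is complete

    bool Insert(int key, int value);
    int  Size() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // owning pointer to the hidden state
};

// --- Part that would live in a source file ---
struct ToyDispatchCache::Impl {
    std::map<int, int> entries;  // sorted storage, invisible to header users
};

ToyDispatchCache::ToyDispatchCache(): impl_(std::make_unique<Impl>()) {}

// Defaulted here so std::unique_ptr<Impl> sees a complete type.
ToyDispatchCache::~ToyDispatchCache() = default;

bool ToyDispatchCache::Insert(int key, int value) {
    return impl_->entries.emplace(key, value).second;
}

int ToyDispatchCache::Size() const {
    return static_cast<int>(impl_->entries.size());
}
```

A practical benefit of this layout is that changing the container or its comparator only touches the source file, so code that includes the header does not need to be recompiled.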

Import

#include "src/turbomind/kernels/gemm/dispatch_cache.h"

I/O Contract

Inputs

Name	Type	Required	Description
kernels	std::vector<Kernel*>	Yes	Available kernels for pointer resolution
desc	GemmDesc	Yes	Problem descriptor to look up or insert
spec	LaunchSpec	For Insert	Tuned launch configuration

Outputs

Name	Type	Description
Find / LowerBound	std::optional<LaunchSpec>	Cached launch specification, or empty if no entry matches
Export / Import	int	Number of entries serialized / deserialized

Usage Examples

DispatchCache cache(kernels);
cache.Import(in_stream);  // Load previous tuning results from a std::istream
if (auto spec = cache.Find(desc)) {
    // Use cached kernel
} else {
    // Tune and insert
    cache.Insert(desc, measured_spec);
}
cache.Export(out_stream);  // Persist results to a std::ostream

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
