Implementation:InternLM Lmdeploy Gemm Measurer

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, GEMM
Last Updated	2026-02-07 15:00 GMT

Overview

Provides the Measurer class for benchmarking GEMM kernel performance using CUDA events, with a configurable stopping criterion that decides when the timing estimate is reliable enough to stop sampling.

Description

The Measurer class instruments kernel execution for autotuning:

Construction: Takes ownership of a StoppingCriterion (passed as std::unique_ptr, e.g., a confidence-interval or iteration-count criterion) that decides when enough samples have been collected
Measure: Takes a vector of LaunchSpecs, a Launcher function, and a CUDA stream; measures each kernel's execution time on that stream and returns one Measurement per spec (status, sample count, mean, variance)
MeasureOne: Measures a single kernel launch with warmup and repeated sampling
ColdRun: Performs an initial cold-cache run to detect errors before measurement
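The MeasureOne and ColdRun flow described above can be sketched as follows. This is a simplified, hypothetical illustration rather than the turbomind code: TimedLaunch, IterationCount, SampleRun, and MeasureOneSketch are invented names, a caller-supplied callback stands in for CUDA-event timing, and a plain iteration-count rule stands in for the pluggable StoppingCriterion.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a timed launch: returns 0 on success (non-zero on
// error, like the int returned by Launcher) and writes the elapsed time in ms.
// The real Measurer obtains timings from CUDA events instead.
using TimedLaunch = std::function<int(float& elapsed_ms)>;

// Hypothetical iteration-count criterion: stop after max_samples samples.
struct IterationCount {
    int  max_samples;
    bool ShouldStop(int sample_count) const { return sample_count >= max_samples; }
};

struct SampleRun {
    bool               ok = false;
    std::vector<float> samples;
};

// Sketch of the MeasureOne flow: one cold run to surface launch errors, a few
// discarded warmup launches, then timed sampling until the criterion says stop.
SampleRun MeasureOneSketch(const TimedLaunch& launch, const IterationCount& stop, int warmup)
{
    assert(warmup >= 0);
    SampleRun run;
    float     ms = 0.f;
    if (launch(ms) != 0) {  // cold run: bail out before timing a failing kernel
        return run;
    }
    for (int i = 0; i < warmup; ++i) {  // warmup timings are discarded
        if (launch(ms) != 0) {
            return run;
        }
    }
    while (!stop.ShouldStop(static_cast<int>(run.samples.size()))) {
        if (launch(ms) != 0) {
            return run;
        }
        run.samples.push_back(ms);
    }
    run.ok = true;
    return run;
}
```

With an iteration-count limit of 5 and 2 warmup launches, a working kernel is launched 8 times in total (1 cold, 2 warmup, 5 timed), and only the 5 timed launches contribute samples; a kernel that fails its cold run is never timed.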

The Measurement struct captures:

status: CUDA error code (cudaSuccess if the kernel ran without error)
sample_count: Number of measurement samples taken
mean: Average execution time across the samples
variance: Variance of the execution-time samples
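One way to produce the sample_count, mean, and variance fields without storing every sample is Welford's online algorithm, shown below together with a confidence-interval stop test of the kind mentioned under Construction. This is an assumption for illustration: the page does not say how Measurer accumulates its statistics, and RunningStats, MeasurementSketch, and ConfidenceReached (with its z, rel_tol, min_n, and max_n parameters) are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the Measurement fields; status is replaced by a bool
// so the sketch compiles without the CUDA runtime.
struct MeasurementSketch {
    bool  ok;
    int   sample_count;
    float mean;
    float variance;
};

// Welford's online algorithm: updates mean and variance one sample at a time,
// so the timing loop never has to keep all samples in memory.
class RunningStats {
public:
    void Add(double x)
    {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / n_;
        m2_ += delta * (x - mean_);
    }
    int    count() const { return n_; }
    double mean() const { return mean_; }
    // Sample (unbiased) variance; 0 until at least two samples exist.
    double variance() const { return n_ > 1 ? m2_ / (n_ - 1) : 0.0; }

    MeasurementSketch ToMeasurement() const
    {
        return {true, n_, static_cast<float>(mean()), static_cast<float>(variance())};
    }

private:
    int    n_    = 0;
    double mean_ = 0.0;
    double m2_   = 0.0;
};

// Hypothetical confidence-interval criterion: stop once the half-width of the
// normal-approximation interval, z * sqrt(variance / n), is at most
// rel_tol * mean, bounded below by min_n and above by max_n samples.
bool ConfidenceReached(const RunningStats& s, double z, double rel_tol, int min_n, int max_n)
{
    assert(min_n >= 2 && max_n >= min_n);
    if (s.count() < min_n) {
        return false;
    }
    if (s.count() >= max_n) {
        return true;
    }
    const double half_width = z * std::sqrt(s.variance() / s.count());
    return half_width <= rel_tol * s.mean();
}
```

For samples 2, 4, and 6 the sketch reports a mean of 4 and a sample variance of 4. Accumulating in double and converting to float only when filling the result limits rounding error when many short timings are combined.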

The Launcher type is std::function<int(LaunchSpec, cudaStream_t)>, typically bound to Kernel::Launch.
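Because a Launcher receives only a LaunchSpec and a stream, the GEMM operands have to be captured when the launcher is created. The sketch below shows that pattern with a lambda. LaunchSpecSketch, KernelSketch, MakeLauncher, and the void* stream stand-in are hypothetical, and the real Kernel::Launch signature is not reproduced here.

```cpp
#include <cassert>
#include <functional>

// Stand-ins so the sketch compiles without CUDA or turbomind headers; all
// names are hypothetical. In turbomind the stream type is cudaStream_t.
using StreamSketch = void*;

struct LaunchSpecSketch {
    int kernel_id;  // which candidate kernel to run
    int splits;     // e.g. a split-k factor chosen for this candidate
};

// Same shape as Launcher: int(LaunchSpec, stream).
using LauncherSketch = std::function<int(LaunchSpecSketch, StreamSketch)>;

// Hypothetical kernel whose Launch needs the GEMM operands as well as the spec.
struct KernelSketch {
    int last_kernel_id = -1;

    int Launch(const LaunchSpecSketch& spec, const float* A, const float* B, float* C, StreamSketch stream)
    {
        (void)A; (void)B; (void)C; (void)stream;  // a real kernel would enqueue work on `stream`
        if (spec.splits < 1) {
            return 1;  // non-zero status signals an invalid configuration
        }
        last_kernel_id = spec.kernel_id;
        return 0;
    }
};

// Binds the operands once so each call varies only the spec and stream.
// `kernel` is captured by reference and must outlive the returned launcher.
LauncherSketch MakeLauncher(KernelSketch& kernel, const float* A, const float* B, float* C)
{
    assert(A != nullptr && B != nullptr && C != nullptr);
    return [&kernel, A, B, C](LaunchSpecSketch spec, StreamSketch stream) {
        return kernel.Launch(spec, A, B, C, stream);
    };
}
```

A lambda is used here rather than std::bind because it keeps the captured operands and the forwarded parameters explicit at the call site.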

Usage

Used by the GEMM tuner to benchmark candidate kernels and select the fastest for each problem size.
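A tuner that selects the fastest kernel for each problem size could cache the winning candidate keyed by GEMM shape, skipping candidates whose launch failed. The sketch below is a hypothetical illustration: Timing, ProblemSize, FastestIndex, and TuningCache are not turbomind types, and the real tuner's selection and caching logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical minimal mirror of Measurement (status reduced to a bool).
struct Timing {
    bool  ok;
    float mean;
};

// Hypothetical cache key: the GEMM problem shape.
struct ProblemSize {
    int  m, n, k;
    bool operator<(const ProblemSize& o) const { return std::tie(m, n, k) < std::tie(o.m, o.n, o.k); }
};

// Index of the fastest successful candidate, or nullopt if every launch failed.
std::optional<std::size_t> FastestIndex(const std::vector<Timing>& results)
{
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < results.size(); ++i) {
        if (!results[i].ok) {
            continue;
        }
        if (!best || results[i].mean < results[*best].mean) {
            best = i;
        }
    }
    return best;
}

// Hypothetical per-shape cache: measure once per shape, reuse the winner later.
class TuningCache {
public:
    void Record(const ProblemSize& p, const std::vector<Timing>& results)
    {
        assert(p.m > 0 && p.n > 0 && p.k > 0);
        if (auto best = FastestIndex(results)) {
            cache_[p] = *best;
        }
    }
    std::optional<std::size_t> Lookup(const ProblemSize& p) const
    {
        auto it = cache_.find(p);
        if (it == cache_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<ProblemSize, std::size_t> cache_;
};
```

Returning std::optional makes the all-candidates-failed case explicit, so a shape with no usable kernel is simply left out of the cache.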

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/gemm/tuner/measurer.h

Signature

struct Measurement {
    cudaError_t status;
    int         sample_count;
    float       mean;
    float       variance;
};

using Launcher = std::function<int(LaunchSpec, cudaStream_t)>;

class Measurer {
public:
    Measurer(std::unique_ptr<StoppingCriterion> stop_criterion);
    ~Measurer();

    std::vector<Measurement> Measure(const std::vector<LaunchSpec>& specs,
                                      const Launcher& launcher,
                                      cudaStream_t stream);
};

Import

#include "src/turbomind/kernels/gemm/tuner/measurer.h"

I/O Contract

Inputs

Name	Type	Required	Description
specs	vector<LaunchSpec>	Yes	Kernel launch configurations to benchmark
launcher	Launcher	Yes	Function that executes a kernel given a LaunchSpec
stream	cudaStream_t	Yes	CUDA stream for measurement

Outputs

Name	Type	Description
measurements	vector<Measurement>	Per-spec timing results (mean, variance, sample count)

Usage Examples

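#include <algorithm>
auto measurer = Measurer(std::make_unique<ConfidenceInterval>(0.95, 0.01));
auto results  = measurer.Measure(candidate_specs, launcher, stream);
// Pick the fastest kernel; failed launches (status != cudaSuccess) rank last
auto best = std::min_element(results.begin(), results.end(), [](const Measurement& a, const Measurement& b) {
    const bool ok_a = a.status == cudaSuccess, ok_b = b.status == cudaSuccess;
    return ok_a != ok_b ? ok_a : (ok_a && a.mean < b.mean);
});
// Assumes results[i] matches candidate_specs[i]; check results is non-empty and best->status == cudaSuccess first
const LaunchSpec& best_spec = candidate_specs[best - results.begin()];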

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
