Implementation:InternLM Lmdeploy Gemm Measurer
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, GEMM |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Provides the Measurer class for benchmarking GEMM kernel performance using CUDA events, with configurable stopping criteria for statistical significance.
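A stopping criterion can be pictured as a small interface that looks at running statistics after each sample and decides whether more samples are needed. The sketch below is a hypothetical illustration, not the contents of measurer.h. The names `Stats`, `ShouldStop`, `MaxIterations` and `RelativeConfidence` are assumptions, and the confidence test assumes a normal approximation with z = 1.96 (roughly 95% confidence).

```cpp
#include <cmath>

// Hypothetical running statistics that a criterion can inspect.
struct Stats {
    int   count    = 0;
    float mean     = 0.f;
    float variance = 0.f;  // sample variance of the timings
};

// Hypothetical interface: returns true once enough samples have been taken.
class StoppingCriterion {
public:
    virtual ~StoppingCriterion() = default;
    virtual bool ShouldStop(const Stats& s) const = 0;
};

// Stop after a fixed number of samples.
class MaxIterations: public StoppingCriterion {
public:
    explicit MaxIterations(int n): n_(n) {}
    bool ShouldStop(const Stats& s) const override { return s.count >= n_; }

private:
    int n_;
};

// Stop once the ~95% confidence-interval half-width, relative to the mean,
// falls below rel_tol, or unconditionally once max_iter samples exist.
class RelativeConfidence: public StoppingCriterion {
public:
    RelativeConfidence(float rel_tol, int min_iter, int max_iter):
        rel_tol_(rel_tol), min_iter_(min_iter), max_iter_(max_iter) {}

    bool ShouldStop(const Stats& s) const override {
        if (s.count >= max_iter_) return true;
        if (s.count < min_iter_ || s.mean <= 0.f) return false;
        const float half_width = 1.96f * std::sqrt(s.variance / s.count);
        return half_width / s.mean < rel_tol_;
    }

private:
    float rel_tol_;
    int   min_iter_;
    int   max_iter_;
};
```

The `min_iter` floor guards against stopping on a handful of samples whose variance happens to look small; the `max_iter` cap bounds tuning time for noisy kernels.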
Description
The Measurer class instruments kernel execution for autotuning:
- Construction: Takes a StoppingCriterion (e.g., confidence interval or iteration count) that controls when measurement is statistically sufficient.
- Measure: Takes a vector of LaunchSpecs and a Launcher function, measures each kernel's execution time, and returns Measurement results (status, sample count, mean, variance).
- MeasureOne: Measures a single kernel launch with warmup and repeated sampling.
- ColdRun: Performs an initial cold-cache run to detect errors before measurement.
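The measurement flow described above (cold run to surface errors, warmup, then sampling until a stop condition holds) can be sketched on the host. This is a hypothetical sketch: to stay runnable without a GPU, timing is injected through a `TimedLaunch` callback that returns elapsed milliseconds, whereas the real class times launches with CUDA events. `MeasureOneSketch`, `ShouldStop`, the default warmup count, the sample cap, and the `int` status (0 = success, standing in for cudaError_t) are all assumptions.

```cpp
#include <functional>
#include <vector>

// Stand-in for the documented struct; status is an int here (0 = success)
// instead of cudaError_t so the sketch compiles without the CUDA toolkit.
struct Measurement {
    int   status;
    int   sample_count;
    float mean;
    float variance;
};

// Hypothetical callback: performs one launch, writes its elapsed time in
// milliseconds, and returns 0 on success or a nonzero error code.
using TimedLaunch = std::function<int(float& elapsed_ms)>;
// Hypothetical stopping rule evaluated on the samples collected so far.
using ShouldStop = std::function<bool(const std::vector<float>& samples)>;

Measurement MeasureOneSketch(const TimedLaunch& launch, const ShouldStop& should_stop,
                             int warmup = 2, int max_samples = 1000)
{
    float ms = 0.f;
    // Cold run: surface launch errors before any timing is recorded.
    if (int err = launch(ms)) return {err, 0, 0.f, 0.f};
    // Warmup launches run but are not recorded.
    for (int i = 0; i < warmup; ++i) {
        if (int err = launch(ms)) return {err, 0, 0.f, 0.f};
    }
    // Sample until the stopping rule is satisfied or the hard cap is reached.
    std::vector<float> samples;
    while ((int)samples.size() < max_samples && !should_stop(samples)) {
        if (int err = launch(ms)) return {err, (int)samples.size(), 0.f, 0.f};
        samples.push_back(ms);
    }
    // Two-pass mean and sample variance over the recorded timings.
    const int n    = (int)samples.size();
    float     mean = 0.f;
    for (float x : samples) mean += x;
    mean = n ? mean / n : 0.f;
    float var = 0.f;
    for (float x : samples) var += (x - mean) * (x - mean);
    var = n > 1 ? var / (n - 1) : 0.f;
    return {0, n, mean, var};
}
```

Separating the cold run from sampling means a kernel that fails to launch is rejected after a single attempt, without consuming a full sampling budget.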
The Measurement struct captures:
- status: CUDA error code
- sample_count: Number of measurement samples taken
- mean: Average execution time
- variance: Timing variance
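The mean and variance fields can be filled incrementally as samples arrive, without storing every timing. The sketch below uses Welford's online algorithm. It is an illustrative assumption: this page does not say how measurer.h accumulates its statistics, or whether variance uses the sample (n - 1) or population (n) denominator. The class name `RunningStats` is hypothetical.

```cpp
// Hypothetical accumulator using Welford's online algorithm, which is
// numerically more stable than summing x and x*x separately.
class RunningStats {
public:
    void Add(float x) {
        ++count_;
        const double delta = x - mean_;
        mean_ += delta / count_;
        m2_ += delta * (x - mean_);  // second factor uses the updated mean
    }
    int   count() const { return count_; }
    float mean() const { return static_cast<float>(mean_); }
    // Sample variance (n - 1 denominator); 0 until at least two samples exist.
    float variance() const { return count_ > 1 ? static_cast<float>(m2_ / (count_ - 1)) : 0.f; }

private:
    int    count_ = 0;
    double mean_  = 0.0;
    double m2_    = 0.0;
};
```

For the samples 1, 2, 3 and 4, this yields a mean of 2.5 and a sample variance of 5/3.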
The Launcher type is std::function<int(LaunchSpec, cudaStream_t)>, typically bound to Kernel::Launch.
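Binding a kernel object into the Launcher signature is typically a one-line lambda. In the sketch below, `LaunchSpec`'s fields, the `Kernel` class and its simplified `Launch` member, and `MakeLauncher` are all hypothetical stand-ins; the real turbomind types carry more state. `cudaStream_t` is declared the same way the CUDA runtime does (a pointer to the opaque `CUstream_st`) so the example compiles without CUDA headers.

```cpp
#include <functional>

// Stand-ins so the sketch compiles without CUDA or turbomind headers.
struct CUstream_st;
using cudaStream_t = CUstream_st*;

// Hypothetical fields; the real LaunchSpec is defined by turbomind.
struct LaunchSpec {
    int kernel_id;
    int splits;
};

// Hypothetical kernel wrapper with a simplified Launch member function.
class Kernel {
public:
    explicit Kernel(int id): id_(id) {}
    int Launch(LaunchSpec spec, cudaStream_t /*stream*/) {
        // A real kernel would enqueue work on the stream; here we only validate.
        return spec.kernel_id == id_ ? 0 : -1;
    }

private:
    int id_;
};

using Launcher = std::function<int(LaunchSpec, cudaStream_t)>;

// Adapt a kernel instance to the Launcher signature the Measurer expects.
// The kernel is captured by reference, so it must outlive the Launcher.
Launcher MakeLauncher(Kernel& kernel)
{
    return [&kernel](LaunchSpec spec, cudaStream_t stream) { return kernel.Launch(spec, stream); };
}
```

A lambda is used instead of std::bind because it keeps the argument forwarding explicit and is generally easier for the compiler to inline.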
Usage
Used by the GEMM tuner to benchmark candidate kernels and select the fastest for each problem size.
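The bookkeeping on the tuner side might look like the following sketch, which records the fastest successful candidate for each problem shape. It is a hypothetical illustration, not the lmdeploy tuner: `BestKernelCache`, the `(m, n, k)` key, and the `int` status stand-in (0 = success) are assumptions.

```cpp
#include <map>
#include <tuple>
#include <vector>

// Stand-in for the documented struct; status 0 means success here.
struct Measurement {
    int   status;
    int   sample_count;
    float mean;
    float variance;
};

// Hypothetical problem key: GEMM dimensions (m, n, k).
using ProblemKey = std::tuple<int, int, int>;

class BestKernelCache {
public:
    // Cache the index of the fastest successful candidate for a problem.
    // Returns -1 and caches nothing if every candidate failed.
    int Update(const ProblemKey& key, const std::vector<Measurement>& results) {
        int best = -1;
        for (int i = 0; i < (int)results.size(); ++i) {
            if (results[i].status != 0) continue;  // skip failed launches
            if (best < 0 || results[i].mean < results[best].mean) best = i;
        }
        if (best >= 0) cache_[key] = best;
        return best;
    }
    // Look up a previously tuned choice; -1 if this problem was never tuned.
    int Find(const ProblemKey& key) const {
        auto it = cache_.find(key);
        return it == cache_.end() ? -1 : it->second;
    }

private:
    std::map<ProblemKey, int> cache_;
};
```

Failed measurements are filtered by status first, because their mean field does not hold a meaningful time and could otherwise win the comparison.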
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/gemm/tuner/measurer.h
Signature
```cpp
struct Measurement {
    cudaError_t status;
    int         sample_count;
    float       mean;
    float       variance;
};

using Launcher = std::function<int(LaunchSpec, cudaStream_t)>;

class Measurer {
public:
    Measurer(std::unique_ptr<StoppingCriterion> stop_criterion);
    ~Measurer();

    std::vector<Measurement> Measure(const std::vector<LaunchSpec>& specs,
                                     const Launcher&                 launcher,
                                     cudaStream_t                    stream);
};
```
Import
```cpp
#include "src/turbomind/kernels/gemm/tuner/measurer.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| specs | vector<LaunchSpec> | Yes | Kernel launch configurations to benchmark |
| launcher | Launcher | Yes | Function that executes a kernel given a LaunchSpec |
| stream | cudaStream_t | Yes | CUDA stream for measurement |
Outputs
| Name | Type | Description |
|---|---|---|
| measurements | vector<Measurement> | Per-spec timing results (mean, variance, sample count) |
Usage Examples
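```cpp
#include <algorithm>
#include <memory>

Measurer measurer{std::make_unique<ConfidenceInterval>(0.95, 0.01)};
auto     results = measurer.Measure(candidate_specs, launcher, stream);
// Find the fastest kernel; failed launches (status != cudaSuccess) rank last
auto best = std::min_element(results.begin(), results.end(), [](const auto& a, const auto& b) {
    if ((a.status == cudaSuccess) != (b.status == cudaSuccess)) return a.status == cudaSuccess;
    return a.mean < b.mean;
});
```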