Implementation:InternLM Lmdeploy Gemm Measurer

From Leeroopedia
Revision as of 15:14, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_Gemm_Measurer.md)


Knowledge Sources
Domains: GPU_Kernels, GEMM
Last Updated: 2026-02-07 15:00 GMT

Overview

Provides the Measurer class for benchmarking GEMM kernel performance using CUDA events, with configurable stopping criteria for statistical significance.

Description

The Measurer class instruments kernel execution for autotuning:

  • Construction: Takes a StoppingCriterion (e.g., confidence interval or iteration count) to control when measurement is statistically sufficient
  • Measure: Takes a vector of LaunchSpecs and a Launcher function, measures each kernel's execution time, and returns Measurement results (status, sample count, mean, variance)
  • MeasureOne: Measures a single kernel launch with warmup and repeated sampling
  • ColdRun: Performs an initial cold-cache run to detect errors before measurement
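The cold run, warmup, and repeated sampling described above can be sketched as a host-side loop. This is a minimal illustration under stated assumptions, not the lmdeploy implementation: StoppingCriterionSketch, FixedCount, TimedRun, MeasurementSketch, and the warmup_runs parameter are all hypothetical names, and the timing source is an arbitrary callable rather than CUDA events. The running mean and variance use Welford's online update.

```cpp
#include <functional>

// Hypothetical stand-in for the stopping-criterion interface; the real
// StoppingCriterion in the source may expose different methods.
struct StoppingCriterionSketch {
    virtual ~StoppingCriterionSketch() = default;
    // Returns true once enough samples have been collected.
    virtual bool ShouldStop(int count, double mean, double variance) const = 0;
};

// Simplest possible criterion: stop after a fixed number of samples.
struct FixedCount : StoppingCriterionSketch {
    int limit;
    explicit FixedCount(int n) : limit(n) {}
    bool ShouldStop(int count, double, double) const override { return count >= limit; }
};

// Mirrors the fields of Measurement, with a plain int status (0 = success).
struct MeasurementSketch {
    int    status;
    int    sample_count;
    double mean;
    double variance;
};

// Runs the kernel once, writes the elapsed time, and returns a status code.
using TimedRun = std::function<int(double& elapsed)>;

inline double SampleVariance(int count, double m2)
{
    return count > 1 ? m2 / (count - 1) : 0.0;
}

MeasurementSketch MeasureOneSketch(const TimedRun& run_once,
                                   const StoppingCriterionSketch& stop,
                                   int warmup_runs = 2)
{
    double elapsed = 0.0;

    // Cold run: surface launch errors before spending time on sampling.
    int status = run_once(elapsed);
    if (status != 0) {
        return {status, 0, 0.0, 0.0};
    }

    // Warmup: timings are discarded.
    for (int i = 0; i < warmup_runs; ++i) {
        status = run_once(elapsed);
        if (status != 0) {
            return {status, 0, 0.0, 0.0};
        }
    }

    // Sampling: update mean and variance incrementally (Welford's method)
    // and ask the stopping criterion after every sample.
    int    count = 0;
    double mean  = 0.0;
    double m2    = 0.0;  // sum of squared deviations from the running mean
    while (true) {
        status = run_once(elapsed);
        if (status != 0) {
            return {status, count, mean, SampleVariance(count, m2)};
        }
        ++count;
        const double delta = elapsed - mean;
        mean += delta / count;
        m2 += delta * (elapsed - mean);
        if (stop.ShouldStop(count, mean, SampleVariance(count, m2))) {
            return {0, count, mean, SampleVariance(count, m2)};
        }
    }
}
```

Passing the criterion the running statistics after each sample lets a confidence-based criterion end sampling early for stable kernels while noisy ones keep collecting samples.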

The Measurement struct captures:

  • status: CUDA error code
  • sample_count: Number of measurement samples taken
  • mean: Average execution time
  • variance: Timing variance
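As an illustration of how these fields can be combined, the sketch below estimates how precisely the mean is known. It assumes that variance is the sample variance of individual timings (the source does not say so explicitly) and uses the normal approximation, where z = 1.96 corresponds to a 95% interval. MeasurementStats, RelativeHalfWidth, and IsPreciseEnough are hypothetical names, not part of the lmdeploy API.

```cpp
#include <cmath>
#include <limits>

// Hypothetical subset of the Measurement fields needed for this estimate.
struct MeasurementStats {
    int   sample_count;
    float mean;
    float variance;
};

// Relative half-width of an approximate confidence interval for the mean:
//   z * sqrt(variance / n) / mean
// Returns infinity when there are too few samples or the mean is not positive.
inline double RelativeHalfWidth(const MeasurementStats& m, double z = 1.96)
{
    if (m.sample_count < 2 || m.mean <= 0.0f) {
        return std::numeric_limits<double>::infinity();
    }
    const double std_error = std::sqrt(static_cast<double>(m.variance) / m.sample_count);
    return z * std_error / m.mean;
}

// True when the interval half-width is within `tolerance` of the mean,
// e.g. tolerance = 0.01 asks for the mean to be known to about 1%.
inline bool IsPreciseEnough(const MeasurementStats& m, double tolerance)
{
    return RelativeHalfWidth(m) <= tolerance;
}
```

A check of this shape is one plausible way a confidence-interval stopping criterion could decide that further samples are unnecessary.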

The Launcher type is std::function<int(LaunchSpec, cudaStream_t)>, typically bound to Kernel::Launch.
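A sketch of how such a launcher might be built from a lambda follows. To compile without CUDA or the lmdeploy headers it uses stand-in types: StreamSketch replaces cudaStream_t, and KernelSketch, LaunchSpecSketch, and their members are hypothetical and do not reflect the real LaunchSpec layout.

```cpp
#include <functional>

// Stand-in for cudaStream_t, which is an opaque handle in the CUDA runtime.
using StreamSketch = void*;

// Hypothetical kernel with a Launch method that returns 0 on success.
struct KernelSketch {
    int Launch(int problem_size, StreamSketch /*stream*/) const
    {
        return problem_size > 0 ? 0 : 1;  // pretend empty problems fail
    }
};

// Hypothetical launch spec: which kernel to run and on what problem size.
struct LaunchSpecSketch {
    const KernelSketch* kernel;
    int                 problem_size;
};

// Same shape as Launcher: takes a spec and a stream, returns a status code.
using LauncherSketch = std::function<int(LaunchSpecSketch, StreamSketch)>;

// Builds a launcher that forwards each spec to its kernel's Launch method.
inline LauncherSketch MakeLauncher()
{
    return [](LaunchSpecSketch spec, StreamSketch stream) {
        return spec.kernel->Launch(spec.problem_size, stream);
    };
}
```

Wrapping the call in a std::function keeps the measurer independent of the operands and workspace a real kernel needs, since the lambda can capture them.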

Usage

Used by the GEMM tuner to benchmark candidate kernels and select the fastest for each problem size.

Code Reference

Source Location

Signature

struct Measurement {
    cudaError_t status;
    int         sample_count;
    float       mean;
    float       variance;
};

using Launcher = std::function<int(LaunchSpec, cudaStream_t)>;

class Measurer {
public:
    Measurer(std::unique_ptr<StoppingCriterion> stop_criterion);
    ~Measurer();

    std::vector<Measurement> Measure(const std::vector<LaunchSpec>& specs,
                                      const Launcher& launcher,
                                      cudaStream_t stream);
};

Import

#include "src/turbomind/kernels/gemm/tuner/measurer.h"

I/O Contract

Inputs

Name     | Type               | Required | Description
specs    | vector<LaunchSpec> | Yes      | Kernel launch configurations to benchmark
launcher | Launcher           | Yes      | Function that executes a kernel given a LaunchSpec
stream   | cudaStream_t       | Yes      | CUDA stream for measurement

Outputs

Name         | Type                | Description
measurements | vector<Measurement> | Per-spec timing results (status, sample count, mean, variance)

Usage Examples

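// Requires <algorithm> and <memory>; candidate_specs, launcher, and stream come from the caller.
Measurer measurer(std::make_unique<ConfidenceInterval>(0.95, 0.01));
auto results = measurer.Measure(candidate_specs, launcher, stream);
// Find the measurement with the lowest mean time; `best` is an iterator into `results`.
auto best = std::min_element(results.begin(), results.end(),
    [](const auto& a, const auto& b) { return a.mean < b.mean; });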
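Comparing on mean alone would also rank measurements whose launch failed, where the mean may not be meaningful. The sketch below skips those entries first. It is a hedged illustration: TimedResult and FastestSuccessful are hypothetical names, status is modeled as a plain int with 0 meaning success (the value of cudaSuccess), and the returned index is assumed to map back to the spec at the same position because results are reported per spec.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of Measurement with an int status (0 = success).
struct TimedResult {
    int   status;
    int   sample_count;
    float mean;
    float variance;
};

// Returns the index of the fastest successful measurement, or -1 if every
// candidate failed or produced no samples.
inline int FastestSuccessful(const std::vector<TimedResult>& results)
{
    int best = -1;
    for (std::size_t i = 0; i < results.size(); ++i) {
        const TimedResult& r = results[i];
        if (r.status != 0 || r.sample_count == 0) {
            continue;  // ignore failed or empty measurements
        }
        if (best < 0 || r.mean < results[best].mean) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Returning an index rather than an iterator makes it straightforward to look up the matching entry in the spec vector, and the -1 result gives the caller an explicit signal that no candidate is usable.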
