
Heuristic:Rapidsai Cuml Float64 Kernel Stability

From Leeroopedia



Knowledge Sources
Domains: Optimization, Numerical_Computing
Last Updated: 2026-02-08 00:00 GMT

Overview

Kernel matrices and regression metrics must use float64 accumulators to avoid serious numerical instability with float32 inputs.

Description

When computing pairwise kernel matrices or regression metrics in cuML, float32 precision leads to significant numerical errors. This is because distance and kernel computations involve sums of many small values, and the limited mantissa of float32 causes catastrophic cancellation and accumulation errors. The cuML codebase forces float64 for kernel matrix output regardless of input dtype, and explicitly warns that regression metrics with float32 inputs may produce incorrect results on large datasets.
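The cancellation effect can be reproduced in a few lines of NumPy (synthetic data; an illustration of the failure mode, not cuML code). The naive variance formula E[x^2] - E[x]^2 subtracts two numbers near 1e6 to recover a result near 1e-4, which float32 cannot resolve:

```python
import numpy as np

rng = np.random.default_rng(0)
# Data with a large mean and tiny variance: both terms of the naive
# variance formula E[x^2] - E[x]^2 are ~1e6, while their difference
# is ~1e-4 -- beyond float32's ~7 decimal digits.
x = rng.normal(loc=1000.0, scale=0.01, size=100_000).astype(np.float32)

def naive_var(x, acc_dtype):
    # Accumulate both moments in the requested dtype.
    xd = x.astype(acc_dtype)
    m2 = (xd ** 2).mean(dtype=acc_dtype)
    m = xd.mean(dtype=acc_dtype)
    return float(m2 - m * m)

true_var = float(x.astype(np.float64).var())  # ~1e-4, stable two-pass
v32 = naive_var(x, np.float32)  # cancellation destroys the result
v64 = naive_var(x, np.float64)  # same float32 inputs, float64 accumulator
```

The float64 accumulator recovers the true variance from the very same float32 inputs; only the accumulation dtype differs.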

Usage

Apply this heuristic whenever computing pairwise kernel matrices (e.g., for SVM, kernel PCA) or evaluating regression metrics (mean squared error, R-squared, etc.) on large datasets. If you observe unexpectedly poor metric values or NaN results, check whether float32 data is causing precision loss and switch to float64.

The Insight (Rule of Thumb)

  • Action: Always use float64 for kernel matrix output and regression metric accumulators, even when input data is float32.
  • Value: Kernel matrices are forced to dtype=np.float64. For incremental statistics, use _safe_accumulator_op() which promotes to float64.
  • Trade-off: Doubles memory usage for kernel matrices and intermediate accumulators. On GPUs with limited float64 throughput, this may reduce performance.
  • When to override: Only if you are certain your data range is small enough that float32 precision is sufficient, and you are constrained by GPU memory.

Reasoning

float32 carries roughly 7 decimal digits of precision (a 24-bit significand). When computing pairwise distances between vectors with magnitudes in the hundreds, the squared terms reach the millions, and the rounding error of individual differences compounds across dimensions. For a kernel matrix of size N x N over D-dimensional features, the accumulated error grows with D. Empirical observations in the cuML codebase confirmed that 32-bit kernel matrices produce "serious numerical stability problems." The incremental PCA implementation similarly uses a _safe_accumulator_op() function that promotes float32 to float64 during accumulation to prevent overflow and precision loss.
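To make the reasoning concrete, consider the norm-expansion form of the squared Euclidean distance, ||x||^2 + ||y||^2 - 2 x.y, a common way to build distance and kernel matrices with a single matrix multiply. On large-magnitude data, three terms of size ~1e6 must cancel down to ~1e-4 (synthetic illustration, not cuML code):

```python
import numpy as np

rng = np.random.default_rng(42)
D = 256
# Two nearly identical high-dimensional points with large magnitudes.
x = rng.normal(loc=100.0, scale=1.0, size=D)
y = x + 1e-3                          # true squared distance = D * 1e-6

def sqdist_expanded(a, b, dtype):
    # Norm-expansion trick: three ~1e6-sized terms nearly cancel.
    a = a.astype(dtype)
    b = b.astype(dtype)
    return float(a @ a + b @ b - 2.0 * (a @ b))

true_d2 = float(((x - y) ** 2).sum())  # ~2.56e-4, subtract-first form
d32 = sqdist_expanded(x, y, np.float32)  # swamped by rounding
d64 = sqdist_expanded(x, y, np.float64)  # recovers the answer
```

In float32, the unit-in-the-last-place at 2.56e6 is 0.25, so the true answer of 2.56e-4 is far below the rounding noise; float64 has more than enough headroom.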

Code Evidence

Forced float64 kernel matrix from python/cuml/cuml/metrics/pairwise_kernels.py:166-168:

# Here we force K to use 64 bit, even if the input is 32 bit
# 32 bit K results in serious numerical stability problems
K = cp.zeros((X.shape[0], Y.shape[0]), dtype=np.float64)
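A minimal sketch of the same pattern, using NumPy in place of CuPy for portability (the `rbf_kernel` helper here is illustrative, not cuML's implementation):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """Sketch: force the kernel matrix to 64-bit even for 32-bit
    inputs, mirroring the forced-float64 allocation shown above."""
    K = np.zeros((X.shape[0], Y.shape[0]), dtype=np.float64)
    Xd = X.astype(np.float64)
    Yd = Y.astype(np.float64)
    for i in range(X.shape[0]):
        # Subtract-first squared distances, accumulated in float64.
        d2 = ((Xd[i] - Yd) ** 2).sum(axis=1)
        K[i] = np.exp(-gamma * d2)
    return K

X = np.random.default_rng(0).normal(size=(5, 8)).astype(np.float32)
K = rbf_kernel(X, X)   # float64 output from float32 inputs
```

The output dtype is float64 regardless of the input dtype, which is the behavior the cuML snippet above enforces with `cp.zeros(..., dtype=np.float64)`.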

Safe accumulator pattern from python/cuml/cuml/decomposition/incremental_pca.py:535-565:

def _safe_accumulator_op(op, *args, **kwargs):
    """Uses float64 accumulator for floating point operations to prevent
    overflow on smaller dtypes."""
    # Promotes to float64 during computation, then casts back
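A rough sketch of what such a safe-accumulator wrapper can look like (an illustration of the pattern under assumed semantics, not cuML's actual `_safe_accumulator_op`):

```python
import numpy as np

def safe_accumulator_op(op, x, *args, **kwargs):
    """Sketch of the safe-accumulator pattern: run floating-point
    reductions with a float64 accumulator whenever the input is a
    smaller float dtype, to prevent overflow and precision loss."""
    if np.issubdtype(x.dtype, np.floating) and x.dtype.itemsize < 8:
        return op(x, *args, dtype=np.float64, **kwargs)
    return op(x, *args, **kwargs)

# Usage: one million float32 values summed with a float64 accumulator.
x = np.full(1_000_000, 0.1, dtype=np.float32)
s = safe_accumulator_op(np.sum, x)
```

The wrapper leaves float64 (and integer) inputs untouched and only changes the accumulation dtype, not the stored data, so memory cost is limited to the reduction itself.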

Regression metric warning from python/cuml/cuml/metrics/regression.py:187-189:

"Be careful when using this metric with float32 inputs as the result
can be slightly incorrect because of floating point precision if the
input is large enough. float64 will have lower numerical error."
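The warning is easy to reproduce with synthetic data: when targets are large, float32 rounding noise is comparable to small residuals, so the computed mean squared error is visibly inflated (illustrative example, not cuML code):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
# Large targets with tiny residuals: values near 1e4-2e4 are
# representable in float32 only to ~1e-3, the same scale as the
# residuals themselves.
y_true = rng.uniform(1e4, 2e4, size=n)
y_pred = y_true + rng.normal(0.0, 1e-3, size=n)

true_mse = float(((y_true - y_pred) ** 2).mean())   # ~1e-6, float64

yt32 = y_true.astype(np.float32)
yp32 = y_pred.astype(np.float32)
mse32 = float(((yt32 - yp32) ** 2).mean())          # inflated by rounding
```

Here the error comes from storing the inputs in float32 at all, which is why the warning recommends float64 inputs for large data rather than just a wider accumulator.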
