Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Online ml River Online Covariance Estimation

From Leeroopedia


Knowledge Sources Mathematical Statistics Algorithms for Computing the Sample Variance
Domains Online_Learning Statistics Feature_Engineering
Last Updated 2026-02-08 18:00 GMT

Overview

Online (streaming) covariance estimation computes the covariance matrix of a multivariate data stream incrementally, processing one observation at a time without storing all past data. This provides a foundational statistical summary that many downstream algorithms depend on, including principal component analysis, Mahalanobis distance, and Gaussian models.

Description

The covariance matrix captures the pairwise linear relationships between features. In the batch setting, computing covariance requires a pass through the entire dataset. In the streaming setting, incremental algorithms maintain running statistics that are updated with each new observation in constant time per feature pair.

Key aspects of online covariance estimation:

  • Incremental update: Each new observation updates the running means and co-moment matrix using Welford-style algorithms.
  • Numerical stability: Naive computation of covariance (summing products) is numerically unstable. Welford's algorithm and its generalizations avoid catastrophic cancellation by tracking centered co-moments.
  • Memory efficiency: Only O(d2) memory is needed for d features, regardless of the number of observations.
  • Unbiased estimation: The sample covariance divides by n1 (Bessel's correction) for unbiased estimation of the population covariance.

Usage

Use online covariance estimation when:

  • You need to track feature correlations in a data stream.
  • You are building Gaussian-based models (e.g., Gaussian anomaly scoring, LDA) incrementally.
  • You need the Mahalanobis distance for anomaly detection or outlier scoring.
  • You want to perform incremental PCA or whitening.

Theoretical Basis

Batch covariance: For n observations of d features, the sample covariance between features j and k is:

Cov(j, k) = (1 / (n-1)) * sum_{i=1}^{n} (x_ij - mean_j)(x_ik - mean_k)

Welford's online algorithm (univariate, extended to covariance):

Maintain running statistics n, x¯j (means), and Cjk (co-moment):

function update(x_new):
    n += 1
    for each feature j:
        delta_j = x_new[j] - mean_j
        mean_j += delta_j / n
        for each feature k:
            delta_k = x_new[k] - mean_k  (using NEW mean for k if k < j, OLD mean otherwise)
            C_jk += delta_j * (x_new[k] - mean_k)

    Cov(j, k) = C_jk / (n - 1)    # Bessel-corrected

Numerical stability: The co-moment formulation avoids computing xjxknx¯jx¯k, which suffers from catastrophic cancellation when the mean is large relative to the variance.

Complexity: Each update requires O(d2) operations and O(d2) storage. For high-dimensional data, sparse or diagonal approximations may be used.

Positive semi-definiteness: The empirical covariance matrix is guaranteed to be positive semi-definite, which is important for downstream algorithms that require valid covariance matrices (e.g., Mahalanobis distance requires invertibility, so regularization may be needed when n<d).

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment