Principle:Online ml River Online Covariance Estimation
| Knowledge Sources | Mathematical Statistics Algorithms for Computing the Sample Variance |
|---|---|
| Domains | Online_Learning Statistics Feature_Engineering |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Online (streaming) covariance estimation computes the covariance matrix of a multivariate data stream incrementally, processing one observation at a time without storing all past data. This provides a foundational statistical summary that many downstream algorithms depend on, including principal component analysis, Mahalanobis distance, and Gaussian models.
Description
The covariance matrix captures the pairwise linear relationships between features. In the batch setting, computing covariance requires a pass through the entire dataset. In the streaming setting, incremental algorithms maintain running statistics that are updated with each new observation in constant time per feature pair.
Key aspects of online covariance estimation:
- Incremental update: Each new observation updates the running means and co-moment matrix using Welford-style algorithms.
- Numerical stability: Naive computation of covariance (summing products) is numerically unstable. Welford's algorithm and its generalizations avoid catastrophic cancellation by tracking centered co-moments.
- Memory efficiency: Only memory is needed for features, regardless of the number of observations.
- Unbiased estimation: The sample covariance divides by (Bessel's correction) for unbiased estimation of the population covariance.
Usage
Use online covariance estimation when:
- You need to track feature correlations in a data stream.
- You are building Gaussian-based models (e.g., Gaussian anomaly scoring, LDA) incrementally.
- You need the Mahalanobis distance for anomaly detection or outlier scoring.
- You want to perform incremental PCA or whitening.
Theoretical Basis
Batch covariance: For observations of features, the sample covariance between features and is:
Cov(j, k) = (1 / (n-1)) * sum_{i=1}^{n} (x_ij - mean_j)(x_ik - mean_k)
Welford's online algorithm (univariate, extended to covariance):
Maintain running statistics , (means), and (co-moment):
function update(x_new):
n += 1
for each feature j:
delta_j = x_new[j] - mean_j
mean_j += delta_j / n
for each feature k:
delta_k = x_new[k] - mean_k (using NEW mean for k if k < j, OLD mean otherwise)
C_jk += delta_j * (x_new[k] - mean_k)
Cov(j, k) = C_jk / (n - 1) # Bessel-corrected
Numerical stability: The co-moment formulation avoids computing , which suffers from catastrophic cancellation when the mean is large relative to the variance.
Complexity: Each update requires operations and storage. For high-dimensional data, sparse or diagonal approximations may be used.
Positive semi-definiteness: The empirical covariance matrix is guaranteed to be positive semi-definite, which is important for downstream algorithms that require valid covariance matrices (e.g., Mahalanobis distance requires invertibility, so regularization may be needed when ).