Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Pyro ppl Pyro Online Statistics

From Leeroopedia


Knowledge Sources
Domains Online Algorithms, Statistical Computing, Streaming Data
Last Updated 2026-02-09 09:00 GMT

Overview

Online (streaming) algorithms compute running statistics such as mean, variance, and covariance incrementally as data arrives, using constant memory and avoiding the need to store all past observations.

Description

In many settings -- particularly during MCMC sampling and online learning -- statistics must be computed incrementally as new data points arrive. Storing all observations to recompute statistics from scratch is impractical due to memory constraints and computational cost.

Welford's algorithm provides a numerically stable online algorithm for computing the running mean and variance. Unlike the naive algorithm (which computes mean and sum-of-squares separately), Welford's algorithm maintains a running corrected sum of squares that avoids catastrophic cancellation.

The algorithm extends naturally to:

Online covariance estimation (Welford covariance): Maintains a running estimate of the full covariance matrix, updated with each new observation in O(d^2) time for d-dimensional data. This is essential for:

  • Adapting the mass matrix in HMC during warmup.
  • Estimating posterior covariance for diagnostics.
  • Building proposal distributions in adaptive MCMC.

Streaming statistics: More general online algorithms that maintain running estimates of various summary statistics:

  • Mean and variance (Welford).
  • Quantiles (approximate, using t-digest or similar).
  • Higher moments (skewness, kurtosis).
  • Exponentially weighted moving averages for non-stationary data.

The key properties of these algorithms are:

  • Constant memory: O(d^2) for d-dimensional covariance, independent of the number of observations.
  • Single pass: Each observation is processed once and discarded.
  • Numerical stability: Avoid the catastrophic cancellation that plagues naive implementations.
  • Exact equivalence: The running statistics are exactly equal to what batch computation would produce (for mean and variance).

Usage

Use online statistics when:

  • Computing running mean and covariance during MCMC warmup for mass matrix adaptation.
  • Monitoring convergence of MCMC chains in real-time.
  • Processing streaming data where storing all observations is impractical.
  • Implementing adaptive algorithms that adjust based on accumulated statistics.

Theoretical Basis

Welford's online mean and variance:

# Initialize:
# n = 0, mean = 0, M2 = 0

# Update with new observation x:
# n = n + 1
# delta = x - mean
# mean = mean + delta / n
# delta2 = x - mean     # note: using updated mean
# M2 = M2 + delta * delta2

# Retrieve:
# variance = M2 / (n - 1)        # sample variance (unbiased)
# population_variance = M2 / n   # population variance

# Why stable: avoids computing sum(x_i^2) - n*mean^2
# which suffers from catastrophic cancellation when mean >> std

Welford's online covariance (multivariate):

# Initialize:
# n = 0, mean = zeros(d), C = zeros(d, d)

# Update with new observation x (d-dimensional vector):
# n = n + 1
# delta = x - mean
# mean = mean + delta / n
# delta2 = x - mean     # using updated mean
# C = C + outer(delta, delta2)

# Retrieve:
# covariance = C / (n - 1)  # sample covariance matrix

# Properties:
# - O(d^2) time per update
# - O(d^2) memory total
# - Numerically stable
# - Exactly equals batch covariance for any n

Combining running statistics (parallel/distributed):

# Given two sets of running statistics:
# Set A: n_a, mean_a, M2_a
# Set B: n_b, mean_b, M2_b

# Combined statistics:
# n = n_a + n_b
# delta = mean_b - mean_a
# mean = mean_a + delta * n_b / n
# M2 = M2_a + M2_b + delta^2 * n_a * n_b / n

# This enables parallel computation and chain combination

Application to MCMC mass matrix adaptation:

# During HMC warmup:
# 1. Run initial samples with identity mass matrix
# 2. Use Welford covariance to estimate posterior covariance C
# 3. Set mass matrix M = C  (or M = diag(C) for diagonal adaptation)
# 4. Continue sampling with adapted mass matrix
# 5. Repeat adaptation in windows of increasing length

# The mass matrix preconditions the Hamiltonian dynamics:
# K(p) = 0.5 * p^T M^{-1} p
# Good M approx posterior covariance => isotropic energy landscape

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment