Principle:Pyro ppl Pyro Statistical Diagnostics
| Knowledge Sources | |
|---|---|
| Domains | MCMC Diagnostics, Statistical Computing, Bayesian Inference |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Statistical diagnostic functions assess the quality and reliability of Monte Carlo samples by measuring convergence, mixing efficiency, and autocorrelation of Markov chains.
Description
After running MCMC or other sampling-based inference, it is critical to assess whether the samples are reliable before drawing conclusions. Statistical diagnostics provide quantitative measures of sample quality.
R-hat (potential scale reduction factor): The most widely used convergence diagnostic. It compares the variance between multiple independent chains to the variance within each chain. If all chains have converged to the same stationary distribution, the two estimates should roughly agree and R-hat is close to 1. Values noticeably above 1 (commonly, above 1.01 under a strict threshold or 1.1 under a lenient one) indicate that the chains have not mixed well and that more sampling, or a better-tuned sampler, is needed.
Effective Sample Size (ESS): MCMC samples are usually positively autocorrelated, so N MCMC draws typically contain less information than N independent draws. The ESS estimates the number of effectively independent samples, accounting for autocorrelation. A low ESS relative to the total number of samples indicates high autocorrelation and inefficient sampling.
Autocorrelation function: Measures how correlated a chain is with lagged versions of itself. Rapid decay of autocorrelation indicates good mixing. Slow decay indicates the chain is taking small steps and exploring slowly.
Split R-hat: An improvement that splits each chain in half before computing R-hat, helping detect non-stationarity within a chain (e.g., if a chain has not yet burned in).
These diagnostics should always be checked before using MCMC results for inference. No single diagnostic is sufficient. They should be used in combination, along with visual inspection of trace plots.
Usage
Use statistical diagnostics when:
- Assessing convergence of MCMC chains before using posterior samples.
- Determining whether more MCMC iterations are needed.
- Comparing the efficiency of different MCMC algorithms or tuning parameters.
- Reporting the reliability of Bayesian inference results in publications.
- Automatically monitoring convergence during adaptive MCMC runs.
Theoretical Basis
R-hat (split R-hat):
# Given M chains, each of length N (M and N are counted after splitting each original chain in half):
# Let theta_m,n be the n-th sample from chain m
# Between-chain variance:
# B = N/(M-1) * sum_m (theta_bar_m - theta_bar)^2
# where theta_bar_m = mean of chain m, theta_bar = grand mean
# Within-chain variance:
# W = (1/M) * sum_m s_m^2
# where s_m^2 = (1/(N-1)) * sum_n (theta_m,n - theta_bar_m)^2
# Posterior variance estimate:
# var_hat = (N-1)/N * W + (1/N) * B
# R-hat:
# R_hat = sqrt(var_hat / W)
# Common thresholds: R_hat < 1.01 (strict) or R_hat < 1.1 (lenient)
# Passing a threshold is necessary for trusting the samples, not proof of convergence
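The formulas above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not Pyro's implementation: the function name `split_r_hat` and the list-of-lists input format are assumptions made for this example, and it assumes equal-length chains with at least four draws each. Pyro ships its own tensor-based diagnostics in `pyro.ops.stats`, which should be preferred in real workflows.

```python
import math
import random
from statistics import fmean, variance


def split_r_hat(chains):
    """Split R-hat for equal-length chains given as lists of floats.

    Hypothetical helper for illustration; assumes >= 4 draws per chain.
    """
    half = len(chains[0]) // 2
    # Split every chain in half (the middle draw is dropped if the length is odd).
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[-half:])
    n = half  # N: length of each half-chain
    chain_means = [fmean(h) for h in halves]
    # B = N/(M-1) * sum_m (theta_bar_m - theta_bar)^2
    b = n * variance(chain_means)
    # W = (1/M) * sum_m s_m^2, with s_m^2 the (N-1)-normalized sample variance
    w = fmean([variance(h) for h in halves])
    # var_hat = (N-1)/N * W + (1/N) * B
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


rng = random.Random(0)
# Four chains drawn from the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains centered at different values: R-hat should be well above 1.
stuck = [[rng.gauss(3.0 * m, 1.0) for _ in range(1000)] for m in range(4)]

print("well-mixed chains:", round(split_r_hat(mixed), 3))
print("disagreeing chains:", round(split_r_hat(stuck), 3))
```

Because the chains have equal length, the mean of the half-chain means equals the grand mean, so `variance(chain_means)` already carries the `1/(M-1)` factor from the definition of B.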
Effective Sample Size (ESS):
# ESS measures the equivalent number of independent samples:
# ESS = M * N / (1 + 2 * sum_k rho_k)
# where rho_k is the autocorrelation at lag k
# In practice the sum is truncated once the estimated rho_k is dominated by noise
# (e.g., with Geyer's initial positive sequence rule)
# A cruder alternative uses the between/within chain variances:
# ESS ~= M * N * var_hat / B, capped at M * N
# (the uncapped value falls below M * N only when B > W; the estimate is noisy when M is small)
# Rule of thumb: ESS > 400 in total across chains (about 100 per chain for M = 4) for reliable posterior summaries
# ESS per second is a good measure of sampler efficiency
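A minimal sketch of the autocorrelation-based formula, under stated assumptions: the helper name `ess_from_autocorrelation` is hypothetical, the truncation rule is Geyer's initial positive sequence (one common choice, not necessarily the one any given library uses), and autocorrelations are simply averaged over chains. That averaging ignores between-chain disagreement, so the result is only meaningful once R-hat is already close to 1. Chains are assumed to be non-constant and of equal length.

```python
import random
from statistics import fmean


def ess_from_autocorrelation(chains):
    """ESS = M*N / (1 + 2*sum_k rho_k), truncated with Geyer's pairing rule."""
    m, n = len(chains), len(chains[0])
    means = [fmean(c) for c in chains]
    # 1/N variance per chain, so that rho_0 == 1 exactly.
    variances = [fmean([(x - mu) ** 2 for x in c]) for c, mu in zip(chains, means)]

    def rho(k):
        # Lag-k autocorrelation, averaged over chains (a simplifying assumption).
        vals = []
        for c, mu, var in zip(chains, means, variances):
            cov = sum((c[i] - mu) * (c[i + k] - mu) for i in range(n - k)) / (n - k)
            vals.append(cov / var)
        return fmean(vals)

    # Sum pairs (rho_0 + rho_1), (rho_2 + rho_3), ... until a pair is non-positive.
    pair_sum = 0.0
    t = 0
    while 2 * t + 1 < n:
        pair = rho(2 * t) + rho(2 * t + 1)
        if pair <= 0.0:
            break
        pair_sum += pair
        t += 1
    # Since rho_0 = 1:  1 + 2*sum_{k>=1} rho_k == -1 + 2*sum_{k>=0} rho_k
    tau = -1.0 + 2.0 * pair_sum
    return m * n / tau


def ar1_chain(phi, n, rng):
    """Hypothetical test chain: x_t = phi * x_{t-1} + noise, started at stationarity."""
    x = rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out


rng = random.Random(0)
iid_chains = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
ar_chains = [ar1_chain(0.5, 2000, rng) for _ in range(4)]

print("independent draws:", round(ess_from_autocorrelation(iid_chains)))
print("AR(1), phi = 0.5: ", round(ess_from_autocorrelation(ar_chains)))
```

For the AR(1) chains the true integrated autocorrelation time is (1 + 0.5) / (1 - 0.5) = 3, so the second estimate should land near one third of the 8000 total draws, while the independent draws should give a value near 8000.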
Autocorrelation function:
# For a chain theta_1, ..., theta_N:
# Autocorrelation at lag k:
# rho_k = (1/(N-k)) * sum_{n=1}^{N-k} (theta_n - theta_bar)(theta_{n+k} - theta_bar) / var(theta)
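# Properties:
# rho_0 = 1 (always)
# |rho_k| <= 1 for the true autocorrelation; sample estimates at large k are noisy and can fall outside this range
# For good mixing: rho_k decays rapidly to 0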
# Integrated autocorrelation time: tau = 1 + 2*sum_{k=1}^inf rho_k
# ESS = N / tau
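The estimator above can be checked against a case with a known answer. For a stationary AR(1) chain with coefficient phi, the true autocorrelation is rho_k = phi^k, which gives tau = (1 + phi) / (1 - phi). The sketch below is illustrative only: `autocorrelation` and `ar1_chain` are hypothetical helpers written for this example (the former is unrelated to any library function of the same name), and the variance uses 1/N normalization so that rho_0 equals 1.

```python
import random
from statistics import fmean


def autocorrelation(chain, max_lag):
    """Return [rho_0, ..., rho_max_lag] for one chain, using the 1/(N-k) form."""
    n = len(chain)
    mean = fmean(chain)
    centered = [x - mean for x in chain]
    var = fmean([c * c for c in centered])  # 1/N variance, so rho_0 == 1
    rhos = []
    for k in range(max_lag + 1):
        cov = sum(centered[i] * centered[i + k] for i in range(n - k)) / (n - k)
        rhos.append(cov / var)
    return rhos


def ar1_chain(phi, n, rng):
    """Hypothetical test chain: x_t = phi * x_{t-1} + noise, started at stationarity."""
    x = rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out


rng = random.Random(1)
phi = 0.9
chain = ar1_chain(phi, 20000, rng)
rhos = autocorrelation(chain, max_lag=5)

for k, r in enumerate(rhos):
    print(f"lag {k}: estimated {r:.3f}, AR(1) value {phi ** k:.3f}")

# Closed-form integrated autocorrelation time and the implied single-chain ESS.
tau = (1 + phi) / (1 - phi)
print(f"closed-form tau = {tau:.1f}, ESS = N / tau = {len(chain) / tau:.0f}")
```

With phi = 0.9 the chain needs roughly 19 draws to deliver the information of one independent draw, which is why slowly decaying autocorrelation translates directly into a small ESS.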
Practical workflow:
# 1. Run M >= 4 chains from dispersed starting points
# 2. Discard warmup (burn-in) draws; dropping the first half is a conservative default
# 3. Compute split R-hat for all parameters
# - If any R_hat > 1.01: run longer or diagnose issues
# 4. Compute ESS for all parameters
# - If ESS < 400: run longer or improve sampler
# 5. Examine trace plots and autocorrelation plots visually
# 6. Only proceed with inference if diagnostics pass
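Steps 3, 4, and 6 of the workflow amount to a threshold check per parameter, which can be sketched as below. Everything here is an assumption made for illustration: the function name `failing_parameters`, the `{name: (r_hat, ess)}` input format, and the example numbers, which are invented and not taken from a real run. The diagnostics themselves would come from whatever library computes them.

```python
def failing_parameters(diagnostics, r_hat_max=1.01, ess_min=400.0):
    """Return {name: [reasons]} for parameters that fail either threshold.

    `diagnostics` maps a parameter name to a (split R-hat, ESS) pair.
    NaN values are treated as failures rather than silently passing.
    """
    failures = {}
    for name, (r_hat, ess) in diagnostics.items():
        reasons = []
        # `not (a <= b)` is True for NaN, unlike `a > b`.
        if not (r_hat <= r_hat_max):
            reasons.append(f"R-hat {r_hat:.3f} exceeds {r_hat_max}")
        if not (ess >= ess_min):
            reasons.append(f"ESS {ess:.0f} is below {ess_min:.0f}")
        if reasons:
            failures[name] = reasons
    return failures


# Invented numbers, for illustration only.
example = {
    "mu": (1.002, 1800.0),
    "sigma": (1.005, 350.0),
    "tau": (1.080, 120.0),
}

report = failing_parameters(example)
if report:
    for name, reasons in report.items():
        print(f"{name}: " + "; ".join(reasons))
else:
    print("all parameters pass; continue with trace and autocorrelation plots")
```

A passing report covers only the numeric thresholds; the visual checks in step 5 still apply.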