Principle:Pyro ppl Pyro Statistical Diagnostics

Knowledge Sources	Rank-Normalization, Folding, and Localization: An Improved R-hat; Practical Bayesian Model Evaluation; Monte Carlo Standard Errors
Domains	MCMC Diagnostics, Statistical Computing, Bayesian Inference
Last Updated	2026-02-09 09:00 GMT

Overview

Statistical diagnostic functions assess the quality and reliability of Monte Carlo samples by measuring convergence, mixing efficiency, and autocorrelation of Markov chains.

Description

After running MCMC or other sampling-based inference, it is critical to assess whether the samples are reliable before drawing conclusions. Statistical diagnostics provide quantitative measures of sample quality.

R-hat (potential scale reduction factor): The most widely used convergence diagnostic. It compares the variance between multiple independent chains to the variance within each chain. If all chains have converged to the same stationary distribution, the two should be approximately equal, giving an R-hat close to 1. Values substantially above 1 indicate that the chains have not mixed well, so longer runs or a better-tuned sampler are needed.

Effective Sample Size (ESS): MCMC samples are autocorrelated, so N MCMC draws contain less information than N independent draws. The ESS estimates the number of effectively independent samples, accounting for autocorrelation. A low ESS relative to the total number of samples indicates high autocorrelation and inefficient sampling.

Autocorrelation function: Measures how correlated a chain is with lagged versions of itself. Rapid decay of autocorrelation indicates good mixing. Slow decay indicates the chain is taking small steps and exploring slowly.

Split R-hat: An improvement that splits each chain in half before computing R-hat, helping detect non-stationarity within a chain (e.g., if a chain has not yet burned in).

These diagnostics should always be checked before using MCMC results for inference. No single diagnostic is sufficient; use them in combination, along with visual inspection of trace plots.

Usage

Use statistical diagnostics when:

Assessing convergence of MCMC chains before using posterior samples.
Determining whether more MCMC iterations are needed.
Comparing the efficiency of different MCMC algorithms or tuning parameters.
Reporting the reliability of Bayesian inference results in publications.
Automatically monitoring convergence during adaptive MCMC runs.

Theoretical Basis

R-hat (split R-hat):

# Given M chains, each of length N (after splitting each chain in half):
# Let theta_m,n be the n-th sample from chain m

# Between-chain variance:
# B = N/(M-1) * sum_m (theta_bar_m - theta_bar)^2
# where theta_bar_m = mean of chain m, theta_bar = grand mean

# Within-chain variance:
# W = (1/M) * sum_m s_m^2
# where s_m^2 = (1/(N-1)) * sum_n (theta_m,n - theta_bar_m)^2

# Posterior variance estimate:
# var_hat = (N-1)/N * W + (1/N) * B

# R-hat:
# R_hat = sqrt(var_hat / W)

# Convergence criterion: R_hat < 1.01 (strict) or R_hat < 1.1 (lenient)
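
The formulas above can be sketched in plain Python for a single scalar parameter. This is a minimal illustration that uses only the standard library; it is not Pyro's implementation, and the function name `split_rhat` and the list-of-lists input layout are assumptions made for this example.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Split R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of draws (warmup already
    removed). Each chain needs at least 4 draws.
    """
    # Split every chain in half; the middle draw is dropped if the length is odd.
    half = len(chains[0]) // 2
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[len(chain) - half:])

    n = half  # N: draws per half-chain (M is len(halves))

    chain_means = [statistics.fmean(h) for h in halves]
    # W: mean of the per-chain sample variances s_m^2 (N - 1 denominator).
    w = statistics.fmean([statistics.variance(h) for h in halves])
    # B: N times the sample variance of the chain means (M - 1 denominator).
    b = n * statistics.variance(chain_means)

    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


# Synthetic draws, for illustration only.
rng = random.Random(0)
# Four chains that all sample the same distribution: R-hat should be near 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different means: R-hat should be well above 1.
stuck = [[rng.gauss(3.0 * c, 1.0) for _ in range(1000)] for c in range(4)]

print(split_rhat(mixed), split_rhat(stuck))
```

Because the split happens first, a single chain with a steady trend still produces two half-chains with different means, so the drift shows up as between-chain variance.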

Effective Sample Size (ESS):

# ESS measures the equivalent number of independent samples:
# ESS = M * N / (1 + 2 * sum_k rho_k)

# where rho_k is the autocorrelation at lag k
# The sum is truncated when rho_k becomes negligible

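# Alternatively, a cruder estimate uses the between-chain variance B and var_hat:
# ESS = M * N * var_hat / B
# This equals M * N when B = W and drops below M * N when B > W,
# that is, when the chains disagree more than their internal spread suggests.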

# Rule of thumb: total ESS > 400 across all chains (about 100 per chain with 4 chains) for reliable posterior summaries
# ESS per second is a good measure of sampler efficiency
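
As a worked example of the autocorrelation-based formula, the sketch below assumes the draws behave like a stationary AR(1) process, for which rho_k = phi^k. The coefficient `phi = 0.9`, the cutoff `tol`, and the function name are hypothetical choices for illustration; real chains need the autocorrelations estimated from the draws.

```python
def ess_from_autocorrelations(total_draws, rhos, tol=1e-3):
    """ESS = total_draws / (1 + 2 * sum_k rho_k), for k = 1, 2, ...

    `rhos` holds rho_1, rho_2, ...; the sum stops at the first lag whose
    autocorrelation falls below `tol` (an assumed "negligible" cutoff).
    """
    acc = 0.0
    for rho in rhos:
        if rho < tol:
            break
        acc += rho
    return total_draws / (1.0 + 2.0 * acc)


phi = 0.9                                  # assumed AR(1) coefficient
rhos = [phi ** k for k in range(1, 500)]   # rho_k = phi^k for an AR(1) process
total_draws = 4 * 1000                     # M = 4 chains of N = 1000 draws

ess = ess_from_autocorrelations(total_draws, rhos)
# Closed form for AR(1): tau = (1 + phi) / (1 - phi), so ESS = M * N / tau.
ess_exact = total_draws * (1 - phi) / (1 + phi)
print(ess, ess_exact)
```

Under this assumption, 4000 strongly autocorrelated draws carry roughly the information of a couple of hundred independent ones, which is why the raw draw count alone says little about precision.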

Autocorrelation function:

# For a chain theta_1, ..., theta_N:
# Autocorrelation at lag k:
# rho_k = (1/(N-k)) * sum_{n=1}^{N-k} (theta_n - theta_bar)(theta_{n+k} - theta_bar) / var(theta)

# Properties:
# rho_0 = 1 (always)
# |rho_k| <= 1
# For good mixing: rho_k decays rapidly to 0
# Integrated autocorrelation time: tau = 1 + 2*sum_{k=1}^inf rho_k
# ESS = N / tau
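
The estimator above can be written directly in plain Python. The sketch below is a minimal single-chain version under stated assumptions: the function names are made up for this example, the variance uses a 1/N denominator so that rho_0 is exactly 1, and the sum for tau is truncated at the first non-positive autocorrelation, which is one simple rule among several.

```python
import random
import statistics


def autocorrelation(chain, max_lag):
    """Empirical rho_0 .. rho_max_lag of one chain (max_lag < len(chain))."""
    n = len(chain)
    mean = statistics.fmean(chain)
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n  # 1/N variance, so rho_0 == 1
    rhos = []
    for k in range(max_lag + 1):
        # Lag-k autocovariance, averaged over the N - k available pairs.
        cov = sum(centered[i] * centered[i + k] for i in range(n - k)) / (n - k)
        rhos.append(cov / var)
    return rhos


def integrated_autocorr_time(rhos):
    """tau = 1 + 2 * sum_{k>=1} rho_k, stopped at the first rho_k <= 0."""
    tau = 1.0
    for rho in rhos[1:]:
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


# Synthetic, strongly autocorrelated chain (AR(1) with coefficient 0.9).
rng = random.Random(1)
chain, x = [], 0.0
for _ in range(5000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

rhos = autocorrelation(chain, max_lag=200)
tau = integrated_autocorr_time(rhos)
ess = len(chain) / tau  # single-chain ESS = N / tau
print(rhos[1], tau, ess)
```

With the 1/(N-k) normalization, estimates at lags close to N rest on very few pairs and become noisy, which is one reason to keep `max_lag` well below the chain length.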

Practical workflow:

# 1. Run M >= 4 chains from dispersed starting points
# 2. Discard first half as warmup (burn-in)
# 3. Compute split R-hat for all parameters
#    - If any R_hat > 1.01: run longer or diagnose issues
# 4. Compute ESS for all parameters
#    - If ESS < 400: run longer or improve sampler
# 5. Examine trace plots and autocorrelation plots visually
# 6. Only proceed with inference if diagnostics pass
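
Steps 3, 4 and 6 can be automated with a small threshold check. The sketch below assumes that split R-hat and ESS have already been computed per parameter; the function name, the dictionary layout, and the parameter names `mu` and `sigma` are hypothetical. The visual inspection in step 5 still has to be done by hand.

```python
def check_diagnostics(rhat, ess, rhat_max=1.01, ess_min=400.0):
    """Return the parameters that fail each threshold.

    `rhat` and `ess` map parameter names to already-computed values;
    this function does not compute the diagnostics itself.
    """
    # Negated comparisons so that a NaN diagnostic counts as a failure.
    bad_rhat = sorted(name for name, value in rhat.items() if not value <= rhat_max)
    bad_ess = sorted(name for name, value in ess.items() if not value >= ess_min)
    return {
        "rhat": bad_rhat,
        "ess": bad_ess,
        "passed": not (bad_rhat or bad_ess),
    }


# Made-up diagnostic values for two hypothetical parameters.
report = check_diagnostics(
    rhat={"mu": 1.003, "sigma": 1.04},
    ess={"mu": 1850.0, "sigma": 220.0},
)
print(report)
```

Returning the failing parameter names, rather than a single boolean, makes it easier to see whether one poorly identified parameter or the whole model is the problem.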

Related Pages

Implementation:Pyro_ppl_Pyro_Stats
