
Principle: Pyro MCMC Posterior Sampling

From Leeroopedia


Metadata

Field Value
Page Type Principle
Knowledge Sources Repo (Pyro)
Domains MCMC, Bayesian_Inference
Last Updated 2026-02-09 12:00 GMT

Overview

MCMC posterior sampling approximates the posterior distribution by constructing a Markov chain whose stationary distribution is the target posterior, running the chain, and collecting samples to estimate posterior expectations via ergodic averages.

Description

MCMC posterior sampling is the process of generating a sequence of correlated samples from a posterior distribution p(theta | data) by constructing and running a Markov chain with the correct stationary distribution. In Pyro, the MCMC sampler orchestrates the entire sampling pipeline: initialization, warmup/adaptation, sampling, and post-processing.
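The pipeline stages named above can be illustrated with a minimal random-walk Metropolis sketch in plain Python. This is a conceptual illustration only, not Pyro's HMC/NUTS implementation; the standard-normal target and all parameter values are chosen for the example.

```python
import math
import random

# Conceptual sketch of the MCMC pipeline: initialization -> warmup ->
# sampling -> post-processing, using a random-walk Metropolis kernel
# against a standard normal "posterior" with known log-density.

def log_target(theta):
    return -0.5 * theta ** 2  # log p(theta | data), up to a constant

def run_chain(num_samples, warmup_steps, init=5.0, step=1.0, seed=0):
    rng = random.Random(seed)
    theta = init                      # initialization (far from the typical set)
    samples = []
    for i in range(warmup_steps + num_samples):
        proposal = theta + rng.gauss(0.0, step)        # symmetric proposal
        log_accept = log_target(proposal) - log_target(theta)
        # Metropolis accept/reject step preserves the target distribution.
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            theta = proposal
        if i >= warmup_steps:                          # discard warmup draws
            samples.append(theta)
    return samples

samples = run_chain(num_samples=5000, warmup_steps=1000)
post_mean = sum(samples) / len(samples)                # ergodic average
```

Despite starting at 5.0, well outside the target's typical set, the warmup phase lets the chain reach high-density regions before any samples are retained.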

Warmup Phase

The warmup (or burn-in) phase serves two purposes:

  1. Convergence to the stationary distribution: Initial samples may be drawn from regions of low posterior density, depending on the initialization. The warmup period allows the chain to move from the initialization point to the typical set of the posterior.
  2. Kernel adaptation: During warmup, the MCMC kernel adapts its internal parameters to improve sampling efficiency. For HMC and NUTS, this includes:
    • Step size adaptation: Using dual averaging to find a step size that achieves the target acceptance probability.
    • Mass matrix adaptation: Estimating the posterior covariance (or diagonal variance) to precondition the Hamiltonian dynamics.

Warmup samples are discarded and not included in the final posterior approximation. By default, Pyro sets the number of warmup steps equal to the number of requested samples if not explicitly specified.

Sampling Phase

After warmup, the kernel parameters are frozen and the chain generates the requested number of samples. These samples are stored and used to approximate posterior quantities:

  • Posterior mean: E[theta | data] approx (1/N) * sum(theta_i)
  • Posterior variance: Var[theta | data] approx (1/N) * sum((theta_i - theta_bar)^2)
  • Credible intervals: Computed from the empirical quantiles of the samples.
  • Posterior predictive: Generated by running the model forward with each posterior sample.
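The first three estimators above can be sketched in a few lines of plain Python. The `samples` list and the `quantile` helper here are illustrative, not part of Pyro's API.

```python
import math

# Posterior summaries from a hypothetical list of scalar posterior draws.
samples = [0.1, -0.3, 0.2, 0.4, -0.1, 0.0, 0.3, -0.2, 0.1, 0.2]

n = len(samples)
post_mean = sum(samples) / n                                 # E[theta | data]
post_var = sum((x - post_mean) ** 2 for x in samples) / n    # Var[theta | data]

def quantile(xs, q):
    """Empirical quantile by linear interpolation between order statistics."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo, hi = int(math.floor(pos)), int(math.ceil(pos))
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac

# 90% equal-tailed credible interval from empirical quantiles.
ci_90 = (quantile(samples, 0.05), quantile(samples, 0.95))
```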

Multiple Chains

Running multiple independent chains is essential for convergence diagnostics:

  • Parallel chains: Pyro supports running multiple chains in parallel using Python multiprocessing. Each chain has its own independent random state and initialization.
  • Sequential chains: When multiprocessing is not available or desired, chains can be run sequentially.
  • Chain diagnostics: Multiple chains enable the computation of split-Gelman-Rubin R-hat statistics and effective sample size, which assess whether the chains have converged to the same distribution.
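The split-Gelman-Rubin diagnostic mentioned above can be sketched as follows. This mirrors the standard formula (each chain split in half, between-half variance B compared with within-half variance W), not Pyro's exact implementation.

```python
import random

def split_rhat(chains):
    """Split-Gelman-Rubin R-hat: values near 1 indicate convergence."""
    halves = []
    for c in chains:
        n = len(c) // 2
        halves.append(c[:n])        # splitting halves detects within-chain
        halves.append(c[n:2 * n])   # trends as well as between-chain ones
    m = len(halves)
    n = len(halves[0])
    means = [sum(h) / n for h in halves]
    grand = sum(means) / m
    # Between-half variance B and mean within-half variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(
        sum((x - mu) ** 2 for x in h) / (n - 1)
        for h, mu in zip(halves, means)
    ) / m
    var_plus = (n - 1) / n * W + B / n   # pooled posterior-variance estimate
    return (var_plus / W) ** 0.5

# Two well-mixed chains drawn from the same distribution give R-hat near 1.
rng = random.Random(0)
chains = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
rhat = split_rhat(chains)
```

A chain stuck in a different mode would inflate B relative to W and push R-hat well above 1, which is why multiple chains are needed: a single chain cannot reveal that it has missed part of the posterior.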

Initialization

The starting point of each chain can significantly affect convergence speed. Pyro provides several initialization strategies:

  • init_to_uniform: Draws initial values uniformly from a box (radius 2 by default) in the unconstrained space of each parameter, then transforms them back to the parameter's support (default).
  • init_to_median: Initializes to the median of the prior.
  • init_to_mean: Initializes to the mean of the prior.
  • init_to_sample: Draws from the prior.
  • init_to_value: Uses user-specified values.

Usage

MCMC posterior sampling is used when:

  • Exact posterior is intractable: The posterior does not have a closed-form expression, which is the case for most non-trivial Bayesian models.
  • Accurate uncertainty quantification is needed: Unlike point estimates (MAP) or approximate methods (variational inference), MCMC provides asymptotically exact samples from the posterior.
  • Model checking and diagnostics: MCMC samples can be used for posterior predictive checks, model comparison, and sensitivity analysis.
  • Moderate-dimensional models: MCMC with HMC/NUTS is practical for models with up to hundreds of continuous parameters (and sometimes more, depending on posterior geometry).

Theoretical Basis

Ergodic Theorem

For a Markov chain that is irreducible and aperiodic with stationary distribution pi, the ergodic theorem guarantees:

(1/N) * sum(f(theta_i)) -> E_pi[f(theta)] as N -> infinity

This means that time averages of the chain converge to the corresponding expectations under the target distribution, regardless of the starting point.
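The ergodic theorem can be checked numerically on a chain whose stationary distribution is known in closed form. The two-state chain below (with hypothetical transition probabilities p and q) has stationary distribution pi = (q/(p+q), p/(p+q)), so the time-average occupancy of state 1 should converge to p/(p+q) from any starting point.

```python
import random

# Two-state Markov chain: P(0 -> 1) = p, P(1 -> 0) = q.
p, q = 0.3, 0.1
rng = random.Random(42)
state = 0                       # arbitrary starting point
visits_to_1 = 0
N = 200_000
for _ in range(N):
    if state == 0:
        state = 1 if rng.random() < p else 0
    else:
        state = 0 if rng.random() < q else 1
    visits_to_1 += state

# (1/N) * sum(f(theta_i)) with f = indicator of state 1.
time_average = visits_to_1 / N
stationary = p / (p + q)        # E_pi[f] = 0.3 / 0.4 = 0.75
```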

Convergence Diagnostics

Because MCMC generates correlated samples, the effective sample size (ESS) is typically smaller than the total number of samples. The ESS measures how many independent samples the chain is equivalent to, accounting for autocorrelation:

ESS = N / (1 + 2 * sum(rho_k))

where rho_k is the autocorrelation at lag k. The split-Gelman-Rubin R-hat statistic compares within-chain and between-chain variance to assess convergence.
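The ESS formula can be sketched directly: estimate the lag-k autocorrelations rho_k from the samples and sum them until they become negligible. The truncation rule below is a deliberately crude assumption for illustration; Pyro and ArviZ use more careful rules in their diagnostics.

```python
import random

def effective_sample_size(samples, max_lag=100):
    """ESS = N / (1 + 2 * sum(rho_k)), with a simple truncation rule."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        return float(n)
    acf_sum = 0.0
    for k in range(1, max_lag):
        # Sample autocorrelation at lag k.
        rho_k = sum(
            (samples[i] - mean) * (samples[i + k] - mean) for i in range(n - k)
        ) / (n * var)
        if rho_k < 0.05:        # crude cutoff once correlation is small
            break
        acf_sum += rho_k
    return n / (1 + 2 * acf_sum)

# Independent draws have no autocorrelation, so ESS is close to N.
rng = random.Random(1)
iid = [rng.gauss(0, 1) for _ in range(2000)]
ess_iid = effective_sample_size(iid)
```

A strongly autocorrelated chain fed to the same function would report an ESS far below N, which is exactly the information the diagnostic is meant to convey.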

Central Limit Theorem for MCMC

Under regularity conditions, the MCMC estimator satisfies a central limit theorem:

sqrt(N) * (theta_bar - E[theta | data]) -> N(0, sigma^2)

where sigma^2 = Var[theta | data] * (1 + 2 * sum(rho_k)) is the asymptotic variance, which accounts for autocorrelation. Equivalently, the variance of the estimator theta_bar is approximately Var[theta | data] / ESS.
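In practice this CLT yields a Monte Carlo standard error of roughly sd(theta) / sqrt(ESS) for the posterior-mean estimate. A minimal sketch, where the sample values are hypothetical and the ESS is taken as given (e.g. from a diagnostic routine):

```python
def mc_standard_error(samples, ess):
    """Monte Carlo standard error of the posterior-mean estimate."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return (var / ess) ** 0.5    # sd / sqrt(ESS)

samples = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9]
se = mc_standard_error(samples, ess=8.0)   # treating the draws as independent
```

Passing a smaller ESS (reflecting autocorrelation) inflates the standard error, so a correlated chain needs proportionally more raw samples to reach the same precision.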
