Principle:Pyro ppl Pyro MCMC Numerical Methods
| Knowledge Sources | |
|---|---|
| Domains | MCMC, Numerical Methods, Hamiltonian Dynamics |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Numerical methods supporting MCMC include symplectic integrators for Hamiltonian dynamics, adaptive step-size tuning via dual averaging, and Newton-step proposals for efficient exploration of the posterior.
Description
Hamiltonian Monte Carlo (HMC) and its variants are among the most effective MCMC algorithms for continuous posterior distributions. They require several numerical building blocks:
Velocity Verlet (leapfrog) integrator: The core numerical method for simulating Hamiltonian dynamics. Given a position q (model parameters) and momentum p (auxiliary variables), the leapfrog integrator approximates the continuous Hamiltonian trajectory using discrete steps. The key property is symplecticity: the integrator exactly preserves a modified Hamiltonian, ensuring that the volume-preserving property needed for correct MCMC is maintained. The leapfrog integrator is second-order accurate, time-reversible, and volume-preserving -- three properties essential for HMC correctness.
Dual averaging for step-size adaptation: HMC's performance is sensitive to the step size epsilon. Too large causes high rejection rates; too small leads to slow exploration. Dual averaging is a stochastic optimization algorithm that automatically tunes epsilon during warmup to achieve a target acceptance probability (typically 0.65-0.80). It uses a Robbins-Monro type averaging scheme that is robust and has theoretical convergence guarantees.
Newton step proposals: For certain posterior geometries, a single Newton step (using the Hessian of the log-posterior) can make a highly effective MCMC proposal. The Newton step moves directly toward a local mode of the posterior, making it useful for:
- Initialization (finding good starting points for MCMC).
- Proposals in Metropolis-Hastings when the posterior is approximately quadratic.
- Laplace approximations (approximating the posterior with a Gaussian centered at the mode).
Usage
Use these numerical methods when:
- Implementing or customizing HMC/NUTS samplers (leapfrog integrator).
- Tuning MCMC step sizes automatically during warmup (dual averaging).
- Finding posterior modes for initialization or Laplace approximation (Newton step).
- Building custom MCMC kernels that combine gradient information with Hamiltonian dynamics.
Theoretical Basis
Velocity Verlet (leapfrog) integrator:
# Hamiltonian: H(q, p) = U(q) + K(p)
# where U(q) = -log p(q|data) (potential energy = negative log-posterior)
# K(p) = 0.5 * p^T M^{-1} p (kinetic energy)
# Hamilton's equations:
# dq/dt = grad_p K(p) = M^{-1} p
# dp/dt = -grad_q U(q) = grad_q log p(q|data)
# Leapfrog integrator (one step of size epsilon):
# 1. Half-step momentum: p_{1/2} = p_0 - (epsilon/2) * grad_q U(q_0)
# 2. Full-step position: q_1 = q_0 + epsilon * M^{-1} * p_{1/2}
# 3. Half-step momentum: p_1 = p_{1/2} - (epsilon/2) * grad_q U(q_1)
# Properties:
# - Symplectic: preserves the symplectic 2-form (volume-preserving)
# - Time-reversible: (q_0, p_0) -> (q_1, p_1) -> (q_0, p_0) exactly
# - Second-order accurate: error = O(epsilon^3) per step
# - Energy error = O(epsilon^2) (bounded, does not grow with trajectory length)
Dual averaging for step-size adaptation:
# Goal: find epsilon such that acceptance probability = target_accept (e.g., 0.8)
# Initialize:
# mu = log(10 * epsilon_0) (free parameter)
# log_epsilon_bar = 0
# H_bar = 0
# At iteration m:
# Run HMC with current epsilon, observe acceptance probability alpha_m
# Update:
# H_bar = (1 - 1/(m + t_0)) * H_bar + (1/(m + t_0)) * (target_accept - alpha_m)
# log_epsilon = mu - sqrt(m) / gamma * H_bar
# log_epsilon_bar = m^{-kappa} * log_epsilon + (1 - m^{-kappa}) * log_epsilon_bar
# During warmup: use epsilon = exp(log_epsilon)
# After warmup: fix epsilon = exp(log_epsilon_bar)
# Parameters: t_0 = 10, gamma = 0.05, kappa = 0.75
Newton step:
# Newton's method for finding the mode of log p(q|data):
# q_{k+1} = q_k - H(q_k)^{-1} * g(q_k)
# where:
# g(q) = grad_q log p(q|data) (gradient / score)
# H(q) = grad_q^2 log p(q|data) (Hessian)
# For MCMC proposal:
# Propose q* = q - H(q)^{-1} * g(q) + noise
# The noise term ensures the proposal explores rather than just converging
# Laplace approximation at mode q*:
# p(q|data) approx Normal(q*, -H(q*)^{-1})
# where H(q*) is the Hessian at the mode (negative definite)
# Computational cost: O(d^3) for d-dimensional parameter space
# (Hessian inversion). Can use quasi-Newton (L-BFGS) for O(d) cost.