Principle:Pyro ppl Pyro MCMC Numerical Methods

Knowledge Sources	MCMC Using Hamiltonian Dynamics The No-U-Turn Sampler Dual Averaging for Stochastic Optimization
Domains	MCMC, Numerical Methods, Hamiltonian Dynamics
Last Updated	2026-02-09 09:00 GMT

Overview

Numerical methods supporting MCMC include symplectic integrators for Hamiltonian dynamics, adaptive step-size tuning via dual averaging, and Newton-step proposals for efficient exploration of the posterior.

Description

Hamiltonian Monte Carlo (HMC) and its variants are among the most effective MCMC algorithms for continuous posterior distributions. They require several numerical building blocks:

Velocity Verlet (leapfrog) integrator: The core numerical method for simulating Hamiltonian dynamics. Given a position q (model parameters) and momentum p (auxiliary variables), the leapfrog integrator approximates the continuous Hamiltonian trajectory using discrete steps. The key property is symplecticity: the integrator exactly preserves a modified Hamiltonian, ensuring that the volume-preserving property needed for correct MCMC is maintained. The leapfrog integrator is second-order accurate, time-reversible, and volume-preserving -- three properties essential for HMC correctness.

Dual averaging for step-size adaptation: HMC's performance is sensitive to the step size epsilon. Too large causes high rejection rates; too small leads to slow exploration. Dual averaging is a stochastic optimization algorithm that automatically tunes epsilon during warmup to achieve a target acceptance probability (typically 0.65-0.80). It uses a Robbins-Monro type averaging scheme that is robust and has theoretical convergence guarantees.

Newton step proposals: For certain posterior geometries, a single Newton step (using the Hessian of the log-posterior) can make a highly effective MCMC proposal. The Newton step moves directly toward a local mode of the posterior, making it useful for:

Initialization (finding good starting points for MCMC).
Proposals in Metropolis-Hastings when the posterior is approximately quadratic.
Laplace approximations (approximating the posterior with a Gaussian centered at the mode).

Usage

Use these numerical methods when:

Implementing or customizing HMC/NUTS samplers (leapfrog integrator).
Tuning MCMC step sizes automatically during warmup (dual averaging).
Finding posterior modes for initialization or Laplace approximation (Newton step).
Building custom MCMC kernels that combine gradient information with Hamiltonian dynamics.

Theoretical Basis

Velocity Verlet (leapfrog) integrator:

# Hamiltonian: H(q, p) = U(q) + K(p)
# where U(q) = -log p(q|data)  (potential energy = negative log-posterior)
#       K(p) = 0.5 * p^T M^{-1} p  (kinetic energy)

# Hamilton's equations:
# dq/dt = grad_p K(p) = M^{-1} p
# dp/dt = -grad_q U(q) = grad_q log p(q|data)

# Leapfrog integrator (one step of size epsilon):
# 1. Half-step momentum:  p_{1/2} = p_0 - (epsilon/2) * grad_q U(q_0)
# 2. Full-step position:  q_1 = q_0 + epsilon * M^{-1} * p_{1/2}
# 3. Half-step momentum:  p_1 = p_{1/2} - (epsilon/2) * grad_q U(q_1)

# Properties:
# - Symplectic: preserves the symplectic 2-form (volume-preserving)
# - Time-reversible: (q_0, p_0) -> (q_1, p_1) -> (q_0, p_0) exactly
# - Second-order accurate: error = O(epsilon^3) per step
# - Energy error = O(epsilon^2) (bounded, does not grow with trajectory length)

Dual averaging for step-size adaptation:

# Goal: find epsilon such that acceptance probability = target_accept (e.g., 0.8)

# Initialize:
# mu = log(10 * epsilon_0)  (free parameter)
# log_epsilon_bar = 0
# H_bar = 0

# At iteration m:
# Run HMC with current epsilon, observe acceptance probability alpha_m

# Update:
# H_bar = (1 - 1/(m + t_0)) * H_bar + (1/(m + t_0)) * (target_accept - alpha_m)
# log_epsilon = mu - sqrt(m) / gamma * H_bar
# log_epsilon_bar = m^{-kappa} * log_epsilon + (1 - m^{-kappa}) * log_epsilon_bar

# During warmup: use epsilon = exp(log_epsilon)
# After warmup: fix epsilon = exp(log_epsilon_bar)

# Parameters: t_0 = 10, gamma = 0.05, kappa = 0.75

Newton step:

# Newton's method for finding the mode of log p(q|data):
# q_{k+1} = q_k - H(q_k)^{-1} * g(q_k)

# where:
# g(q) = grad_q log p(q|data)     (gradient / score)
# H(q) = grad_q^2 log p(q|data)   (Hessian)

# For MCMC proposal:
# Propose q* = q - H(q)^{-1} * g(q) + noise
# The noise term ensures the proposal explores rather than just converging

# Laplace approximation at mode q*:
# p(q|data) approx Normal(q*, -H(q*)^{-1})
# where H(q*) is the Hessian at the mode (negative definite)

# Computational cost: O(d^3) for d-dimensional parameter space
# (Hessian inversion). Can use quasi-Newton (L-BFGS) for O(d) cost.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment