Principle: Pyro Hamiltonian Monte Carlo
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Knowledge Sources | Paper (MCMC Using Hamiltonian Dynamics), Repo (Pyro) |
| Domains | MCMC, Bayesian_Inference |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that uses Hamiltonian dynamics to propose moves in parameter space, dramatically improving sampling efficiency over random walk Metropolis for high-dimensional continuous distributions.
Description
HMC leverages the mathematics of Hamiltonian mechanics to generate distant, low-correlation proposals that are accepted with high probability. Rather than proposing small random perturbations (as in Metropolis-Hastings), HMC simulates the motion of a fictitious physical system where the parameters are treated as the "position" of a particle and auxiliary "momentum" variables drive the particle along the energy surface defined by the posterior.
Augmented Target Distribution
Given a target posterior p(theta | data), HMC introduces auxiliary momentum variables r drawn from a multivariate normal distribution N(0, M), where M is the mass matrix. The joint distribution is:
p(theta, r) proportional to exp(-H(theta, r))
where the Hamiltonian is:
H(theta, r) = U(theta) + K(r) = -log p(theta | data) + 0.5 * r^T M^{-1} r
The potential energy U(theta) = -log p(theta | data) captures the geometry of the posterior, while the kinetic energy K(r) = 0.5 * r^T M^{-1} r drives the dynamics.
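The decomposition above can be made concrete with a minimal one-dimensional sketch, assuming a standard normal target and a unit mass matrix M = 1 (both choices are illustrative, not from the source):

```python
# Illustrative 1-D energies for HMC, assuming a standard normal target
# p(theta) and unit mass M = 1.

def potential_energy(theta):
    # U(theta) = -log p(theta | data); for a standard normal target this
    # is 0.5 * theta^2 up to an additive constant.
    return 0.5 * theta ** 2

def kinetic_energy(r, m=1.0):
    # K(r) = 0.5 * r^T M^{-1} r reduces to 0.5 * r^2 / m in one dimension.
    return 0.5 * r ** 2 / m

def hamiltonian(theta, r, m=1.0):
    # Total energy H(theta, r) = U(theta) + K(r).
    return potential_energy(theta) + kinetic_energy(r, m)
```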
Leapfrog Integrator
Hamiltonian dynamics are simulated using the leapfrog integrator, a symplectic (volume-preserving, time-reversible) numerical integration scheme. For a step size epsilon:
- Half-step momentum update: r <- r - (epsilon/2) * grad U(theta)
- Full-step position update: theta <- theta + epsilon * M^{-1} * r
- Half-step momentum update: r <- r - (epsilon/2) * grad U(theta)
This three-step procedure is repeated num_steps times. The symplectic property ensures that the integrator preserves the Hamiltonian up to discretization error of order O(epsilon^2), enabling high acceptance rates even for large moves.
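The three-step procedure above can be sketched directly in code. This is a one-dimensional illustration with a unit mass matrix, not Pyro's internal integrator:

```python
# Minimal leapfrog integrator in one dimension; grad_u is the gradient
# of the potential energy U, and m is the (scalar) mass.

def leapfrog(theta, r, grad_u, step_size, num_steps, m=1.0):
    for _ in range(num_steps):
        r = r - 0.5 * step_size * grad_u(theta)   # half-step momentum update
        theta = theta + step_size * r / m          # full-step position update
        r = r - 0.5 * step_size * grad_u(theta)   # half-step momentum update
    return theta, r
```

Running this on a standard normal target (grad U(theta) = theta) and then re-running it with negated momentum returns to the starting point, illustrating the time-reversibility and near-conservation of H discussed above.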
Metropolis Accept/Reject
After the leapfrog trajectory, a Metropolis accept/reject step corrects for the discretization error of the leapfrog integrator. The proposed state (theta*, r*) is accepted with probability:
min(1, exp(-H(theta*, r*) + H(theta, r)))
Because the leapfrog integrator nearly preserves the Hamiltonian, this acceptance probability is typically close to 1 for well-tuned step sizes.
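Putting the pieces together, one complete HMC transition (momentum resampling, leapfrog trajectory, Metropolis correction) can be sketched as follows. This is an illustrative 1-D implementation with a unit mass matrix, not Pyro's code; the helper names are invented for the example:

```python
import math
import random

def hmc_step(theta, u, grad_u, step_size, num_steps, rng=random):
    r0 = rng.gauss(0.0, 1.0)                 # resample momentum r ~ N(0, M)
    theta_new, r = theta, r0
    for _ in range(num_steps):               # leapfrog trajectory
        r -= 0.5 * step_size * grad_u(theta_new)
        theta_new += step_size * r
        r -= 0.5 * step_size * grad_u(theta_new)
    h0 = u(theta) + 0.5 * r0 ** 2            # Hamiltonian at the current state
    h1 = u(theta_new) + 0.5 * r ** 2         # Hamiltonian at the proposal
    if rng.random() < math.exp(min(0.0, h0 - h1)):  # min(1, exp(-H* + H))
        return theta_new                     # accept
    return theta                             # reject

# Sample from a standard normal target as a sanity check.
random.seed(0)
u = lambda t: 0.5 * t ** 2
grad_u = lambda t: t
theta, samples = 0.0, []
for _ in range(2000):
    theta = hmc_step(theta, u, grad_u, step_size=0.3, num_steps=10)
    samples.append(theta)
mean = sum(samples) / len(samples)
```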
Key Tuning Parameters
Unlike NUTS, HMC requires manual specification of two critical parameters:
- Step size (epsilon): Controls the accuracy of the leapfrog integrator. Too large causes energy drift and low acceptance rates; too small leads to slow exploration. Can be adapted during warmup via dual averaging.
- Number of steps (num_steps) or trajectory length (trajectory_length = epsilon * num_steps): Controls how far the trajectory travels. Too short produces correlated samples (random walk behavior); too long wastes computation and can cause U-turns.
Adaptation During Warmup
During the warmup phase, HMC in Pyro can adapt:
- Step size: Using the dual averaging algorithm of Nesterov (2009) to target a specified acceptance probability (default 0.8).
- Mass matrix: Using the Welford online algorithm to estimate the marginal variances (diagonal mass matrix) or full covariance (dense mass matrix) of the posterior.
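The Welford update used for mass-matrix adaptation can be sketched in a few lines. This is a scalar version for clarity, not Pyro's implementation:

```python
# Welford's online algorithm for a numerically stable running variance,
# as used to estimate marginal posterior variances during warmup.

class WelfordVariance:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; requires at least two observations.
        return self.m2 / (self.n - 1)
```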
Usage
HMC is appropriate when:
- Continuous parameters: HMC requires differentiable potential energy, so all sampled parameters must be continuous. Discrete parameters must be marginalized or enumerated separately.
- Known trajectory length: When the user has prior knowledge about the appropriate trajectory length, or when NUTS overhead is a concern.
- Benchmarking: HMC with fixed trajectory length provides a useful baseline for comparing against NUTS.
- Simple models: For low-dimensional models where the trajectory length is easy to tune manually.
For most practical applications, NUTS (which automatically tunes the trajectory length) is preferred over manually-tuned HMC.
Theoretical Basis
Hamiltonian Mechanics
The Hamiltonian H(theta, r) = U(theta) + K(r) defines a system of ordinary differential equations:
d theta / dt = partial H / partial r = M^{-1} r
d r / dt = - partial H / partial theta = - grad U(theta)
These dynamics have three key properties that make them suitable for MCMC:
- Energy conservation: H(theta(t), r(t)) is constant along the trajectory, ensuring high acceptance probability.
- Volume preservation: The Jacobian determinant of the map is 1, so no correction is needed in the acceptance ratio.
- Time reversibility: The dynamics can be reversed by negating the momentum, which is essential for satisfying detailed balance.
Scaling with Dimensionality
For a d-dimensional target distribution, HMC requires O(d^{1/4}) gradient evaluations per effective sample (under optimal tuning), compared to O(d) for random walk Metropolis. This dramatic improvement in scaling makes HMC practical for high-dimensional posterior distributions.
Mass Matrix Preconditioning
The mass matrix M plays the role of a preconditioner. Setting the inverse mass matrix M^{-1} to the posterior covariance (equivalently, M to the posterior precision) ensures that the Hamiltonian dynamics explore all directions at comparable rates. In Pyro:
- Diagonal mass matrix (full_mass=False): Adapts the marginal variances. Efficient and sufficient when parameters are approximately uncorrelated.
- Dense mass matrix (full_mass=True): Adapts the full covariance. Necessary when parameters have strong posterior correlations, but scales as O(d^2) in memory.
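The effect of a diagonal mass matrix can be sketched with stdlib Python. This illustrates momentum sampling and the preconditioned kinetic energy only; the function names are invented for the example, not Pyro's API:

```python
import random

# Diagonal-mass-matrix preconditioning: inv_mass_diag holds the entries
# of M^{-1}, which adaptation would set to the estimated marginal
# posterior variances (as with full_mass=False).

def sample_momentum(inv_mass_diag, rng=random):
    # r ~ N(0, M): component i has variance m_i = 1 / inv_mass_i.
    return [rng.gauss(0.0, (1.0 / v) ** 0.5) for v in inv_mass_diag]

def preconditioned_kinetic_energy(r, inv_mass_diag):
    # K(r) = 0.5 * r^T M^{-1} r with diagonal M^{-1}.
    return 0.5 * sum(v * ri ** 2 for ri, v in zip(r, inv_mass_diag))
```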