Principle: Pyro Hamiltonian Monte Carlo
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Knowledge Sources | Paper (MCMC Using Hamiltonian Dynamics), Repo (Pyro) |
| Domains | MCMC, Bayesian_Inference |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that uses Hamiltonian dynamics to propose moves in parameter space, dramatically improving sampling efficiency over random walk Metropolis for high-dimensional continuous distributions.
Description
HMC leverages the mathematics of Hamiltonian mechanics to generate distant, low-correlation proposals that are accepted with high probability. Rather than proposing small random perturbations (as in Metropolis-Hastings), HMC simulates the motion of a fictitious physical system where the parameters are treated as the "position" of a particle and auxiliary "momentum" variables drive the particle along the energy surface defined by the posterior.
Augmented Target Distribution
Given a target posterior p(theta | data), HMC introduces auxiliary momentum variables r drawn from a multivariate normal distribution N(0, M), where M is the mass matrix. The joint distribution is:
p(theta, r) proportional to exp(-H(theta, r))
where the Hamiltonian is:
H(theta, r) = U(theta) + K(r) = -log p(theta | data) + 0.5 * r^T M^{-1} r
The potential energy U(theta) = -log p(theta | data) captures the geometry of the posterior, while the kinetic energy K(r) = 0.5 * r^T M^{-1} r drives the dynamics.
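The decomposition above can be made concrete with a minimal one-dimensional sketch, assuming a standard normal target and a unit mass matrix M = 1 (both choices are illustrative, not from the source):

```python
# Illustrative 1-D energies for HMC, assuming a standard normal target
# p(theta) and unit mass M = 1.

def potential_energy(theta):
    # U(theta) = -log p(theta | data); for a standard normal target this
    # is 0.5 * theta^2 up to an additive constant.
    return 0.5 * theta ** 2

def kinetic_energy(r, m=1.0):
    # K(r) = 0.5 * r^T M^{-1} r reduces to 0.5 * r^2 / m in one dimension.
    return 0.5 * r ** 2 / m

def hamiltonian(theta, r, m=1.0):
    # Total energy H(theta, r) = U(theta) + K(r).
    return potential_energy(theta) + kinetic_energy(r, m)
```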
Leapfrog Integrator
Hamiltonian dynamics are simulated using the leapfrog integrator, a symplectic (volume-preserving, time-reversible) numerical integration scheme. For a step size epsilon:
- Half-step momentum update: r <- r - (epsilon/2) * grad U(theta)
- Full-step position update: theta <- theta + epsilon * M^{-1} * r
- Half-step momentum update: r <- r - (epsilon/2) * grad U(theta)
This three-step procedure is repeated num_steps times. The symplectic property ensures that the integrator preserves the Hamiltonian up to discretization error of order O(epsilon^2), enabling high acceptance rates even for large moves.
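The three-step procedure above can be sketched directly in code. This is a one-dimensional illustration with a unit mass matrix, not Pyro's internal integrator:

```python
# Minimal leapfrog integrator in one dimension; grad_u is the gradient
# of the potential energy U, and m is the (scalar) mass.

def leapfrog(theta, r, grad_u, step_size, num_steps, m=1.0):
    for _ in range(num_steps):
        r = r - 0.5 * step_size * grad_u(theta)   # half-step momentum update
        theta = theta + step_size * r / m          # full-step position update
        r = r - 0.5 * step_size * grad_u(theta)   # half-step momentum update
    return theta, r
```

Running this on a standard normal target (grad U(theta) = theta) and then re-running it with negated momentum returns to the starting point, illustrating the time-reversibility and near-conservation of H discussed above.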
Metropolis Accept/Reject
After the leapfrog trajectory, a Metropolis accept/reject step corrects for the discretization error of the leapfrog integrator. The proposed state (theta*, r*) is accepted with probability:
min(1, exp(-H(theta*, r*) + H(theta, r)))
Because the leapfrog integrator nearly preserves the Hamiltonian, this acceptance probability is typically close to 1 for well-tuned step sizes.
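Putting the pieces together, one complete HMC transition (momentum resampling, leapfrog trajectory, Metropolis correction) can be sketched as follows. This is an illustrative 1-D implementation with a unit mass matrix, not Pyro's code; the helper names are invented for the example:

```python
import math
import random

def hmc_step(theta, u, grad_u, step_size, num_steps, rng=random):
    r0 = rng.gauss(0.0, 1.0)                 # resample momentum r ~ N(0, M)
    theta_new, r = theta, r0
    for _ in range(num_steps):               # leapfrog trajectory
        r -= 0.5 * step_size * grad_u(theta_new)
        theta_new += step_size * r
        r -= 0.5 * step_size * grad_u(theta_new)
    h0 = u(theta) + 0.5 * r0 ** 2            # Hamiltonian at the current state
    h1 = u(theta_new) + 0.5 * r ** 2         # Hamiltonian at the proposal
    if rng.random() < math.exp(min(0.0, h0 - h1)):  # min(1, exp(-H* + H))
        return theta_new                     # accept
    return theta                             # reject

# Sample from a standard normal target as a sanity check.
random.seed(0)
u = lambda t: 0.5 * t ** 2
grad_u = lambda t: t
theta, samples = 0.0, []
for _ in range(2000):
    theta = hmc_step(theta, u, grad_u, step_size=0.3, num_steps=10)
    samples.append(theta)
mean = sum(samples) / len(samples)
```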
Key Tuning Parameters
Unlike NUTS, HMC requires manual specification of two critical parameters:
- Step size (epsilon): Controls the accuracy of the leapfrog integrator. Too large causes energy drift and low acceptance rates; too small leads to slow exploration. Can be adapted during warmup via dual averaging.
- Number of steps (num_steps) or trajectory length (trajectory_length = epsilon * num_steps): Controls how far the trajectory travels. Too short produces correlated samples (random walk behavior); too long wastes computation and can cause U-turns.
Adaptation During Warmup
During the warmup phase, HMC in Pyro can adapt:
- Step size: Using the dual averaging algorithm of Nesterov (2009) to target a specified acceptance probability (default 0.8).
- Mass matrix: Using the Welford online algorithm to estimate the marginal variances (diagonal mass matrix) or full covariance (dense mass matrix) of the posterior.
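The Welford update used for mass-matrix adaptation can be sketched in a few lines. This is a scalar version for clarity, not Pyro's implementation:

```python
# Welford's online algorithm for a numerically stable running variance,
# as used to estimate marginal posterior variances during warmup.

class WelfordVariance:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; requires at least two observations.
        return self.m2 / (self.n - 1)
```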
Usage
HMC is appropriate when:
- Continuous parameters: HMC requires differentiable potential energy, so all sampled parameters must be continuous. Discrete parameters must be marginalized or enumerated separately.
- Known trajectory length: When the user has prior knowledge about the appropriate trajectory length, or when NUTS overhead is a concern.
- Benchmarking: HMC with fixed trajectory length provides a useful baseline for comparing against NUTS.
- Simple models: For low-dimensional models where the trajectory length is easy to tune manually.
For most practical applications, NUTS (which automatically tunes the trajectory length) is preferred over manually-tuned HMC.
Theoretical Basis
Hamiltonian Mechanics
The Hamiltonian H(theta, r) = U(theta) + K(r) defines a system of ordinary differential equations:
d theta / dt = partial H / partial r = M^{-1} r
d r / dt = - partial H / partial theta = - grad U(theta)
These dynamics have three key properties that make them suitable for MCMC:
- Energy conservation: H(theta(t), r(t)) is constant along the trajectory, ensuring high acceptance probability.
- Volume preservation: The Jacobian determinant of the map is 1, so no correction is needed in the acceptance ratio.
- Time reversibility: The dynamics can be reversed by negating the momentum, which is essential for satisfying detailed balance.
Scaling with Dimensionality
For a d-dimensional target distribution, HMC requires O(d^{1/4}) gradient evaluations per effective sample (under optimal tuning), compared to O(d) for random walk Metropolis. This dramatic improvement in scaling makes HMC practical for high-dimensional posterior distributions.
Mass Matrix Preconditioning
The mass matrix M plays the role of a preconditioner. Setting the inverse mass matrix M^{-1} to the posterior covariance (equivalently, M to the posterior precision) ensures that the Hamiltonian dynamics explore all directions at comparable rates. In Pyro:
- Diagonal mass matrix (full_mass=False): Adapts the marginal variances. Efficient and sufficient when parameters are approximately uncorrelated.
- Dense mass matrix (full_mass=True): Adapts the full covariance. Necessary when parameters have strong posterior correlations, but scales as O(d^2) in memory.
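The effect of a diagonal mass matrix can be sketched with stdlib Python. This illustrates momentum sampling and the preconditioned kinetic energy only; the function names are invented for the example, not Pyro's API:

```python
import random

# Diagonal-mass-matrix preconditioning: inv_mass_diag holds the entries
# of M^{-1}, which adaptation would set to the estimated marginal
# posterior variances (as with full_mass=False).

def sample_momentum(inv_mass_diag, rng=random):
    # r ~ N(0, M): component i has variance m_i = 1 / inv_mass_i.
    return [rng.gauss(0.0, (1.0 / v) ** 0.5) for v in inv_mass_diag]

def preconditioned_kinetic_energy(r, inv_mass_diag):
    # K(r) = 0.5 * r^T M^{-1} r with diagonal M^{-1}.
    return 0.5 * sum(v * ri ** 2 for ri, v in zip(r, inv_mass_diag))
```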