Principle:Pyro ppl Pyro No U Turn Sampling

Metadata

Field	Value
Page Type	Principle
Knowledge Sources	Paper (The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo), Paper (A Conceptual Introduction to Hamiltonian Monte Carlo), Repo (Pyro)
Domains	MCMC, Bayesian_Inference
Last Updated	2026-02-09 12:00 GMT

Overview

The No-U-Turn Sampler (NUTS) is an adaptive extension of Hamiltonian Monte Carlo (HMC) that automatically tunes the trajectory length, eliminating the need to hand-set the num_steps parameter while maintaining or improving sampling efficiency.

Description

NUTS builds on the foundation of Hamiltonian Monte Carlo by introducing a recursive tree-doubling algorithm that dynamically determines how far to simulate Hamiltonian dynamics at each iteration. In standard HMC, the user must specify the number of leapfrog integration steps, which critically affects sampler performance: too few steps result in random-walk behavior; too many waste computation and can cause trajectories to loop back on themselves.

Recursive Tree Doubling

NUTS constructs a binary tree of leapfrog states by repeatedly doubling the trajectory in a randomly chosen direction (forward or backward in time). At each doubling:

A new subtree of the same depth as the existing tree is proposed by simulating leapfrog steps.
The algorithm checks whether the trajectory has begun to "turn back" -- that is, whether the U-turn criterion is satisfied.
If a U-turn is detected, or if a numerical divergence occurs, the tree-building process terminates.

The tree depth is bounded by a maximum depth parameter (typically 10, which caps a trajectory at 2^10 = 1024 states, or 1023 leapfrog steps) to prevent runaway computation.
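The bookkeeping behind the doubling scheme can be sketched in a few lines. This is a minimal illustration, not Pyro's implementation: no dynamics are simulated, states are represented only by integer time indices, and the names doubling_schedule and stop_at_depth are hypothetical (stop_at_depth stands in for a U-turn or divergence ending the process early).

```python
import random


def doubling_schedule(max_tree_depth, stop_at_depth=None, seed=0):
    """Track how the trajectory grows as the tree is doubled.

    Returns a list of (depth, left_index, right_index, num_states).
    """
    rng = random.Random(seed)
    left, right = 0, 0  # the trajectory starts as a single state at index 0
    history = []
    for depth in range(max_tree_depth):
        if stop_at_depth is not None and depth >= stop_at_depth:
            break  # hypothetical stand-in for a U-turn or divergence
        new_states = 2 ** depth  # the new subtree matches the current tree size
        if rng.random() < 0.5:
            right += new_states  # extend forward in time
        else:
            left -= new_states   # extend backward in time
        history.append((depth + 1, left, right, right - left + 1))
    return history


history = doubling_schedule(max_tree_depth=10)
final_depth, left, right, n_states = history[-1]
print(final_depth, n_states, n_states - 1)  # prints: 10 1024 1023
```

Each doubling adds as many states as the tree already holds, so the cost of one more level equals the cost of everything built so far. That is why the depth cap matters.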

U-Turn Criterion

The U-turn condition detects when a trajectory begins to double back on itself. The original NUTS criterion checks whether the dot product between the momentum at either end of the trajectory and the displacement vector (from the leftmost to the rightmost state) becomes negative. Formally, a U-turn is detected when:

dot(p_left, q_right - q_left) < 0 or dot(p_right, q_right - q_left) < 0

where q_left, q_right are the position endpoints of the trajectory and p_left, p_right are the momenta at those endpoints. Stopping at this point lets the sampler travel as far as possible along the energy level set without redundantly retracing its path.
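A minimal sketch of the original endpoint check follows, using the momentum at each end of the trajectory. The helper names dot and is_u_turn are illustrative. Production samplers may use a generalized form of the criterion, so this should be read as the textbook version only.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def is_u_turn(q_left, p_left, q_right, p_right):
    """Original NUTS check: True once either end starts moving back
    toward the other end of the trajectory."""
    span = [r - l for l, r in zip(q_left, q_right)]  # q_right - q_left
    return dot(p_left, span) < 0 or dot(p_right, span) < 0


# Both ends still move along the displacement: keep doubling.
still_expanding = is_u_turn([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, 0.5])

# The right end now moves back toward the left end: stop.
turned_back = is_u_turn([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.5])
```

Here still_expanding is False and turned_back is True, because only in the second case does an endpoint momentum have a negative projection onto q_right - q_left.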

Multinomial Sampling

The original NUTS paper used slice sampling to select the return state from the trajectory. A more efficient variant, implemented in Pyro, uses multinomial sampling: each state along the trajectory is selected with probability proportional to exp(-H), its unnormalized joint density, where H is the state's energy. This approach yields lower-variance estimators and better exploration of the target distribution.
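The selection step can be illustrated with a small sketch. It assumes the energies of all trajectory states are already available in a list. Practical implementations typically select the state progressively while the tree is built instead of storing the whole trajectory. The function names are hypothetical.

```python
import math
import random


def multinomial_weights(energies):
    """Normalized weights proportional to exp(-H) for each state."""
    log_w = [-h for h in energies]
    m = max(log_w)  # subtract the max for numerical stability
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def select_state(states, energies, rng):
    """Draw one state from the trajectory according to its weight."""
    weights = multinomial_weights(energies)
    return rng.choices(states, weights=weights, k=1)[0]


states = ["s0", "s1", "s2", "s3"]
energies = [1.0, 0.5, 2.0, 0.7]  # illustrative values of H along a trajectory
weights = multinomial_weights(energies)
chosen = select_state(states, energies, random.Random(0))
```

The lowest-energy state (s1 in this example) receives the largest weight, but every state on the trajectory keeps a nonzero chance of being returned.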

Automatic Adaptation

NUTS includes two adaptation mechanisms during the warmup phase:

Step size adaptation: Uses dual averaging (Nesterov, 2009) to tune the leapfrog step size to achieve a target acceptance probability (typically 0.8). The step size is adjusted so that the sampler neither rejects too many proposals (step size too large) nor takes unnecessarily small steps (step size too small).

Mass matrix adaptation: Uses the Welford online algorithm to estimate the empirical covariance (or diagonal variance) of the target distribution from warmup samples. The mass matrix preconditions the Hamiltonian dynamics so that the sampler can efficiently explore distributions with different scales along different dimensions.
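Both warmup mechanisms can be sketched with the standard library alone. The dual averaging update below follows the form given in the NUTS paper, with its commonly quoted constants (gamma = 0.05, t0 = 10, kappa = 0.75) taken as assumptions. The class names and the toy acceptance model exp(-step_size) are hypothetical stand-ins; a real kernel would report the acceptance statistic of each transition. Neither class mirrors Pyro's internal API.

```python
import math


class DualAveraging:
    """Step size adaptation toward a target acceptance probability."""

    def __init__(self, initial_step_size, target_accept=0.8,
                 gamma=0.05, t0=10.0, kappa=0.75):
        self.mu = math.log(10.0 * initial_step_size)  # shrinkage point
        self.target = target_accept
        self.gamma, self.t0, self.kappa = gamma, t0, kappa
        self.m = 0
        self.h_bar = 0.0        # running average of (target - accept_prob)
        self.log_eps_bar = 0.0  # log of the averaged step size
        self.step_size = initial_step_size

    def update(self, accept_prob):
        self.m += 1
        w = 1.0 / (self.m + self.t0)
        self.h_bar = (1.0 - w) * self.h_bar + w * (self.target - accept_prob)
        log_eps = self.mu - math.sqrt(self.m) / self.gamma * self.h_bar
        eta = self.m ** (-self.kappa)
        self.log_eps_bar = eta * log_eps + (1.0 - eta) * self.log_eps_bar
        self.step_size = math.exp(log_eps)  # step size for the next iteration

    @property
    def averaged_step_size(self):
        """Smoothed step size, typically used once warmup ends."""
        return math.exp(self.log_eps_bar)


class WelfordDiagonal:
    """Online per-dimension mean and variance (Welford's algorithm)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sums of squared deviations

    def update(self, sample):
        self.n += 1
        for i, x in enumerate(sample):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        return [m2 / (self.n - 1) for m2 in self.m2]


# Toy warmup: acceptance falls as the step size grows (hypothetical model).
adapter = DualAveraging(initial_step_size=1.0)
for _ in range(2000):
    adapter.update(math.exp(-adapter.step_size))

# Variance estimate from a handful of illustrative warmup draws.
welford = WelfordDiagonal(dim=2)
for sample in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]:
    welford.update(sample)
```

In the toy model the target of 0.8 is reached at a step size of -log(0.8), about 0.22, and the averaged step size settles near that value. The Welford variances would then be used to build a diagonal mass matrix; a common convention is to treat them as the inverse mass matrix.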

Usage

NUTS is the recommended MCMC kernel for most continuous posterior distributions in Pyro:

Default MCMC kernel: NUTS should be the first choice for sampling from continuous posterior distributions. It requires minimal tuning compared to standard HMC.
High-dimensional models: NUTS efficiently handles models with tens to hundreds of continuous parameters, where random-walk Metropolis would mix extremely slowly.
Models with varying curvature: The adaptive trajectory length allows NUTS to take short trajectories in regions of high curvature and long trajectories in flatter regions, adapting to local geometry.
When HMC trajectory length is difficult to tune: If the optimal number of leapfrog steps is unknown or varies across the parameter space, NUTS automatically adapts.

Theoretical Basis

Hamiltonian Dynamics

NUTS extends HMC, which augments the target distribution p(theta | data) with auxiliary momentum variables r. The joint distribution defines a Hamiltonian:

H(theta, r) = -log p(theta | data) + 0.5 * r^T M^{-1} r

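where M is the mass matrix. Exact Hamiltonian dynamics conserve this energy. The leapfrog integrator is a symplectic (volume-preserving) and time-reversible numerical approximation; it conserves energy only approximately, with an error that stays bounded when the step size is stable.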
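A one-dimensional sketch makes the integrator's behavior concrete. It assumes a standard normal target, so the potential is 0.5 * q^2, and a scalar inverse mass. All function names are illustrative.

```python
def potential(q):
    """Negative log density of N(0, 1), up to an additive constant."""
    return 0.5 * q * q


def grad_potential(q):
    return q


def hamiltonian(q, p, inv_mass=1.0):
    return potential(q) + 0.5 * inv_mass * p * p


def leapfrog(q, p, step_size, num_steps, inv_mass=1.0):
    """Leapfrog integration: half momentum step, alternating full steps,
    closing half momentum step."""
    p = p - 0.5 * step_size * grad_potential(q)
    for i in range(num_steps):
        q = q + step_size * inv_mass * p
        if i < num_steps - 1:
            p = p - step_size * grad_potential(q)
    p = p - 0.5 * step_size * grad_potential(q)
    return q, p


q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, step_size=0.1, num_steps=100)

# Energy drifts slightly: conserved approximately, not exactly.
energy_error = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))

# Time reversibility: flip the momentum and integrate again.
q_back, p_back = leapfrog(q1, -p1, step_size=0.1, num_steps=100)
```

With these settings energy_error is small but nonzero, while (q_back, -p_back) recovers the starting point up to floating-point rounding. Reversibility and volume preservation, not exact energy conservation, are the properties the sampler's correctness rests on.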

Optimal Trajectory Length

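For a quadratic target (multivariate normal) with an identity mass matrix, each principal direction oscillates with a period of 2 * pi * sqrt(lambda), where lambda is the corresponding eigenvalue of the covariance matrix. A good trajectory length is about half the period of the slowest direction, pi * sqrt(lambda_max). Because the stable leapfrog step size scales with sqrt(lambda_min), this corresponds to roughly pi * sqrt(lambda_max / lambda_min) leapfrog steps. NUTS approximates this length adaptively by detecting when the trajectory begins to curve back.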
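As a quick arithmetic illustration, the quantity pi * sqrt(lambda_max / lambda_min) can be evaluated for a well-conditioned and an ill-conditioned covariance. The function name trajectory_scale is hypothetical. Reading the result as a rough count of leapfrog steps assumes a step size comparable to the smallest scale of the target.

```python
import math


def trajectory_scale(cov_eigenvalues):
    """Evaluate pi * sqrt(lambda_max / lambda_min) for a covariance spectrum."""
    lam_max = max(cov_eigenvalues)
    lam_min = min(cov_eigenvalues)
    return math.pi * math.sqrt(lam_max / lam_min)


well_conditioned = trajectory_scale([1.0, 1.0, 1.0])    # about 3.14
ill_conditioned = trajectory_scale([0.01, 1.0, 100.0])  # about 314
```

The hundredfold gap between the two cases shows why a fixed number of leapfrog steps is hard to choose by hand, and why mass matrix adaptation, which reduces the effective ratio of eigenvalues, shortens the trajectories NUTS needs.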

Detailed Balance

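NUTS satisfies detailed balance with respect to the target distribution even though the trajectory length varies from iteration to iteration. Instead of a single accept/reject decision at the end of a fixed-length trajectory, it selects the next state from a tree that is built so that the same tree could have been generated from any state it contains; this is why the U-turn check is applied to subtrees as well as to the full trajectory. The multinomial sampling variant maintains this property while providing lower-variance estimates.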
