Heuristic:Pyro ppl Pyro MCMC Warmup Adaptation
| Knowledge Sources | |
|---|---|
| Domains | MCMC, Optimization |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Stan-derived warmup adaptation schedule for tuning the MCMC step size and mass matrix, using doubling adaptation windows placed between fixed-size start and end buffers.
Description
Pyro's MCMC warmup adapter uses an adaptation schedule taken directly from Stan's implementation. The warmup period is split into three phases: a start buffer (step size adaptation only, no mass matrix adaptation), a series of doubling windows (mass matrix estimation via Welford's algorithm, with the mass matrix updated at the end of each window), and an end buffer (step size adaptation only, against the final mass matrix). The doubling window strategy trades early flexibility (short windows that update the mass matrix quickly) against later stability (long windows that give accurate mass matrix estimates).
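The three-phase schedule can be sketched in plain Python. This is a hypothetical re-implementation, not Pyro's API (`build_adaptation_schedule` and `Window` are illustrative names). The default sizes, the < 20 rule and the 15%/10% rescaling follow the code excerpts on this page; the rule that stops doubling and stretches the last slow window to meet the end buffer is an assumption modeled on Stan's behaviour.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    start: int  # first warmup iteration in the window (inclusive)
    end: int    # last warmup iteration in the window (inclusive)


def build_adaptation_schedule(
    warmup_steps: int,
    start_buffer: int = 75,
    end_buffer: int = 50,
    initial_window: int = 25,
) -> List[Window]:
    """Split warmup into a start buffer, doubling slow windows, and an end buffer."""
    # Tiny budgets: a single window covering all of warmup.
    if warmup_steps < 20:
        return [Window(0, warmup_steps - 1)]

    # Budgets smaller than the sum of the defaults: rescale to 15% / rest / 10%.
    if start_buffer + end_buffer + initial_window > warmup_steps:
        start_buffer = int(0.15 * warmup_steps)
        end_buffer = int(0.1 * warmup_steps)
        initial_window = warmup_steps - start_buffer - end_buffer

    schedule = [Window(0, start_buffer - 1)]
    end_window_start = warmup_steps - end_buffer

    next_size, start = initial_window, start_buffer
    while start < end_window_start:
        cur_size = next_size
        # Assumption: keep doubling only while the remaining slow phase can hold
        # this window plus a doubled one; otherwise stretch this window so it
        # ends exactly where the end buffer begins.
        if 3 * cur_size <= end_window_start - start:
            next_size = 2 * cur_size
        else:
            cur_size = end_window_start - start
        schedule.append(Window(start, start + cur_size - 1))
        start += cur_size

    schedule.append(Window(end_window_start, warmup_steps - 1))
    return schedule
```

For 1000 warmup steps the sketch yields a 75-step start buffer, slow windows of 25, 50, 100, 200 and 500 steps, and a 50-step end buffer; for 100 steps it yields phases of 15, 75 and 10 steps.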
Usage
Apply this heuristic when configuring MCMC warmup steps for HMC or NUTS. The defaults (start=75, end=50, initial_window=25) are used unchanged only when warmup_steps >= 150, their sum. Between 20 and 149 warmup steps the buffers are rescaled proportionally, and below 20 steps a single adaptation window covers the whole warmup. Knowing these thresholds helps diagnose cases where MCMC mixing is poor because of insufficient adaptation.
The Insight (Rule of Thumb)
- Action: Use the Stan-derived default adaptation schedule: start_buffer=75, end_buffer=50, initial_window=25 steps.
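- Value: For warmup < 20 steps, use a single window. For 20 <= warmup < 150, scale the start and end buffers to 15% and 10% of warmup, leaving the remaining steps (about 75%) as one slow adaptation window.
- Trade-off: More warmup steps produce better-adapted step size and mass matrix, but increase computation before sampling begins.
- Key parameter: `target_accept_prob=0.8` (an 80% target acceptance rate) is the default for NUTS; increasing it yields smaller step sizes, which reduces divergent transitions but requires more leapfrog steps (gradient evaluations) per sample.
- Proximal center: step size dual averaging shrinks its iterates toward `log(10 * step_size)`, biasing the search toward step sizes larger than the initial one, since larger steps make each trajectory cheaper (the choice recommended by Hoffman and Gelman for NUTS).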
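The interaction between `target_accept_prob` and the proximal center can be sketched with dual averaging (Hoffman and Gelman, 2014). The class below is a self-contained re-implementation, not an import of Pyro's own dual averaging class. `tune_step_size` and the toy acceptance model `exp(-step_size)` are hypothetical, chosen only so the ideal step size has a closed form, `-log(target_accept_prob)`. The constants `t0=10`, `kappa=0.75`, `gamma=0.05` are the values suggested in the NUTS paper.

```python
import math


class DualAveraging:
    """Dual averaging on log step size (sketch after Hoffman and Gelman, 2014)."""

    def __init__(self, prox_center=0.0, t0=10.0, kappa=0.75, gamma=0.05):
        self.prox_center = prox_center  # mu: point the iterates are shrunk toward
        self.t0 = t0        # damps the influence of the first iterations
        self.kappa = kappa  # decay exponent of the iterate-averaging weight
        self.gamma = gamma  # strength of the shrinkage toward prox_center
        self.reset()

    def reset(self):
        self._t = 0
        self._g_avg = 0.0  # running average of (target - accept_prob)
        self._x_t = 0.0    # current log step size, used during warmup
        self._x_avg = 0.0  # averaged log step size, used after warmup

    def step(self, g):
        self._t += 1
        w = 1.0 / (self._t + self.t0)
        self._g_avg = (1.0 - w) * self._g_avg + w * g
        self._x_t = self.prox_center - math.sqrt(self._t) / self.gamma * self._g_avg
        weight = self._t ** (-self.kappa)
        self._x_avg = (1.0 - weight) * self._x_avg + weight * self._x_t
        return self._x_t, self._x_avg


def tune_step_size(accept_prob_fn, step_size=1.0, target_accept_prob=0.8, num_steps=2000):
    # Hypothetical driver: prox_center = log(10 * initial step size), as in Pyro.
    da = DualAveraging(prox_center=math.log(10 * step_size))
    x_avg = math.log(step_size)
    for _ in range(num_steps):
        accept_prob = accept_prob_fn(step_size)
        x_t, x_avg = da.step(target_accept_prob - accept_prob)
        step_size = math.exp(x_t)
    # After warmup, the averaged iterate is used as the final step size.
    return math.exp(x_avg)


def toy_accept(step_size):
    # Toy model (assumption): acceptance falls off exponentially with step size.
    return math.exp(-step_size)


eps_80 = tune_step_size(toy_accept, target_accept_prob=0.8)
eps_90 = tune_step_size(toy_accept, target_accept_prob=0.9)
```

With the toy model the tuned step sizes land close to `-log(0.8) ≈ 0.223` and `-log(0.9) ≈ 0.105`: raising the target acceptance shrinks the step size, the trade-off listed above.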
Reasoning
The adaptation schedule is inherited from Stan, where these defaults have been used across a wide range of models. The start buffer lets the sampler move into a reasonable region of the posterior before any mass matrix statistics are collected. Because each window is twice as long as the previous one, each successive mass matrix estimate is built from more samples, so estimates improve as warmup progresses. The end buffer holds the mass matrix fixed while the step size re-adapts to the final mass matrix, so sampling starts with a step size tuned to the metric it will actually use.
For small warmup budgets, scaling the buffers proportionally (15%/10%) prevents them from consuming the entire warmup period. The target acceptance probability of 0.8 is a well-established default (it matches Stan's `adapt_delta` default) that balances cost per sample, since a lower target permits larger steps and shorter trajectories, against robustness, since a higher target gives smaller steps and fewer divergent transitions.
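Why longer windows give better mass matrix estimates can be illustrated with a streaming variance estimator. The sketch below (`WelfordDiagonal` is a hypothetical name) applies Welford's algorithm per coordinate, as for a diagonal mass matrix whose estimated variances serve as the inverse mass matrix. The optional shrinkage toward 1e-3, with weight n/(n+5) on the sample estimate, is the regularization Stan applies to windowed variance estimates; whether Pyro uses exactly these constants is an assumption here.

```python
from typing import List


class WelfordDiagonal:
    """Streaming per-coordinate mean and variance via Welford's algorithm."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean

    def update(self, x: List[float]) -> None:
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            # Uses the updated mean, which keeps the update numerically stable.
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self, regularize: bool = True) -> List[float]:
        if self.n < 2:
            raise ValueError("need at least two samples")
        var = [m / (self.n - 1) for m in self.m2]  # unbiased sample variance
        if not regularize:
            return var
        # Assumed Stan-style shrinkage: the sample estimate gets weight
        # n / (n + 5), so longer windows rely more on the data.
        w = self.n / (self.n + 5.0)
        return [w * v + 1e-3 * (1.0 - w) for v in var]


est = WelfordDiagonal(dim=2)
for sample in [[1.0, 10.0], [2.0, 12.0], [4.0, 9.0], [7.0, 13.0]]:
    est.update(sample)
raw_var = est.variance(regularize=False)
reg_var = est.variance()
```

With 4 samples the data get weight 4/9; at the end of a 500-step window they get 500/505, so the late, long windows are almost entirely data-driven.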
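A worked example of the proportional rescaling, applying the formulas from the dynamic rescaling excerpt to a hypothetical budget of W = 100 warmup steps:

```latex
\begin{aligned}
W &= 100 < 75 + 50 + 25 = 150 \quad \text{(the fixed defaults do not fit)} \\
\text{start buffer} &= \lfloor 0.15\,W \rfloor = 15 \\
\text{end buffer} &= \lfloor 0.10\,W \rfloor = 10 \\
\text{initial window} &= W - 15 - 10 = 75
\end{aligned}
```

Because the initial window already spans every step between the two buffers, no doubling takes place in this regime: the mass matrix is estimated in a single 75-step window.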
Code evidence from `pyro/infer/mcmc/adaptation.py:50-59`:
```python
# We separate warmup_steps into windows:
#   start_buffer + window 1 + window 2 + window 3 + ... + end_buffer
# where the length of each window will be doubled for the next window.
# We won't adapt mass matrix during start and end buffers; and mass
# matrix will be updated at the end of each window. This is helpful
# for dealing with the intense computation of sampling momentum from the
# inverse of mass matrix.
self._adapt_start_buffer = 75  # from Stan
self._adapt_end_buffer = 50  # from Stan
self._adapt_initial_window = 25  # from Stan
```
Small warmup handling from `pyro/infer/mcmc/adaptation.py:67-70`:
```python
# from Stan, for small warmup_steps < 20
if self._warmup_steps < 20:
    adaptation_schedule.append(adapt_window(0, self._warmup_steps - 1))
    return adaptation_schedule
```
Dynamic rescaling from `pyro/infer/mcmc/adaptation.py:81-83`:
```python
start_buffer_size = int(0.15 * self._warmup_steps)
end_buffer_size = int(0.1 * self._warmup_steps)
init_window_size = self._warmup_steps - start_buffer_size - end_buffer_size
```
Step size proxy center from `pyro/infer/mcmc/adaptation.py:112`:
```python
self._step_size_adapt_scheme.prox_center = math.log(10 * self.step_size)
```