Heuristic: Pyro (pyro-ppl) AutoGuide Initialization Strategy
| Knowledge Sources | |
|---|---|
| Domains | Variational_Inference, Optimization |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Initialization strategy selection for AutoGuide parameters, using cascading fallback chains to robustly initialize variational distributions.
Description
Pyro provides a family of initialization strategies for AutoGuide parameters, organized into a fallback chain: `init_to_mean` falls back to `init_to_median`, which falls back to `init_to_feasible`. Each strategy has specific failure modes that its fallback handles. The `init_scale=0.1` default for guide parameters keeps guides from starting overconfident, and the `init_to_uniform` strategy draws uniformly from [-2, 2] in unconstrained space (the default `radius=2.0`) for good exploration.
Usage
Apply this heuristic when choosing an initialization strategy for AutoGuide-based SVI, or when debugging SVI convergence issues that may stem from poor initialization. Understanding the fallback chain helps diagnose cases where initialization fails silently.
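A minimal end-to-end sketch of where the strategy plugs in; the model, data, and optimizer settings here are hypothetical, not from the source:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal, init_to_mean

def model(data):
    # Toy Gaussian-mean model, for illustration only.
    loc = pyro.sample("loc", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

# The init strategy is passed as init_loc_fn at guide construction.
guide = AutoNormal(model, init_loc_fn=init_to_mean)
svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.01}), loss=Trace_ELBO())

data = torch.randn(100) + 3.0  # synthetic data
for step in range(1000):
    svi.step(data)
```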
The Insight (Rule of Thumb)
- Action: Use `init_to_mean` for most problems; its built-in fallback handles distributions whose mean is undefined. Use `init_to_median(num_samples=15)` for heavy-tailed priors (e.g., Cauchy, StudentT). Use `init_to_uniform(radius=2.0)` for exploratory initialization.
- Value: `init_scale=0.1` for guide uncertainty; `num_samples=15` for empirical median; `radius=2.0` for uniform range in unconstrained space.
- Trade-off: `init_to_mean` is fast but fails for infinite-variance distributions. `init_to_median` is robust but only works for univariate distributions. `init_to_uniform` provides exploration but may start far from the posterior.
- Key insight: The median is more robust than the mean for heavy-tailed distributions (e.g., the Cauchy has no finite mean); see the sketch below.
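A quick illustration of why: the Cauchy mean is NaN in PyTorch, while an empirical median over 15 draws (matching `init_to_median`'s default) stays finite:

```python
import pyro.distributions as dist

d = dist.Cauchy(0.0, 1.0)
print(d.mean)            # tensor(nan): no finite mean exists
samples = d.sample((15,))
print(samples.median())  # finite, near the true location 0.0
```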
Reasoning
The cascading fallback design ensures that initialization always succeeds. The `init_to_mean` strategy uses the distribution's analytic mean, which is fast but undefined for some distributions (e.g., Cauchy). The `init_to_median` strategy empirically computes the median from 15 samples by default, which is more robust but falls back immediately for multivariate distributions, where the per-site median is not defined. The `init_to_feasible` strategy always succeeds by projecting a zero vector in unconstrained space through the support transform.
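A minimal sketch of that projection step, mirroring the logic of `init_to_feasible`; the LogNormal site is chosen only for illustration:

```python
import torch
from torch.distributions import transform_to
import pyro.distributions as dist

fn = dist.LogNormal(0.0, 1.0)  # hypothetical site distribution
value = fn.sample().detach()
t = transform_to(fn.support)
# Zero in unconstrained space always maps into the support: exp(0) = 1.0.
feasible = t(torch.zeros_like(t.inv(value)))
```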
The `init_scale=0.1` default prevents the guide from being too confident at initialization. If the guide starts with very small variance, gradient updates may be trapped in a local optimum. Conversely, if the guide starts with very large variance, the ELBO may be dominated by the entropy term.
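The scale is set at guide construction. Reusing the hypothetical `model` above, with illustrative values:

```python
from pyro.infer.autoguide import AutoNormal

tight_guide = AutoNormal(model, init_scale=0.01)    # risks overconfident starts
default_guide = AutoNormal(model, init_scale=0.1)   # the library default
diffuse_guide = AutoNormal(model, init_scale=1.0)   # entropy may dominate the ELBO
```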
Code evidence for fallback chain from `pyro/infer/autoguide/initialization.py:102-133`:
```python
def init_to_mean(
    site=None,
    *,
    fallback: Optional[Callable] = init_to_median,
):
    # ...
    try:
        # Try .mean() method.
        value = site["fn"].mean.detach()
        if torch_isnan(value):
            raise ValueError
        # ...
    except (NotImplementedError, ValueError):
        # This may happen for distributions with infinite variance, e.g. Cauchy.
        pass
    if fallback is not None:
        return fallback(site)
```
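The `fallback` argument makes the chain configurable. A usage sketch (hypothetical `model` as above) that swaps the second link for uniform exploration:

```python
from pyro.infer.autoguide import AutoNormal, init_to_mean, init_to_uniform

# Try the analytic mean first; on failure, explore uniformly instead of
# falling back to the empirical median.
guide = AutoNormal(model, init_loc_fn=init_to_mean(fallback=init_to_uniform))
```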
Empirical median with 15 samples from `pyro/infer/autoguide/initialization.py:62-99`:
```python
def init_to_median(
    site=None,
    num_samples=15,
    *,
    fallback: Optional[Callable] = init_to_feasible,
):
    # The median is undefined for multivariate distributions.
    if _is_multivariate(site["fn"]):
        return init_to_feasible(site)
    try:
        samples = site["fn"].sample(sample_shape=(num_samples,))
        value = samples.median(dim=0)[0]
        # ...
```
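In the full source (elided above), calling a strategy without a `site` returns a partially applied version of itself, so `num_samples` can be tuned at guide construction; the value 25 here is an illustrative override:

```python
from pyro.infer.autoguide import AutoNormal, init_to_median

# A heavier sample count steadies the empirical median for heavy-tailed priors.
guide = AutoNormal(model, init_loc_fn=init_to_median(num_samples=25))
```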
Uniform initialization in unconstrained space from `pyro/infer/autoguide/initialization.py:136-154`:
```python
def init_to_uniform(
    site: Optional[dict] = None,
    radius: float = 2.0,
):
    # ...
    value = t(torch.rand_like(t.inv(value)) * (2 * radius) - radius)
```
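A numeric check of that key line, assuming a LogNormal site so the support transform is `exp`:

```python
import torch
from torch.distributions import transform_to
import pyro.distributions as dist

fn = dist.LogNormal(0.0, 1.0)  # hypothetical site distribution
radius = 2.0
t = transform_to(fn.support)
u = torch.rand(()) * (2 * radius) - radius  # uniform in [-2, 2], unconstrained
value = t(u)                                # lands in (exp(-2), exp(2))
print(u.item(), value.item())
```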