

Heuristic:Pyro ppl Pyro Guide Initialization Strategy

From Leeroopedia




Knowledge Sources
Domains Variational_Inference, Optimization
Last Updated 2026-02-09 09:00 GMT

Overview

Initialization strategy selection for AutoGuide parameters, using cascading fallback chains to robustly initialize variational distributions.

Description

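Pyro provides a family of initialization strategies for AutoGuide parameters, organized into a fallback chain: `init_to_mean` falls back to `init_to_median`, which falls back to `init_to_feasible`. Each fallback covers a specific failure mode of the strategy before it: an undefined or unimplemented mean, or a median that cannot be computed (for example, at a multivariate site). Separately, the `init_scale=0.1` default sets the guide's initial standard deviation in unconstrained space, so the guide starts neither as a near point mass nor so diffuse that early gradient estimates are noisy. The `init_to_uniform` strategy draws initial values uniformly between -2 and 2 in unconstrained space (`radius=2.0`), which favors exploration.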
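The chain can be illustrated without Pyro. The following is a dependency-free sketch, not Pyro's implementation: `Site`, `init_mean`, `init_median` and `init_feasible` are hypothetical simplified stand-ins. A missing or NaN mean triggers the median step, and a site with no usable draws (standing in for a median that cannot be computed) triggers the feasible step.

```python
import math
from typing import Callable, List, Optional


class Site:
    """Hypothetical stand-in for a sample site: an analytic mean, a support, some draws."""

    def __init__(self, mean: Optional[float], support: str,
                 draws: Optional[List[float]] = None):
        self.mean = mean        # None models a distribution with no implemented mean
        self.support = support  # only "real" and "positive" in this sketch
        self.draws = draws      # prior draws for the empirical median (None if unavailable)


def init_feasible(site: Site) -> float:
    # Map 0 from unconstrained space into the support; prior parameters are ignored.
    return math.exp(0.0) if site.support == "positive" else 0.0


def init_median(site: Site, fallback: Optional[Callable] = init_feasible) -> float:
    if site.draws:
        xs = sorted(site.draws)
        return xs[(len(xs) - 1) // 2]  # lower median, the convention torch.median uses
    if fallback is not None:
        return fallback(site)
    raise ValueError("no initial value for site")


def init_mean(site: Site, fallback: Optional[Callable] = init_median) -> float:
    if site.mean is not None and not math.isnan(site.mean):
        return site.mean
    if fallback is not None:
        return fallback(site)
    raise ValueError("no initial value for site")


print(init_mean(Site(mean=1.5, support="real")))            # analytic mean is used
print(init_mean(Site(mean=float("nan"), support="real",
                     draws=[-3.0, 0.2, 40.0])))              # NaN mean, so median of draws
print(init_mean(Site(mean=None, support="positive")))        # no mean, no draws, so feasible point
```

Each strategy only knows about its immediate fallback, so the chain is built by default arguments rather than by a central dispatcher, and any link can be replaced or cut (`fallback=None`) independently.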

Usage

Apply this heuristic when choosing an initialization strategy for AutoGuide-based SVI, or when debugging SVI convergence issues that may stem from poor initialization. Understanding the fallback chain helps diagnose cases where a strategy silently falls back to a weaker one.

The Insight (Rule of Thumb)

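  • Action: Use `init_to_mean` for most problems, passing it explicitly via `init_loc_fn` (it is not the built-in AutoGuide default; `AutoNormal`, for example, defaults to `init_to_feasible`). Use `init_to_median(num_samples=15)` for heavy-tailed priors (e.g., Cauchy, StudentT). Use `init_to_uniform(radius=2.0)` for exploratory initialization.
  • Value: `init_scale=0.1` for the guide's initial standard deviation in unconstrained space; `num_samples=15` draws for the empirical median; `radius=2.0` as the half-width of the uniform range in unconstrained space.
  • Trade-off: `init_to_mean` is fast but falls back whenever the mean is undefined or not implemented (e.g., Cauchy). `init_to_median` is robust to heavy tails but only applies to univariate distributions; multivariate sites go straight to `init_to_feasible`. `init_to_uniform` provides exploration but may start far from the posterior.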
  • Key insight: Median is more robust than mean for heavy-tailed distributions (e.g., Cauchy has no finite mean).
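The key insight can be checked numerically with the standard library alone. This sketch uses a hypothetical helper, `cauchy_samples`, which draws Cauchy samples by inverse-CDF sampling; it is not part of Pyro. It takes 15 draws, mirroring `num_samples=15`, and compares the sample mean with the sample median. The mean of n independent standard Cauchy draws is itself standard Cauchy, so it never settles down, while the median concentrates around the location parameter.

```python
import math
import random
import statistics


def cauchy_samples(n: int, loc: float = 0.0, scale: float = 1.0, seed: int = 0):
    # Inverse-CDF sampling: loc + scale * tan(pi * (u - 0.5)) with u ~ Uniform[0, 1).
    rng = random.Random(seed)
    return [loc + scale * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


for seed in range(3):
    xs = cauchy_samples(15, loc=0.0, seed=seed)
    print(f"seed={seed}: mean={statistics.fmean(xs):9.3f}  median={statistics.median(xs):6.3f}")
```

With an odd sample count such as 15, `statistics.median` and `torch.median` agree, because both return the single middle value.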

Reasoning

The cascading fallback design is meant to ensure that initialization always succeeds. The `init_to_mean` strategy uses the distribution's analytic mean (`.mean`). This is fast, but it is unusable when the mean is undefined (PyTorch returns NaN, e.g., for Cauchy) or not implemented. The `init_to_median` strategy estimates the median empirically from `num_samples=15` draws, which is robust to heavy tails but only defined for univariate distributions. The `init_to_feasible` strategy ignores the distribution's parameters and maps a zero vector from unconstrained space through the support's transform. The result is always a valid point, though not necessarily a plausible one.
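To see what "projecting a zero vector through the support transform" produces, here is a stdlib sketch. The `TRANSFORMS` table and `init_feasible` function are hypothetical. The table mirrors the bijections `torch.distributions.transform_to` registers for these constraints: identity for the reals, `exp` for positive values, sigmoid for the unit interval, and sigmoid followed by an affine map for a bounded interval.

```python
import math
from typing import Callable, Dict

# Hypothetical support -> transform table (unconstrained reals -> support),
# mirroring the bijections that torch.distributions.transform_to registers.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "real": lambda u: u,
    "positive": math.exp,
    "unit_interval": lambda u: 1.0 / (1.0 + math.exp(-u)),
    "interval(2, 6)": lambda u: 2.0 + (6.0 - 2.0) / (1.0 + math.exp(-u)),
}


def init_feasible(support: str) -> float:
    # Start from 0 in unconstrained space and map it into the support.
    # The prior's parameters never enter, so the point is valid but not necessarily likely.
    return TRANSFORMS[support](0.0)


for name in TRANSFORMS:
    print(f"{name:>15}: {init_feasible(name)}")
```

Zero lands at 0 for real-valued sites, 1 for positive sites, and the midpoint of any bounded interval. A site with prior `Normal(100, 1)` would therefore still start at 0, which is why this strategy is the last resort rather than the first choice.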

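The `init_scale=0.1` default sets the initial standard deviation of each latent variable in unconstrained space, balancing two failure modes. If the guide starts with a very small variance, it behaves almost like a point estimate, and optimization may settle into a local optimum before the variance grows. Conversely, if it starts with a very large variance, samples land far from plausible values and the Monte Carlo ELBO gradient estimates become noisy, which can destabilize early optimization.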
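One concrete effect of the initial scale is on gradient noise. This toy sketch uses a hypothetical one-dimensional objective, `-(z - target)^2`, and a hypothetical helper, `loc_grad_samples`. It computes single-sample reparameterized gradients with respect to the guide location, using `z = loc + scale * eps` with `eps ~ Normal(0, 1)`. Here the gradient is `-2 * (z - target)`, so its variance is exactly `4 * scale**2`: it grows quadratically with the initial scale while the expected gradient stays the same.

```python
import random
import statistics


def loc_grad_samples(loc: float, scale: float, target: float = 3.0,
                     n: int = 2000, seed: int = 0):
    """Single-sample reparameterized gradients of E_q[-(z - target)^2] w.r.t. loc,
    where z = loc + scale * eps and eps ~ Normal(0, 1)."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n):
        z = loc + scale * rng.gauss(0.0, 1.0)
        grads.append(-2.0 * (z - target))  # d/d(loc) of -(z - target)^2
    return grads


for init_scale in (0.01, 0.1, 1.0):
    g = loc_grad_samples(loc=0.0, scale=init_scale)
    print(f"init_scale={init_scale:<5} mean grad={statistics.fmean(g):.3f}  "
          f"grad var={statistics.variance(g):.4f}")
```

Real models are not quadratic, but the same mechanism applies: a smaller initial scale gives quieter early gradients, at the cost of starting closer to a point estimate.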

Code evidence for fallback chain from `pyro/infer/autoguide/initialization.py:102-133`:

def init_to_mean(
    site=None,
    *,
    fallback: Optional[Callable] = init_to_median,
):
    # ...
    try:
        # Try .mean() method.
        value = site["fn"].mean.detach()
        if torch_isnan(value):
            raise ValueError
        # ...
    except (NotImplementedError, ValueError):
        # This may happen for distributions with infinite variance, e.g. Cauchy.
        pass
    if fallback is not None:
        return fallback(site)

Empirical median with 15 samples from `pyro/infer/autoguide/initialization.py:62-99`:

def init_to_median(
    site=None,
    num_samples=15,
    *,
    fallback: Optional[Callable] = init_to_feasible,
):
    # The median is undefined for multivariate distributions.
    if _is_multivariate(site["fn"]):
        return init_to_feasible(site)
    try:
        samples = site["fn"].sample(sample_shape=(num_samples,))
        value = samples.median(dim=0)[0]
        # ... (except clause and fallback elided)

Uniform initialization in unconstrained space from `pyro/infer/autoguide/initialization.py:136-154`:

def init_to_uniform(
    site: Optional[dict] = None,
    radius: float = 2.0,
):
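    # ...
    # Draw uniformly from [-radius, radius) in unconstrained space, then map the
    # draw into the support with t (t and value are set in the elided lines).
    value = t(torch.rand_like(t.inv(value)) * (2 * radius) - radius)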
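To see what `radius=2.0` means once the support transform is applied, here is a stdlib sketch for a positive-support site. `init_uniform_positive` is a hypothetical helper, not a Pyro function. It applies the same expression as the excerpt, `uniform * (2 * radius) - radius`, and then maps the result through `exp`, the transform PyTorch registers for positive constraints. Initial values therefore fall in [e^-2, e^2), roughly 0.135 to 7.39.

```python
import math
import random
from typing import Optional


def init_uniform_positive(radius: float = 2.0,
                          rng: Optional[random.Random] = None) -> float:
    """Hypothetical helper: uniform-in-unconstrained-space init for a positive site."""
    rng = rng or random.Random()
    # Uniform draw in [-radius, radius) in unconstrained space.
    u = rng.random() * (2 * radius) - radius
    # Map into the positive reals with exp.
    return math.exp(u)


rng = random.Random(0)
print([round(init_uniform_positive(2.0, rng), 3) for _ in range(5)])
```

Across draws, the starting value can differ by a factor of up to about e^4 ≈ 55. This is why uniform initialization explores widely but may start far from the posterior.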
