Principle: Pyro (pyro-ppl) Stochastic Variational Inference
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Knowledge Sources | Paper (Stochastic Variational Inference), Paper (Auto-Encoding Variational Bayes), Repo (Pyro) |
| Domains | Variational_Inference, Bayesian_Inference |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Stochastic Variational Inference (SVI) is the core inference algorithm in Pyro that combines a probabilistic model, a variational guide (approximate posterior), an evidence lower bound (ELBO) objective, and a stochastic optimizer to perform scalable approximate Bayesian inference.
Description
SVI is the unified interface for all ELBO-based inference in Pyro. It frames Bayesian inference as an optimization problem: rather than computing the true posterior distribution analytically (which is intractable for most models), SVI searches for the member of a parameterized family of distributions (the guide) that is closest to the true posterior in terms of KL divergence. This is equivalent to maximizing the Evidence Lower Bound (ELBO).
Core Components
SVI brings together four essential components:
- Model: A callable that defines the joint distribution over observed and latent variables using the `pyro.sample` and `pyro.plate` primitives. The model specifies the generative process -- how latent variables produce observed data.
- Guide: A callable that defines the variational approximation to the posterior distribution. The guide must contain a `pyro.sample` statement for every unobserved sample site in the model, using the same site names. Guide parameters are registered via `pyro.param` and are the targets of optimization.
- ELBO (Loss): The evidence lower bound objective function. Pyro provides several ELBO estimators (e.g., `Trace_ELBO`, `TraceGraph_ELBO`, `TraceMeanField_ELBO`), each offering different variance-reduction properties for the gradient estimates. The ELBO serves as both a loss function for optimization and a measure of how well the guide approximates the true posterior.
- Optimizer: A wrapped PyTorch optimizer (e.g., `Adam`, `ClippedAdam`) that updates guide parameters based on computed gradients. Pyro's `PyroOptim` wrapper manages per-parameter optimizer state dynamically as new parameters are discovered.
The SVI Step
Each call to `SVI.step()` performs one iteration of stochastic optimization, consisting of five substeps:
- Trace guide: Sample latent variables from the guide distribution and record all sample and parameter sites in an execution trace.
- Replay model against guide: Execute the model, replaying the latent variable values sampled from the guide so that both model and guide are evaluated at the same point in latent space.
- Compute ELBO loss: Using the paired model-guide traces, compute the ELBO estimate as the difference between the model's log-joint probability and the guide's log-probability (negated, since optimizers minimize).
- Backpropagate: Compute gradients of the ELBO loss with respect to all guide parameters and any model parameters registered via `pyro.param`.
- Optimizer step: Apply the optimizer to update all parameters based on the computed gradients.
This entire cycle is repeated for many iterations until convergence, at which point the guide parameters define the learned approximate posterior.
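The five substeps above can be sketched in plain PyTorch for a one-dimensional Normal-Normal toy model (a simplified illustration of what `SVI.step()` does, not Pyro's actual trace machinery; all names and numbers here are illustrative):

```python
import torch

# Toy model: z ~ Normal(0, 1); each x_i ~ Normal(z, 1) is observed.
# Guide: q(z) = Normal(loc, softplus(raw_scale)), both parameters learnable.
data = torch.tensor([2.1, 1.9, 2.3, 2.0])
loc = torch.zeros((), requires_grad=True)
raw_scale = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam([loc, raw_scale], lr=0.1)

def svi_step():
    opt.zero_grad()
    scale = torch.nn.functional.softplus(raw_scale)
    q = torch.distributions.Normal(loc, scale)
    z = q.rsample()                        # 1. trace guide (reparameterized sample)
    prior = torch.distributions.Normal(0.0, 1.0)
    likelihood = torch.distributions.Normal(z, 1.0)
    # 2-3. "replay" the model at the guide's z and form the single-sample ELBO:
    log_joint = prior.log_prob(z) + likelihood.log_prob(data).sum()
    elbo = log_joint - q.log_prob(z)
    loss = -elbo                           # negate, since optimizers minimize
    loss.backward()                        # 4. backpropagate to loc, raw_scale
    opt.step()                             # 5. optimizer step
    return loss.item()
```

Repeating `svi_step()` drives `loc` toward the exact posterior mean, which for this conjugate model is `sum(data) / (len(data) + 1)`.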
Convergence and the ELBO
The ELBO provides a lower bound on the log marginal likelihood (log evidence) of the observed data:
- ELBO = E_q[log p(x, z) - log q(z)] <= log p(x)
As training progresses and the ELBO increases, the guide distribution q(z) becomes a better approximation of the true posterior p(z|x). Monitoring the ELBO over training iterations is the standard way to assess convergence.
Usage
SVI is used when:
- Approximate posterior inference: The true posterior is analytically intractable, and a parametric approximation is acceptable. This covers the vast majority of Bayesian models with non-conjugate likelihoods or complex latent structure.
- Scalable Bayesian inference: The dataset is too large for exact methods or MCMC. SVI can operate on mini-batches of data, scaling to arbitrarily large datasets via subsampling within `pyro.plate` contexts.
- Variational autoencoders (VAEs): The model is a deep generative model with neural network components. SVI with the reparameterization trick enables end-to-end training of both the generative model and the inference network (guide).
- Amortized inference: The guide is parameterized by a neural network that takes observations as input and outputs approximate posterior parameters, enabling rapid inference for new data points without re-optimization.
Theoretical Basis
Evidence Lower Bound (ELBO)
For a model with observed variables x and latent variables z, the marginal log-likelihood decomposes as:
- log p(x) = ELBO + KL(q(z) || p(z|x))
Since KL divergence is non-negative, the ELBO is a lower bound on log p(x). Maximizing the ELBO is equivalent to minimizing KL(q(z) || p(z|x)), which makes q(z) a good approximation to the true posterior.
The ELBO itself can be written as:
- ELBO = E_q(z)[log p(x, z) - log q(z)]
This expectation is estimated via Monte Carlo sampling from the guide q(z).
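For a conjugate Normal-Normal model the decomposition log p(x) = ELBO + KL(q(z) || p(z|x)) can be verified numerically (a sketch in PyTorch; the observed value and the guide's parameters are arbitrary illustrative choices):

```python
import torch

# Model: z ~ N(0, 1), x | z ~ N(z, 1); observe x = 1.0.
# Then p(x) = N(0, sqrt(2)) and p(z | x) = N(x / 2, sqrt(1/2)).
torch.manual_seed(0)
x = torch.tensor(1.0)
prior = torch.distributions.Normal(0.0, 1.0)
posterior = torch.distributions.Normal(x / 2, 0.5 ** 0.5)
log_px = torch.distributions.Normal(0.0, 2.0 ** 0.5).log_prob(x)

# An arbitrary (suboptimal) guide q(z).
q = torch.distributions.Normal(0.3, 0.8)

# Monte Carlo ELBO estimate: E_q[log p(x, z) - log q(z)].
z = q.sample((200_000,))
log_joint = prior.log_prob(z) + torch.distributions.Normal(z, 1.0).log_prob(x)
elbo = (log_joint - q.log_prob(z)).mean()

# Analytic KL between the guide and the exact posterior.
kl = torch.distributions.kl_divergence(q, posterior)
# Up to Monte Carlo error: log_px == elbo + kl.
```

Making the guide closer to the exact posterior shrinks the KL term, so the ELBO rises toward log p(x).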
Stochastic Gradient Estimation
The key insight from Hoffman et al. (2013) is that noisy but unbiased estimates of the ELBO gradient -- obtained from a single sample or mini-batch -- are sufficient for convergence when combined with stochastic gradient descent methods with appropriate learning rate schedules. This makes variational inference scalable to large datasets and complex models.
Reparameterization Trick
For continuous latent variables with reparameterizable distributions, the reparameterization trick (Kingma & Welling, 2014) enables low-variance gradient estimates by expressing the random variable as a deterministic function of its parameters and independent noise. Pyro automatically applies this trick when available, falling back to score function (REINFORCE) estimators otherwise.
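The trick itself fits in a few lines of PyTorch (a standalone sketch, not Pyro's internal implementation): writing z = mu + sigma * eps with parameter-free noise eps ~ N(0, 1) makes the sample a differentiable function of the distribution's parameters.

```python
import torch

torch.manual_seed(0)
mu = torch.tensor(1.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

# Reparameterize: z = mu + sigma * eps, with noise drawn independently
# of the parameters, so gradients flow through the transform.
eps = torch.randn(100_000)
z = mu + log_sigma.exp() * eps

# Monte Carlo estimate of E_q[z^2]; backprop gives low-variance
# pathwise gradients without a score-function (REINFORCE) estimator.
objective = (z ** 2).mean()
objective.backward()
# Analytically, d/dmu E_q[z^2] = 2 * mu = 2.0 here.
```

A distribution in Pyro advertises this capability via `rsample`; when it is unavailable (e.g., discrete latents), the gradient must instead come from the higher-variance score function estimator mentioned above.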