Principle: Pyro Mean-Field Variational Inference
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Knowledge Sources | Paper (Stochastic Variational Inference, Hoffman et al. 2013), Repo (Pyro) |
| Domains | Bayesian_Inference, Variational_Inference |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Mean-field variational inference approximates the posterior distribution over latent variables with a fully factorized product of independent distributions, typically one Normal distribution per latent variable. This sacrifices the ability to capture posterior correlations in exchange for computational tractability and scalable optimization.
Description
In Bayesian inference, the true posterior distribution p(z|x) over latent variables z given observed data x is generally intractable. Variational inference reformulates posterior computation as an optimization problem: find an approximate distribution q(z) within a tractable family that minimizes the Kullback-Leibler divergence to the true posterior.
The Mean-Field Assumption
The mean-field assumption imposes the strongest factorization: the variational distribution is a product of independent marginals, one per latent variable:
q(z) = product_i q_i(z_i)
Each factor q_i(z_i) is typically chosen to be a Normal (Gaussian) distribution with its own location and scale parameters:
q_i(z_i) = Normal(loc_i, scale_i)
This yields a total of 2d variational parameters for d latent variables (one location and one scale per variable). The independence assumption means that the resulting covariance matrix is diagonal -- all off-diagonal entries are exactly zero.
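As a minimal sketch in plain Python (the site count and parameter values are hypothetical, not from the source), the 2d-parameter structure and the diagonal-covariance log density look like:

```python
import math

# Hypothetical mean-field parameters for d = 3 latent variables:
# one location and one scale per site -> 2 * d = 6 parameters total.
locs = [0.0, 1.5, -2.0]
scales = [1.0, 0.5, 2.0]

def normal_logpdf(x, loc, scale):
    """Log density of a univariate Normal(loc, scale)."""
    return -0.5 * math.log(2 * math.pi * scale**2) - (x - loc)**2 / (2 * scale**2)

def mean_field_logq(z, locs, scales):
    """log q(z) = sum_i log q_i(z_i): independence makes the joint
    log density a sum of per-site terms (diagonal covariance)."""
    return sum(normal_logpdf(zi, l, s) for zi, l, s in zip(z, locs, scales))

print(mean_field_logq([0.3, 1.0, -1.0], locs, scales))
```

Because the factors are independent, evaluating the joint log density never touches cross-terms between sites, which is exactly the diagonal-covariance restriction described above.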
ELBO Optimization
The parameters {loc_i, scale_i} are optimized by maximizing the Evidence Lower BOund (ELBO):
ELBO(q) = E_q[log p(x, z)] - E_q[log q(z)]
Maximizing the ELBO is equivalent to minimizing KL(q(z) || p(z|x)). Under the mean-field assumption, the ELBO decomposes into per-site terms, making gradient computation particularly efficient.
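The ELBO above can be estimated by Monte Carlo. The sketch below (plain Python, with a hypothetical one-observation conjugate model chosen so the answer is checkable) draws reparameterized samples z = loc + scale * eps and averages log p(x, z) - log q(z); when q equals the exact posterior, the KL term vanishes and the ELBO equals the log evidence:

```python
import math, random

random.seed(0)
x_obs = 1.0  # single observed data point (hypothetical toy model)

def log_joint(z, x):
    """log p(x, z) for the toy model: z ~ Normal(0, 1), x | z ~ Normal(z, 1)."""
    lp_z = -0.5 * math.log(2 * math.pi) - z**2 / 2
    lp_x = -0.5 * math.log(2 * math.pi) - (x - z)**2 / 2
    return lp_z + lp_x

def elbo(loc, scale, n_samples=10_000):
    """Monte Carlo ELBO: E_q[log p(x, z) - log q(z)], sampling z via the
    reparameterization z = loc + scale * eps with eps ~ Normal(0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        eps = random.gauss(0.0, 1.0)
        z = loc + scale * eps
        log_q = -0.5 * math.log(2 * math.pi * scale**2) - (z - loc)**2 / (2 * scale**2)
        total += log_joint(z, x_obs) - log_q
    return total / n_samples

# The exact posterior here is Normal(x/2, sqrt(1/2)); setting q to it makes
# KL(q || p(z|x)) = 0, so the ELBO equals the log evidence log p(x).
log_evidence = -0.5 * math.log(2 * math.pi * 2) - x_obs**2 / 4
print(elbo(0.5, math.sqrt(0.5)), log_evidence)
```

Plugging in a suboptimal (loc, scale) gives a strictly smaller estimate, illustrating that the ELBO is a lower bound maximized at the true posterior.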
Tradeoffs
- Computational tractability: The number of variational parameters scales linearly as O(d) rather than quadratically as O(d^2) for a full-covariance approximation.
- Posterior correlations are lost: Because each q_i is independent, the approximation cannot capture correlations between latent variables. This can lead to underestimation of posterior variance and overconfident credible intervals.
- Scalability: Mean-field VI combines naturally with stochastic gradient methods and data subsampling, enabling application to large datasets and models with many parameters.
- Mode-seeking behavior: Minimizing KL(q || p) tends to place mass on modes of the posterior rather than covering the full posterior mass, which can miss multimodality.
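The variance-underestimation tradeoff can be made concrete with a standard worked example (not from the source): for a correlated 2-D Gaussian target, the KL(q || p)-optimal mean-field Gaussian matches each coordinate's conditional precision, 1/Lambda_ii with Lambda = Sigma^{-1}, rather than its marginal variance Sigma_ii:

```python
# Hypothetical 2-D Gaussian posterior with correlation rho = 0.9.
rho = 0.9
Sigma = [[1.0, rho], [rho, 1.0]]

# Precision matrix diagonal via the closed-form 2x2 inverse.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Lambda_00 = Sigma[1][1] / det

# KL(q || p)-optimal mean-field factor variance = 1 / Lambda_ii
# (the conditional variance), not the marginal variance Sigma_ii.
mean_field_var = 1.0 / Lambda_00  # = 1 - rho^2, about 0.19 here
print(Sigma[0][0], mean_field_var)
```

With rho = 0.9 the true marginal variance is 1.0 but the mean-field factor shrinks to about 0.19, which is the source of the overconfident credible intervals noted above.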
Usage
Mean-field variational inference is applied when:
- Full posterior inference is intractable: Models with many latent variables where MCMC sampling would be prohibitively slow.
- Approximate uncertainty is sufficient: Applications where marginal uncertainty per variable is more important than joint posterior correlations.
- Scalability to large data: When the dataset is too large for batch MCMC methods, stochastic variational inference with mean-field guides enables minibatch training.
- Quick prototyping: As a fast baseline approximation before investing in more expressive variational families (e.g., full-rank or normalizing flows).
In Pyro, the mean-field assumption is implemented by the AutoNormal guide, which automatically constructs an independent Normal variational distribution for each latent site discovered in the model.
Theoretical Basis
Variational Inference Framework
Given a model with joint distribution p(x, z) = p(x|z) p(z), the log marginal likelihood satisfies:
log p(x) = ELBO(q) + KL(q(z) || p(z|x))
Since the KL divergence is non-negative, the ELBO is a lower bound on the log evidence. Maximizing the ELBO tightens this bound and drives q(z) closer to the true posterior.
Mean-Field Factorization
The mean-field family restricts q to fully factorized distributions:
Q_MF = { q : q(z) = product_{i=1}^{d} q_i(z_i) }
When each factor is Gaussian, this yields:
q(z) = product_{i=1}^{d} Normal(z_i; loc_i, scale_i)
The entropy of this factorized distribution is the sum of individual entropies:
H[q] = sum_{i=1}^{d} H[q_i] = sum_{i=1}^{d} (1/2) log(2 * pi * e * scale_i^2)
This additive decomposition makes gradient computation straightforward and enables per-site parameter updates.
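The entropy decomposition can be verified numerically; this plain-Python sketch (hypothetical locations and scales) compares the analytic per-site sum against a Monte Carlo estimate of -E_q[log q(z)]:

```python
import math, random

random.seed(1)
locs = [0.0, 1.5, -2.0]
scales = [1.0, 0.5, 2.0]  # hypothetical per-site scales; entropy ignores locs

# Analytic entropy of the factorized Gaussian: sum of per-site entropies.
analytic_H = sum(0.5 * math.log(2 * math.pi * math.e * s**2) for s in scales)

def logq(z):
    """log q(z) of the factorized Normal, as a sum of per-site terms."""
    return sum(-0.5 * math.log(2 * math.pi * s**2) - (zi - l)**2 / (2 * s**2)
               for zi, l, s in zip(z, locs, scales))

# Monte Carlo check of H[q] = -E_q[log q(z)].
n = 100_000
mc_H = -sum(logq([random.gauss(l, s) for l, s in zip(locs, scales)])
            for _ in range(n)) / n
print(analytic_H, mc_H)
```

The two numbers agree to Monte Carlo error, confirming that the joint entropy is exactly the sum of the independent factors' entropies.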
Stochastic Variational Inference
Hoffman et al. (2013) showed that the mean-field ELBO can be optimized using stochastic gradient ascent with noisy gradient estimates obtained from minibatches of data. The key insight is that the ELBO gradient with respect to variational parameters can be estimated using a small number of Monte Carlo samples from q, combined with the reparameterization trick for continuous latent variables.
This enables mean-field VI to scale to datasets with millions of observations, in contrast to classical coordinate-ascent variational inference which requires full passes over the data.
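The subsampling step can be illustrated in isolation. In this plain-Python sketch (hypothetical dataset and a fixed latent value standing in for a sample from q), the minibatch log-likelihood rescaled by N/B is an unbiased estimate of the full-data log-likelihood term inside the ELBO:

```python
import math, random

random.seed(2)
N, B = 1000, 50
data = [random.gauss(3.0, 1.0) for _ in range(N)]  # hypothetical observations
z = 2.8  # a fixed latent value, standing in for one sample from q

def loglik(x, z):
    """log p(x | z) for a unit-variance Gaussian likelihood."""
    return -0.5 * math.log(2 * math.pi) - (x - z)**2 / 2

# Full-data log-likelihood (what classical coordinate ascent would need).
full = sum(loglik(x, z) for x in data)

def minibatch_estimate():
    """Rescaled minibatch sum: (N / B) * sum over B subsampled points."""
    batch = random.sample(data, B)
    return (N / B) * sum(loglik(x, z) for x in batch)

# Averaging many noisy minibatch estimates recovers the full-data term.
avg = sum(minibatch_estimate() for _ in range(5000)) / 5000
print(full, avg)
```

Each noisy estimate touches only B of the N points, which is why one stochastic gradient step is cheap even when N is in the millions.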