Principle:Pyro ppl Pyro Conjugate Bayesian Updates
| Knowledge Sources | |
|---|---|
| Domains | Bayesian Statistics, Conjugate Priors, Closed-Form Inference |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Conjugate Bayesian updates exploit the mathematical property that when a prior distribution and likelihood belong to a conjugate family, the posterior has the same distributional form as the prior, enabling closed-form parameter updates.
Description
In Bayesian inference, we seek the posterior distribution p(theta | x) given a prior p(theta) and likelihood p(x | theta). By Bayes' rule, p(theta | x) = p(x | theta) p(theta) / p(x), and the normalizing constant p(x) = integral p(x | theta) p(theta) dtheta generally has no closed form. However, for certain prior-likelihood pairs, called conjugate families, the posterior has the same functional form as the prior and differs only in its parameter values.
A family of priors p(theta | alpha) is conjugate to a likelihood p(x | theta) if, for every observation x, the posterior stays in the same family:
p(theta | x) = p(theta | alpha') where alpha' = f(alpha, x)
The updated parameters alpha' can be computed in closed form, avoiding any numerical integration.
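As a minimal sketch of alpha' = f(alpha, x), the example below uses a Beta prior on a coin's heads probability with Bernoulli observations, a pair chosen here only for illustration (it is the n = 1 case of the Beta-Binomial pair). Because the posterior stays in the Beta family, absorbing observations one at a time, with each posterior serving as the next prior, lands on the same parameters as a single batch update. The helper names are hypothetical and are not part of Pyro's API.

```python
from typing import Iterable, Tuple

def beta_bernoulli_update(alpha: float, beta: float, x: int) -> Tuple[float, float]:
    """Hypothetical helper: posterior Beta parameters after one 0/1 observation."""
    if x not in (0, 1):
        raise ValueError("Bernoulli observations must be 0 or 1")
    return alpha + x, beta + (1 - x)

def beta_bernoulli_batch(alpha: float, beta: float, xs: Iterable[int]) -> Tuple[float, float]:
    """Hypothetical helper: add the count of ones to alpha and of zeros to beta."""
    xs = list(xs)
    heads = sum(xs)
    return alpha + heads, beta + len(xs) - heads

data = [1, 0, 1, 1, 0, 1]
params = (2.0, 2.0)  # prior Beta(2, 2)
for x in data:
    # The posterior after each step becomes the prior for the next step.
    params = beta_bernoulli_update(*params, x)

print(params)                                # (6.0, 4.0)
print(beta_bernoulli_batch(2.0, 2.0, data))  # (6.0, 4.0)
```

The order of the observations does not matter: only the sufficient statistics (number of ones and zeros) enter f.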
Compound distributions are obtained by integrating a conjugate prior against its likelihood, which yields a single closed-form marginal distribution for the data. For example:
- Beta-Binomial: Integrating out the success probability p ~ Beta(alpha, beta) from a Binomial(n, p) likelihood yields the Beta-Binomial distribution.
- Gamma-Poisson (Negative Binomial): Integrating out the rate lambda ~ Gamma(alpha, beta) from a Poisson(lambda) likelihood yields the Negative Binomial.
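- Dirichlet-Multinomial: Integrating out the probability vector pi ~ Dirichlet(alpha_1, ..., alpha_K) from a Multinomial(n, pi) likelihood yields the Dirichlet-Multinomial distribution.
- Normal-Inverse-Gamma: The conjugate prior for a Normal likelihood with unknown mean and variance; integrating out both parameters yields a (location-scale) Student's t marginal.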
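To make "integrating out" concrete, the sketch below evaluates the Beta-Binomial pmf in closed form, p(x) = C(n, x) B(alpha + x, beta + n - x) / B(alpha, beta), and compares it with a brute-force midpoint-rule integral of Binomial(x | n, p) * Beta(p | alpha, beta) over p. It uses only the Python standard library; the helper names and grid size are illustrative choices, not a library API.

```python
import math

def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b) computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x: int, n: int, alpha: float, beta: float) -> float:
    """Closed form: C(n, x) * B(alpha + x, beta + n - x) / B(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta_fn(alpha + x, beta + n - x) - log_beta_fn(alpha, beta))

def beta_binomial_pmf_numeric(x: int, n: int, alpha: float, beta: float, grid: int = 20000) -> float:
    """Midpoint-rule integral of Binomial(x | n, p) * Beta(p | alpha, beta) over p."""
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        binom = math.comb(n, x) * p**x * (1 - p) ** (n - x)
        log_prior = (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p) - log_beta_fn(alpha, beta)
        total += binom * math.exp(log_prior) / grid
    return total

n, alpha, beta = 10, 2.0, 3.0
exact = [beta_binomial_pmf(x, n, alpha, beta) for x in range(n + 1)]
print(sum(exact))  # approximately 1.0
print(exact[4], beta_binomial_pmf_numeric(4, n, alpha, beta))  # should agree closely
```

The closed-form mean n * alpha / (alpha + beta), here 4.0, is another quick check on the pmf.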
These compound distributions are useful as:
- Marginal likelihoods for model comparison (Bayes factors).
- Predictive distributions that account for parameter uncertainty.
- Collapsed models where nuisance parameters are analytically integrated out, reducing the dimensionality of inference.
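As a small illustration of the first use, the sketch below computes a Bayes factor for k heads in n coin flips, comparing a model with a uniform Beta(1, 1) prior on p against the point hypothesis p = 0.5. Under the uniform prior the Beta-Binomial marginal reduces to 1 / (n + 1) for every k, which serves as a sanity check. The data, the hypotheses, and the function names are illustrative assumptions.

```python
import math

def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_beta_binomial(k: int, n: int, alpha: float, beta: float) -> float:
    """log p(k | n) with p ~ Beta(alpha, beta) integrated out analytically."""
    return math.log(math.comb(n, k)) + log_beta_fn(alpha + k, beta + n - k) - log_beta_fn(alpha, beta)

def log_marginal_point(k: int, n: int, p: float) -> float:
    """log p(k | n) under a fixed success probability p (no free parameter)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 8, 10
log_m1 = log_marginal_beta_binomial(k, n, 1.0, 1.0)  # M1: uniform prior on p
log_m0 = log_marginal_point(k, n, 0.5)               # M0: fair coin
bayes_factor_10 = math.exp(log_m1 - log_m0)          # evidence ratio M1 : M0
print(bayes_factor_10)
```

Working on the log scale keeps the computation stable when n is large and the individual marginals underflow.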
Usage
Use conjugate Bayesian updates when:
- The model admits conjugate prior-likelihood pairs, enabling exact posterior computation.
- You want to marginalize out parameters analytically to reduce the variance of stochastic gradient estimates.
- You need marginal likelihoods for Bayesian model selection.
- You are building hierarchical models in which some levels can be collapsed analytically.
- You are prototyping a model before resorting to approximate inference.
Theoretical Basis
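Exponential family conjugacy: Conjugacy arises naturally in exponential families. Any likelihood in exponential-family form admits a conjugate prior built from the same natural parameter eta(theta) and log-normalizer A(theta), valid for hyperparameters (chi, nu) that make it normalizable: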
# Exponential family likelihood:
p(x | theta) = h(x) * exp(eta(theta) . T(x) - A(theta))
# Conjugate prior:
p(theta | chi, nu) proportional to exp(eta(theta) . chi - nu * A(theta))
# Posterior (also conjugate):
p(theta | x, chi, nu) proportional to exp(eta(theta) . (chi + T(x)) - (nu + 1) * A(theta))
# Update rule for one observation x:
# chi_new = chi + T(x) (add sufficient statistics)
# nu_new = nu + 1 (increment pseudo-count)
# For n i.i.d. observations: chi_new = chi + sum_i T(x_i), nu_new = nu + n
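The update rule can be sketched for a Poisson likelihood. Taking theta = lambda, the standard identification is eta(lambda) = log lambda, T(x) = x, A(lambda) = lambda and h(x) = 1 / x!, so the conjugate prior exp(chi * log lambda - nu * lambda) is an unnormalized Gamma density with shape chi + 1 and rate nu. The sketch checks that the natural-parameter update agrees with the familiar Gamma-Poisson rule (add the total count to the shape and the number of observations to the rate). The function names are hypothetical.

```python
from typing import Sequence, Tuple

def natural_update(chi: float, nu: float, xs: Sequence[int]) -> Tuple[float, float]:
    """Exponential-family update for a Poisson likelihood, where T(x) = x."""
    return chi + sum(xs), nu + len(xs)

def natural_to_gamma(chi: float, nu: float) -> Tuple[float, float]:
    """Map the kernel exp(chi*log(lam) - nu*lam) to Gamma(shape, rate)."""
    return chi + 1.0, nu

def gamma_poisson_update(shape: float, rate: float, xs: Sequence[int]) -> Tuple[float, float]:
    """Standard Gamma-Poisson update: shape + total count, rate + n."""
    return shape + sum(xs), rate + len(xs)

counts = [3, 0, 2, 5]
chi0, nu0 = 1.0, 2.0  # corresponds to Gamma(shape=2, rate=2)
via_natural = natural_to_gamma(*natural_update(chi0, nu0, counts))
via_standard = gamma_poisson_update(*natural_to_gamma(chi0, nu0), counts)
print(via_natural, via_standard)  # (12.0, 6.0) (12.0, 6.0)
```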
Key conjugate pairs:
# Beta-Binomial:
# Prior: p ~ Beta(alpha, beta)
# Likelihood: x | p ~ Binomial(n, p)
# Posterior: p | x ~ Beta(alpha + x, beta + n - x)
# Marginal: x ~ BetaBinomial(n, alpha, beta)
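The Beta-Binomial posterior can be checked numerically. The sketch below normalizes prior times likelihood on a grid of p values, a brute-force approach used here only for verification, and compares the resulting posterior mean with the conjugate answer (alpha + x) / (alpha + beta + n). The binomial coefficient C(n, x) is dropped because it does not depend on p and cancels in the normalization.

```python
import math

def grid_posterior_mean(x: int, n: int, alpha: float, beta: float, grid: int = 20000) -> float:
    """Posterior mean of p by normalizing Beta prior * Binomial likelihood on a grid."""
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        # Unnormalized log posterior: factors constant in p cancel below.
        log_post = (alpha - 1 + x) * math.log(p) + (beta - 1 + n - x) * math.log(1 - p)
        w = math.exp(log_post)
        num += p * w
        den += w
    return num / den

def conjugate_posterior(x: int, n: int, alpha: float, beta: float):
    """Closed-form posterior parameters: Beta(alpha + x, beta + n - x)."""
    return alpha + x, beta + n - x

x, n, alpha, beta = 7, 20, 2.0, 2.0
a_post, b_post = conjugate_posterior(x, n, alpha, beta)
print(a_post / (a_post + b_post))              # 0.375
print(grid_posterior_mean(x, n, alpha, beta))  # close to 0.375
```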
# Gamma-Poisson:
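# Prior: lambda ~ Gamma(alpha, beta) (shape alpha, rate beta)
# Likelihood: x | lambda ~ Poisson(lambda)
# Posterior: lambda | x ~ Gamma(alpha + x, beta + 1)
# Marginal: x ~ NegativeBinomial(alpha, beta/(beta+1))
# i.e. p(x) = C(x + alpha - 1, x) * p^alpha * (1 - p)^x with p = beta/(beta+1)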
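To check the stated marginal, the sketch below treats beta as a rate parameter, integrates Poisson(x | lambda) * Gamma(lambda | alpha, beta) numerically over lambda, and compares the result with the Negative Binomial pmf C(x + alpha - 1, x) p^alpha (1 - p)^x at p = beta / (beta + 1). Libraries differ in which probability they call "probs": in PyTorch's torch.distributions.NegativeBinomial, probs multiplies the count x, so it would correspond to 1 - p = 1 / (beta + 1). The integration bounds and grid size are illustrative assumptions.

```python
import math

def nb_pmf(x: int, r: float, p: float) -> float:
    """NB pmf C(x + r - 1, x) * p**r * (1 - p)**x, valid for real r > 0."""
    log_coef = math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
    return math.exp(log_coef + r * math.log(p) + x * math.log1p(-p))

def gamma_poisson_numeric(x: int, alpha: float, beta: float,
                          upper: float = 60.0, grid: int = 60000) -> float:
    """Midpoint-rule integral of Poisson(x | lam) * Gamma(lam | alpha, rate=beta)."""
    h = upper / grid
    log_norm = alpha * math.log(beta) - math.lgamma(alpha)
    total = 0.0
    for i in range(grid):
        lam = (i + 0.5) * h
        log_pois = x * math.log(lam) - lam - math.lgamma(x + 1)
        log_gamma = log_norm + (alpha - 1) * math.log(lam) - beta * lam
        total += math.exp(log_pois + log_gamma) * h
    return total

alpha, beta = 3.0, 0.5
for x in range(6):
    closed = nb_pmf(x, alpha, beta / (beta + 1))
    print(x, closed, gamma_poisson_numeric(x, alpha, beta))  # columns should agree
```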
# Normal-Normal (known variance sigma^2):
# Prior: mu ~ Normal(mu_0, sigma_0^2)
# Likelihood: x_1, ..., x_n | mu ~ Normal(mu, sigma^2), i.i.d.
# Posterior: mu | x_1, ..., x_n ~ Normal(mu_n, sigma_n^2)
# where x_bar is the sample mean of x_1, ..., x_n and:
# sigma_n^2 = 1 / (1/sigma_0^2 + n/sigma^2)
# mu_n = sigma_n^2 * (mu_0/sigma_0^2 + n*x_bar/sigma^2)
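A direct transcription of these formulas, as a plain-Python sketch with made-up data, is below. It also applies the update one observation at a time, feeding each posterior back in as the prior; conjugacy implies this agrees with the batch result, because each observation simply adds its precision 1/sigma^2 to the running posterior precision. The function name is hypothetical.

```python
from typing import Sequence, Tuple

def normal_known_var_posterior(mu0: float, var0: float, var: float,
                               xs: Sequence[float]) -> Tuple[float, float]:
    """Posterior (mu_n, sigma_n^2) of mu for Normal data with known variance `var`."""
    n = len(xs)
    if n == 0:
        return mu0, var0  # no data: posterior equals prior
    x_bar = sum(xs) / n
    var_n = 1.0 / (1.0 / var0 + n / var)          # posterior precision = prior + data precision
    mu_n = var_n * (mu0 / var0 + n * x_bar / var)  # precision-weighted average of mu0 and x_bar
    return mu_n, var_n

data = [2.1, 1.9, 2.4, 2.0]
batch = normal_known_var_posterior(0.0, 10.0, 0.25, data)

mu, v = 0.0, 10.0
for x in data:
    mu, v = normal_known_var_posterior(mu, v, 0.25, [x])
print(batch, (mu, v))  # equal up to floating-point rounding
```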
# Dirichlet-Multinomial:
# Prior: pi ~ Dirichlet(alpha_1, ..., alpha_K)
# Likelihood: x | pi ~ Multinomial(n, pi)
# Posterior: pi | x ~ Dirichlet(alpha_1 + x_1, ..., alpha_K + x_K)
# Marginal: x ~ DirichletMultinomial(n, alpha_1, ..., alpha_K)
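The Dirichlet-Multinomial update is the vector version of the Beta-Binomial one: add each category's count to its concentration. The sketch below also evaluates the compound pmf p(x) = n! / prod_k(x_k!) * Γ(A) / Γ(A + n) * prod_k Γ(alpha_k + x_k) / Γ(alpha_k), with A = sum_k alpha_k and Γ the gamma function, and checks that it sums to one over all count vectors for a small n. Helper names and numbers are illustrative.

```python
import math
from itertools import product
from typing import List, Sequence

def dirichlet_update(alphas: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Posterior concentrations alpha_k + x_k."""
    return [a + c for a, c in zip(alphas, counts)]

def dirichlet_multinomial_pmf(counts: Sequence[int], alphas: Sequence[float]) -> float:
    """Compound pmf with the probability vector pi integrated out."""
    n, a_sum = sum(counts), sum(alphas)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_p += math.lgamma(a_sum) - math.lgamma(a_sum + n)
    log_p += sum(math.lgamma(a + c) - math.lgamma(a) for a, c in zip(alphas, counts))
    return math.exp(log_p)

alphas = [1.0, 2.0, 0.5]
n = 4
# Enumerate every count vector of length K that sums to n.
total = sum(
    dirichlet_multinomial_pmf(c, alphas)
    for c in product(range(n + 1), repeat=len(alphas))
    if sum(c) == n
)
post = dirichlet_update(alphas, [3, 0, 1])
print(post, total)  # [4.0, 2.0, 1.5] and approximately 1.0
```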
Marginal likelihood from conjugacy:
# p(x) = integral p(x|theta) p(theta|alpha) dtheta
# = Z(alpha') / Z(alpha) * p_base(x)
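# where Z(alpha) is the normalizing constant of the prior (the integral of its unnormalized kernel),
# alpha' are the posterior parameters, and p_base(x) is the likelihood's base measure (h(x) above)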
# This ratio is available in closed form for conjugate families
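The identity can be checked on the Gamma-Poisson pair with several observations. For the unnormalized Gamma kernel lambda^(alpha - 1) * exp(-beta * lambda), Z(alpha, beta) = Γ(alpha) / beta^alpha, and the base measure is prod_i 1 / x_i!. The sketch compares this Z-ratio evidence with the chain-rule product of one-step Negative Binomial predictives, p(x_1) p(x_2 | x_1) ..., updating the Gamma parameters after each point; the two should match up to floating-point rounding. Function names and data are illustrative assumptions.

```python
import math
from typing import Sequence

def log_gamma_normalizer(shape: float, rate: float) -> float:
    """log Z for the unnormalized Gamma kernel lam**(shape-1) * exp(-rate*lam)."""
    return math.lgamma(shape) - shape * math.log(rate)

def log_evidence_z_ratio(xs: Sequence[int], shape: float, rate: float) -> float:
    """log p(x_1..x_n) = log h(x) + log Z(posterior) - log Z(prior)."""
    log_h = -sum(math.lgamma(x + 1) for x in xs)  # base measure: prod 1 / x_i!
    post_shape, post_rate = shape + sum(xs), rate + len(xs)
    return log_h + log_gamma_normalizer(post_shape, post_rate) - log_gamma_normalizer(shape, rate)

def log_evidence_chain_rule(xs: Sequence[int], shape: float, rate: float) -> float:
    """Same quantity as a product of one-step Negative Binomial predictives."""
    total = 0.0
    for x in xs:
        p = rate / (rate + 1.0)
        total += (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
                  + shape * math.log(p) + x * math.log1p(-p))
        shape, rate = shape + x, rate + 1.0  # conjugate update before the next point
    return total

counts = [3, 0, 2, 5]
print(log_evidence_z_ratio(counts, 2.0, 1.0), log_evidence_chain_rule(counts, 2.0, 1.0))
```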