Principle:Pyro ppl Pyro Conjugate Bayesian Updates

From Leeroopedia


Knowledge Sources
Domains: Bayesian Statistics, Conjugate Priors, Closed-Form Inference
Last Updated: 2026-02-09 09:00 GMT

Overview

Conjugate Bayesian updates exploit the mathematical property that when a prior distribution and likelihood belong to a conjugate family, the posterior has the same distributional form as the prior, enabling closed-form parameter updates.

Description

In Bayesian inference, we seek the posterior distribution p(theta | x) given a prior p(theta) and likelihood p(x | theta). In general, this requires the evidence p(x), an integral of p(x | theta) p(theta) over theta that is often intractable. However, for certain prior-likelihood pairs called conjugate families, the posterior has the same functional form as the prior, differing only in updated parameter values.

A prior p(theta | alpha) with hyperparameters alpha is conjugate to a likelihood p(x | theta) if:

p(theta | x, alpha) = p(theta | alpha') where alpha' = f(alpha, x)

The updated parameters alpha' can be computed in closed form, avoiding any numerical integration.
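As an illustration of the closed-form update, the following minimal sketch uses the Beta-Binomial pair and compares the analytic posterior mean with one obtained by brute-force numerical integration. The function names and the prior values (alpha = 2, beta = 2) are assumptions chosen for the example, not part of any library.

```python
def beta_posterior(alpha, beta, successes, trials):
    """Closed-form Beta posterior parameters for a Binomial likelihood."""
    return alpha + successes, beta + trials - successes


def grid_posterior_mean(alpha, beta, successes, trials, grid_size=20_000):
    """Posterior mean of p by midpoint-rule integration (for comparison only)."""
    num = 0.0
    den = 0.0
    for i in range(grid_size):
        p = (i + 0.5) / grid_size
        # Unnormalized posterior: Beta prior kernel times Binomial likelihood kernel.
        w = p ** (alpha - 1 + successes) * (1 - p) ** (beta - 1 + trials - successes)
        num += p * w
        den += w
    return num / den


# Hypothetical data: 7 successes in 10 trials under a Beta(2, 2) prior.
a_post, b_post = beta_posterior(2.0, 2.0, successes=7, trials=10)
closed_form_mean = a_post / (a_post + b_post)
numeric_mean = grid_posterior_mean(2.0, 2.0, successes=7, trials=10)

print(a_post, b_post)
print(closed_form_mean, numeric_mean)
```

The closed-form route needs two additions, while the numerical route needs thousands of density evaluations to reach the same answer.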

Compound conjugate distributions combine a conjugate prior with its likelihood into a single marginal distribution. For example:

  • Beta-Binomial: Integrating out the success probability p ~ Beta(alpha, beta) from a Binomial(n, p) likelihood yields the Beta-Binomial distribution.
  • Gamma-Poisson (Negative Binomial): Integrating out the rate lambda ~ Gamma(alpha, beta) from a Poisson(lambda) likelihood yields the Negative Binomial.
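  • Dirichlet-Multinomial: Integrating out the probability vector pi ~ Dirichlet(alpha_1, ..., alpha_K) from a Multinomial(n, pi) likelihood yields the Dirichlet-Multinomial distribution.
  • Normal-Inverse-Gamma: The conjugate prior for a Normal likelihood with unknown mean and variance; integrating out both parameters yields a Student's t marginal.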

These compound distributions are useful as:

  • Marginal likelihoods for model comparison (Bayes factors).
  • Predictive distributions that account for parameter uncertainty.
  • Collapsed models where nuisance parameters are analytically integrated out, reducing the dimensionality of inference.
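To make the idea of a compound distribution concrete, here is a small sketch of the Gamma-Poisson marginal written with the standard library only. It assumes the Gamma prior is in shape-rate form; the function name and the values alpha = 3, beta = 0.5 are illustrative assumptions.

```python
import math


def gamma_poisson_log_pmf(x, alpha, beta):
    """log p(x) with lambda ~ Gamma(alpha, rate=beta) integrated out of Poisson(lambda)."""
    return (
        math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - x * math.log(beta + 1.0)
    )


alpha, beta = 3.0, 0.5  # assumed prior shape and rate
support = range(0, 500)  # truncation point; the tail beyond it is negligible here
pmf = [math.exp(gamma_poisson_log_pmf(x, alpha, beta)) for x in support]

total = sum(pmf)
mean = sum(x * p for x, p in zip(support, pmf))
variance = sum((x - mean) ** 2 * p for x, p in zip(support, pmf))

print(total, mean, variance)
```

The variance exceeds the mean, unlike a plain Poisson: the extra spread is the parameter uncertainty that the marginal carries.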

Usage

Use conjugate Bayesian updates when:

  • The model admits conjugate prior-likelihood pairs, enabling exact posterior computation.
  • You want to analytically marginalize out parameters to reduce variance in gradient estimates.
  • You need marginal likelihoods for Bayesian model selection.
  • Building hierarchical models where some levels can be collapsed analytically.
  • Prototyping models before resorting to approximate inference.

Theoretical Basis

Exponential family conjugacy: Conjugacy arises naturally in exponential families.

# Exponential family likelihood:
p(x | theta) = h(x) * exp(eta(theta) . T(x) - A(theta))

# Conjugate prior:
p(theta | chi, nu) proportional to exp(eta(theta) . chi - nu * A(theta))

# Posterior (also conjugate):
p(theta | x, chi, nu) proportional to exp(eta(theta) . (chi + T(x)) - (nu + 1) * A(theta))

# Update rule:
# chi_new = chi + T(x)    (add sufficient statistics)
# nu_new  = nu + 1         (increment pseudo-count)
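The update rule above can be sketched for a stream of observations, applying it once per data point. The example below uses a Bernoulli likelihood with theta = p, where T(x) = x, eta = log(p / (1 - p)) and A = -log(1 - p); the helper names and the starting values are assumptions made for illustration.

```python
def conjugate_update(chi, nu, sufficient_stats):
    """Apply chi <- chi + T(x) and nu <- nu + 1 once per observation."""
    for t in sufficient_stats:
        chi += t
        nu += 1
    return chi, nu


def to_beta_params(chi, nu):
    """Map (chi, nu) to Beta parameters for the Bernoulli case.

    exp(eta * chi - nu * A) equals p**chi * (1 - p)**(nu - chi),
    which is the kernel of a Beta(chi + 1, nu - chi + 1) density in p.
    """
    return chi + 1, nu - chi + 1


observations = [1, 0, 1, 1, 0, 1]  # hypothetical Bernoulli draws, T(x) = x
chi0, nu0 = 1.0, 2.0               # corresponds to a Beta(2, 2) prior

chi_n, nu_n = conjugate_update(chi0, nu0, observations)
alpha_n, beta_n = to_beta_params(chi_n, nu_n)

print(chi_n, nu_n)
print(alpha_n, beta_n)
```

For n observations the net effect is chi + sum of T(x_i) and nu + n, so the result matches adding the success and failure counts directly to the Beta parameters.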

Key conjugate pairs:

# Beta-Binomial:
# Prior: p ~ Beta(alpha, beta)
# Likelihood: x | p ~ Binomial(n, p)
# Posterior: p | x ~ Beta(alpha + x, beta + n - x)
# Marginal: x ~ BetaBinomial(n, alpha, beta)

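# Gamma-Poisson (Gamma in shape-rate form):
# Prior: lambda ~ Gamma(alpha, beta)    (alpha = shape, beta = rate)
# Likelihood: x | lambda ~ Poisson(lambda)
# Posterior: lambda | x ~ Gamma(alpha + x, beta + 1)
# Marginal: x ~ NegativeBinomial(alpha, beta/(beta+1))
#   Convention: p(x) = C(x + alpha - 1, x) * (beta/(beta+1))^alpha * (1/(beta+1))^x;
#   some libraries take 1/(beta+1) as the probability argument instead.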

# Normal-Normal (known variance sigma^2):
# Prior: mu ~ Normal(mu_0, sigma_0^2)
# Likelihood: x_1, ..., x_n | mu ~ Normal(mu, sigma^2), i.i.d.
# Posterior: mu | x_1, ..., x_n ~ Normal(mu_n, sigma_n^2)
# where x_bar is the sample mean of the n observations and:
#   sigma_n^2 = 1 / (1/sigma_0^2 + n/sigma^2)
#   mu_n = sigma_n^2 * (mu_0/sigma_0^2 + n*x_bar/sigma^2)

# Dirichlet-Multinomial:
# Prior: pi ~ Dirichlet(alpha_1, ..., alpha_K)
# Likelihood: x | pi ~ Multinomial(n, pi)
# Posterior: pi | x ~ Dirichlet(alpha_1 + x_1, ..., alpha_K + x_K)
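The conjugate pairs listed above translate into a few lines of arithmetic each. The sketch below writes three of them as plain Python functions and checks that updating on a batch gives the same Normal-Normal posterior as updating one observation at a time. All function names and data values are hypothetical.

```python
import math


def normal_normal_update(mu0, var0, data, var):
    """Posterior (mean, variance) of mu for i.i.d. Normal(mu, var) data, var known."""
    n = len(data)
    x_bar = sum(data) / n
    var_n = 1.0 / (1.0 / var0 + n / var)
    mu_n = var_n * (mu0 / var0 + n * x_bar / var)
    return mu_n, var_n


def gamma_poisson_update(alpha, beta, counts):
    """Posterior (shape, rate) of lambda for i.i.d. Poisson counts."""
    return alpha + sum(counts), beta + len(counts)


def dirichlet_multinomial_update(alphas, counts):
    """Posterior Dirichlet concentrations: add per-category counts."""
    return [a + c for a, c in zip(alphas, counts)]


data = [1.2, 0.7, 1.9, 1.4]
batch = normal_normal_update(0.0, 4.0, data, var=1.0)

# Sequential updating: yesterday's posterior is today's prior.
seq = (0.0, 4.0)
for x in data:
    seq = normal_normal_update(seq[0], seq[1], [x], var=1.0)

print(batch, seq)
print(math.isclose(batch[0], seq[0]), math.isclose(batch[1], seq[1]))
print(gamma_poisson_update(2.0, 1.0, [3, 0, 4]))
print(dirichlet_multinomial_update([1.0, 1.0, 1.0], [5, 2, 0]))
```

Batch and sequential updates agree because the posterior depends on the data only through the sufficient statistics, which simply accumulate.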

Marginal likelihood from conjugacy:

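# p(x) = integral p(x|theta) p(theta|alpha) dtheta
#       = Z(alpha') / Z(alpha) * p_base(x)

# where Z(alpha) is the normalizing constant of the prior with parameters alpha,
# alpha' = f(alpha, x) are the updated (posterior) parameters,
# and p_base(x) is the base measure h(x) of the likelihood.
# This ratio is available in closed form for conjugate families.
# Example (Beta-Binomial): p(x) = C(n, x) * B(alpha + x, beta + n - x) / B(alpha, beta)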
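As a worked instance of the normalizer ratio, the sketch below computes the Beta-Binomial marginal likelihood, where Z is the Beta function and the base measure is the binomial coefficient. The function names are assumptions for illustration; only the standard library is used.

```python
import math


def log_beta_fn(a, b):
    """log B(a, b), the normalizing constant of a Beta(a, b) density."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_binomial(x, n, alpha, beta):
    """log p(x) = log C(n, x) + log Z(posterior params) - log Z(prior params)."""
    log_base = math.log(math.comb(n, x))  # base measure h(x)
    return log_base + log_beta_fn(alpha + x, beta + n - x) - log_beta_fn(alpha, beta)


n = 10
# Under a uniform Beta(1, 1) prior every count 0..n is equally likely a priori.
uniform_marginal = math.exp(log_marginal_binomial(7, n, 1.0, 1.0))

# The marginal is a proper distribution over x = 0..n.
total = sum(math.exp(log_marginal_binomial(x, n, 2.0, 3.0)) for x in range(n + 1))

# Log Bayes factor between two hypothetical priors for the same observation.
log_bayes_factor = (
    log_marginal_binomial(7, n, 1.0, 1.0) - log_marginal_binomial(7, n, 20.0, 20.0)
)

print(uniform_marginal, total, log_bayes_factor)
```

Because each marginal is a ratio of closed-form normalizers, comparing models this way requires no sampling or numerical integration.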
