
Principle: Pyro Prior Specification

From Leeroopedia


Knowledge Sources
Domains Bayesian_Inference, Statistics
Last Updated 2026-02-09 00:00 GMT

Overview

A foundational principle for choosing prior distributions that encode domain knowledge and regularize Bayesian models before observing data.

Description

Prior specification is the process of selecting probability distributions for latent parameters before any data is observed. In Bayesian inference, the prior distribution p(θ) represents our beliefs about parameter values prior to seeing data. It combines with the likelihood via Bayes' theorem to produce the posterior distribution.

The choice of prior serves two critical roles:

  • Domain knowledge encoding: Priors allow the modeler to inject substantive knowledge into the analysis. For example, knowing that a variance parameter must be positive, or that a regression coefficient is likely near zero, can be expressed directly through the prior.
  • Regularization: Priors act as a form of regularization, preventing overfitting by penalizing extreme parameter values. A Normal(0, 1) prior on a coefficient is analogous to L2 regularization (ridge regression) in frequentist statistics.
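The Normal-prior/ridge correspondence noted above can be sketched in plain Python (the helper names are illustrative, not library API): the negative log density of a Normal(0, σ) prior is θ²/(2σ²) plus a constant, which is exactly an L2 penalty with λ = 1/(2σ²).

```python
import math

def normal_neg_log_prior(theta, sigma=1.0):
    """Negative log density of a Normal(0, sigma) prior, up to an additive constant."""
    return theta**2 / (2 * sigma**2)

def ridge_penalty(theta, lam):
    """L2 (ridge) penalty: lambda * theta^2."""
    return lam * theta**2

# A Normal(0, sigma) prior matches ridge regression with lambda = 1 / (2 * sigma^2)
sigma = 1.0
lam = 1.0 / (2 * sigma**2)
for theta in (-2.0, 0.5, 3.0):
    assert math.isclose(normal_neg_log_prior(theta, sigma), ridge_penalty(theta, lam))
```

A tighter prior (smaller σ) therefore corresponds to a larger ridge penalty λ.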

Common prior choices follow established conventions:

  • Normal (location-scale) for unconstrained real-valued parameters such as regression coefficients
  • HalfCauchy or HalfNormal for scale (standard deviation) parameters that must be positive, where HalfCauchy provides heavier tails allowing for larger values
  • StudentT (with low degrees of freedom) for robust regression where outlier-resistant likelihoods are needed
  • Dirichlet for probability vectors (e.g., mixture weights)
  • Beta for parameters constrained to [0, 1]

The informativeness of a prior lies on a spectrum:

  • Informative priors concentrate probability mass around specific values, strongly guiding inference (e.g., Normal(5.0, 0.1) says the parameter is very likely near 5)
  • Weakly informative priors constrain the parameter to a reasonable range without being overly specific (e.g., Normal(0, 10) for a coefficient expected to be moderate)
  • Uninformative / diffuse priors attempt to let the data speak entirely, though truly uninformative priors are often improper and can lead to improper posteriors
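The effect of prior width can be made concrete with the conjugate Normal-mean update (known noise scale), computed here in plain Python as a sketch: the same data pulled through an informative prior versus a weakly informative one yields very different posterior means.

```python
def normal_posterior(m0, s0, xbar, n, s=1.0):
    """Posterior mean and std for a Normal mean with known noise std s,
    given prior Normal(m0, s0) and n observations averaging xbar."""
    prec = 1 / s0**2 + n / s**2                        # posterior precision
    mean = (m0 / s0**2 + n * xbar / s**2) / prec
    return mean, prec ** -0.5

# Five observations averaging 4.0, under two priors both centered at 0:
informative = normal_posterior(0.0, 0.1, xbar=4.0, n=5)   # Normal(0, 0.1)
weak        = normal_posterior(0.0, 10.0, xbar=4.0, n=5)  # Normal(0, 10)

# The tight prior drags the posterior mean toward 0; the weak prior barely does.
assert informative[0] < 0.2
assert abs(weak[0] - 4.0) < 0.01
```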

In Pyro, priors are specified using distribution objects from pyro.distributions (which wraps torch.distributions) and are declared within model functions using pyro.sample statements.

Usage

Use this principle at the start of any Bayesian modeling workflow, when defining the model function. Prior specification is required for every latent variable in the model. Choose priors based on the parameter's domain (real line, positive reals, simplex, etc.), known constraints, and the desired level of regularization. When uncertain, prefer weakly informative priors that encode basic domain knowledge (e.g., a scale parameter is positive and likely moderate) without being overly committal.

Theoretical Basis

In Bayesian inference, the posterior is obtained via Bayes' theorem:

p(θ|𝐱) = p(𝐱|θ) p(θ) / p(𝐱)

where p(θ) is the prior, p(𝐱|θ) is the likelihood, and p(𝐱) is the marginal likelihood (evidence). The prior directly influences the posterior: with limited data, the posterior is dominated by the prior; with abundant data, the likelihood dominates and the prior's influence diminishes.

For conjugate models, the prior family determines the posterior family. For example, a Normal prior with a Normal likelihood yields a Normal posterior. In the general (non-conjugate) case addressed by Pyro, the posterior is approximated using variational inference or MCMC.
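Both points above, that the Normal-Normal pair is conjugate, and that the prior's influence fades as data accumulates, can be verified with the closed-form posterior mean for a Normal mean under known noise (a plain-Python sketch; the helper name is illustrative):

```python
def posterior_mean(m0, s0, xbar, n, s=1.0):
    """Closed-form posterior mean for a Normal mean with known noise std s,
    under a conjugate Normal(m0, s0) prior and n observations averaging xbar."""
    prec = 1 / s0**2 + n / s**2
    return (m0 / s0**2 + n * xbar / s**2) / prec

# Prior Normal(0, 1), data mean 3.0, known noise std 1.0:
assert abs(posterior_mean(0., 1., 3., n=1) - 1.5) < 1e-9      # little data: prior pulls hard
assert abs(posterior_mean(0., 1., 3., n=1000) - 3.0) < 0.01   # much data: likelihood dominates
```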

Example (Pyro):

# Specifying priors in a Bayesian regression model
import pyro
import pyro.distributions as dist

def model(x, y):
    # Weakly informative prior on the coefficient
    beta = pyro.sample("beta", dist.Normal(0., 10.))
    # Half-Cauchy prior on the noise scale (heavy-tailed, positive)
    sigma = pyro.sample("sigma", dist.HalfCauchy(2.))
    # Likelihood connecting parameters to data
    pyro.sample("obs", dist.Normal(beta * x, sigma), obs=y)

The prior's effect can be understood through the lens of KL divergence: during variational inference, the ELBO objective includes KL(q(θ) ‖ p(θ)), which penalizes the variational posterior for deviating from the prior. Stronger (more concentrated) priors impose a larger KL penalty for posterior distributions that stray far from the prior's center.
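This penalty can be computed directly for Gaussians; the sketch below (values are illustrative) uses torch.distributions.kl_divergence to show that the same variational posterior pays a much larger KL penalty against a concentrated prior than against a weakly informative one.

```python
from torch.distributions import Normal, kl_divergence

q = Normal(3.0, 0.5)          # a variational posterior centered away from 0

weak   = Normal(0.0, 10.0)    # weakly informative prior
strong = Normal(0.0, 0.5)     # concentrated (informative) prior

# The same q is penalized far more for straying from the concentrated prior
kl_weak = kl_divergence(q, weak)
kl_strong = kl_divergence(q, strong)
assert kl_strong > kl_weak
```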

Related Pages

Implemented By
