
Principle: Pyro Prior Specification

From Leeroopedia


Knowledge Sources
Domains Bayesian_Inference, Statistics
Last Updated 2026-02-09 00:00 GMT

Overview

A foundational principle for choosing prior distributions that encode domain knowledge and regularize Bayesian models before observing data.

Description

Prior specification is the process of selecting probability distributions for latent parameters before any data is observed. In Bayesian inference, the prior distribution p(θ) represents our beliefs about parameter values prior to seeing data. It combines with the likelihood via Bayes' theorem to produce the posterior distribution.

The choice of prior serves two critical roles:

  • Domain knowledge encoding: Priors allow the modeler to inject substantive knowledge into the analysis. For example, knowing that a variance parameter must be positive, or that a regression coefficient is likely near zero, can be expressed directly through the prior.
  • Regularization: Priors act as a form of regularization, preventing overfitting by penalizing extreme parameter values. A Normal(0, 1) prior on a coefficient is analogous to L2 regularization (ridge regression) in frequentist statistics.
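The Normal-prior/ridge correspondence noted above can be sketched in plain Python (the helper names are illustrative, not library API): the negative log density of a Normal(0, σ) prior is θ²/(2σ²) plus a constant, which is exactly an L2 penalty with λ = 1/(2σ²).

```python
import math

def normal_neg_log_prior(theta, sigma=1.0):
    """Negative log density of a Normal(0, sigma) prior, up to an additive constant."""
    return theta**2 / (2 * sigma**2)

def ridge_penalty(theta, lam):
    """L2 (ridge) penalty: lambda * theta^2."""
    return lam * theta**2

# A Normal(0, sigma) prior matches ridge regression with lambda = 1 / (2 * sigma^2)
sigma = 1.0
lam = 1.0 / (2 * sigma**2)
for theta in (-2.0, 0.5, 3.0):
    assert math.isclose(normal_neg_log_prior(theta, sigma), ridge_penalty(theta, lam))
```

A tighter prior (smaller σ) therefore corresponds to a larger ridge penalty λ.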

Common prior choices follow established conventions:

  • Normal (location-scale) for unconstrained real-valued parameters such as regression coefficients
  • HalfCauchy or HalfNormal for scale (standard deviation) parameters that must be positive, where HalfCauchy provides heavier tails allowing for larger values
  • StudentT (with low degrees of freedom) for robust regression where outlier-resistant likelihoods are needed
  • Dirichlet for probability vectors (e.g., mixture weights)
  • Beta for parameters constrained to [0, 1]

The informativeness of a prior lies on a spectrum:

  • Informative priors concentrate probability mass around specific values, strongly guiding inference (e.g., Normal(5.0, 0.1) says the parameter is very likely near 5)
  • Weakly informative priors constrain the parameter to a reasonable range without being overly specific (e.g., Normal(0, 10) for a coefficient expected to be moderate)
  • Uninformative / diffuse priors attempt to let the data speak entirely, though truly uninformative priors are often improper and can lead to improper posteriors
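The effect of prior width can be made concrete with the conjugate Normal-mean update (known noise scale), computed here in plain Python as a sketch: the same data pulled through an informative prior versus a weakly informative one yields very different posterior means.

```python
def normal_posterior(m0, s0, xbar, n, s=1.0):
    """Posterior mean and std for a Normal mean with known noise std s,
    given prior Normal(m0, s0) and n observations averaging xbar."""
    prec = 1 / s0**2 + n / s**2                        # posterior precision
    mean = (m0 / s0**2 + n * xbar / s**2) / prec
    return mean, prec ** -0.5

# Five observations averaging 4.0, under two priors both centered at 0:
informative = normal_posterior(0.0, 0.1, xbar=4.0, n=5)   # Normal(0, 0.1)
weak        = normal_posterior(0.0, 10.0, xbar=4.0, n=5)  # Normal(0, 10)

# The tight prior drags the posterior mean toward 0; the weak prior barely does.
assert informative[0] < 0.2
assert abs(weak[0] - 4.0) < 0.01
```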

In Pyro, priors are specified using distribution objects from pyro.distributions (which wraps torch.distributions) and are declared within model functions using pyro.sample statements.

Usage

Use this principle at the start of any Bayesian modeling workflow, when defining the model function. Prior specification is required for every latent variable in the model. Choose priors based on the parameter's domain (real line, positive reals, simplex, etc.), known constraints, and the desired level of regularization. When uncertain, prefer weakly informative priors that encode basic domain knowledge (e.g., a scale parameter is positive and likely moderate) without being overly committal.

Theoretical Basis

In Bayesian inference, the posterior is obtained via Bayes' theorem:

p(θ|𝐱) = p(𝐱|θ) p(θ) / p(𝐱)

where p(θ) is the prior, p(𝐱|θ) is the likelihood, and p(𝐱) is the marginal likelihood (evidence). The prior directly influences the posterior: with limited data, the posterior is dominated by the prior; with abundant data, the likelihood dominates and the prior's influence diminishes.

For conjugate models, the prior family determines the posterior family. For example, a Normal prior with a Normal likelihood yields a Normal posterior. In the general (non-conjugate) case addressed by Pyro, the posterior is approximated using variational inference or MCMC.
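Both points above, that the Normal-Normal pair is conjugate, and that the prior's influence fades as data accumulates, can be verified with the closed-form posterior mean for a Normal mean under known noise (a plain-Python sketch; the helper name is illustrative):

```python
def posterior_mean(m0, s0, xbar, n, s=1.0):
    """Closed-form posterior mean for a Normal mean with known noise std s,
    under a conjugate Normal(m0, s0) prior and n observations averaging xbar."""
    prec = 1 / s0**2 + n / s**2
    return (m0 / s0**2 + n * xbar / s**2) / prec

# Prior Normal(0, 1), data mean 3.0, known noise std 1.0:
assert abs(posterior_mean(0., 1., 3., n=1) - 1.5) < 1e-9      # little data: prior pulls hard
assert abs(posterior_mean(0., 1., 3., n=1000) - 3.0) < 0.01   # much data: likelihood dominates
```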

Example (Pyro):

# Specifying priors in a Bayesian regression model
import pyro
import pyro.distributions as dist

def model(x, y):
    # Weakly informative prior on the coefficient
    beta = pyro.sample("beta", dist.Normal(0., 10.))
    # Half-Cauchy prior on the noise scale (heavy-tailed, positive)
    sigma = pyro.sample("sigma", dist.HalfCauchy(2.))
    # Likelihood connecting parameters to data
    pyro.sample("obs", dist.Normal(beta * x, sigma), obs=y)

The prior's effect can be understood through the lens of KL divergence: during variational inference, the ELBO objective includes KL(q(θ) ‖ p(θ)), which penalizes the variational posterior for deviating from the prior. Stronger (more concentrated) priors impose a larger KL penalty for posterior distributions that stray far from the prior's center.
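This penalty can be computed directly for Gaussians; the sketch below (values are illustrative) uses torch.distributions.kl_divergence to show that the same variational posterior pays a much larger KL penalty against a concentrated prior than against a weakly informative one.

```python
from torch.distributions import Normal, kl_divergence

q = Normal(3.0, 0.5)          # a variational posterior centered away from 0

weak   = Normal(0.0, 10.0)    # weakly informative prior
strong = Normal(0.0, 0.5)     # concentrated (informative) prior

# The same q is penalized far more for straying from the concentrated prior
kl_weak = kl_divergence(q, weak)
kl_strong = kl_divergence(q, strong)
assert kl_strong > kl_weak
```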

Related Pages

Implemented By
