Principle:Pyro ppl Pyro Robust Likelihood Functions
| Knowledge Sources | |
|---|---|
| Domains | Robust Statistics, Missing Data, Bayesian Inference |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Robust likelihood functions provide specialized distribution objects for handling missing data, improper priors, data transformations, and non-standard likelihood specifications that arise in practical Bayesian modeling.
Description
Real-world data rarely conforms to the clean assumptions of standard probability distributions. Practical Bayesian modeling requires tools for handling:
NaN-masked distributions: When data contains missing values (NaN entries), standard log-probability computation fails because NaN propagates through arithmetic. NaN-masked distributions automatically detect missing entries and exclude them from the log-likelihood computation, contributing zero log-probability for missing observations while correctly handling present ones.
Improper uniform distribution: Sometimes a prior should express complete ignorance over an unbounded domain. The improper uniform assigns constant density everywhere, which is not a proper probability distribution (it does not integrate to 1). Despite being improper, it is useful as a prior when the likelihood ensures a proper posterior. The implementation returns zero log-probability for any value in the support.
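Folded distributions: Given a base distribution on the real line, the folded version maps all values to their absolute values: if X ~ Base, then |X| ~ Folded(Base). This is useful for modeling non-negative quantities derived from real-valued distributions (e.g., a folded Normal for modeling magnitudes). When the base distribution is symmetric about zero, the folded density is simply twice the base density on the non-negative half-line.
Unit distribution: A trivial distribution whose single value carries no data (an empty tensor). In Pyro it is parameterized by a log_factor, and log_prob returns that factor, so the log-probability is zero only when log_factor is zero. It serves as a placeholder site for non-random terms in a probabilistic program, such as the extra log-density terms added through pyro.factor.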
Ordered logistic distribution: An ordinal regression likelihood where a continuous latent variable is mapped to ordered categories through a set of cutpoints. This is the standard model for ordinal data (e.g., Likert scale responses).
Truncated Polya-Gamma distribution: A truncated version of the Polya-Gamma distribution used in data augmentation schemes for logistic regression and related models.
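Rejector distribution: Implements rejection sampling, in which candidates are drawn from a proposal distribution and accepted with a value-dependent probability. This yields samples from a distribution restricted to a subset of the proposal's support, or from a target whose density is bounded by a known multiple of the proposal density.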
Usage
Use these robust likelihood functions when:
- Data contains missing values that should be gracefully ignored in likelihood computation.
- You need a noninformative (improper) prior for a parameter.
- You are modeling non-negative quantities using folded versions of real-valued distributions.
- You are working with ordinal outcome data (ordered logistic).
- You are implementing custom distributions via rejection sampling.
- You are including deterministic values in a probabilistic trace (unit distribution).
Theoretical Basis
NaN-masked log-probability:
# Standard log-prob fails with NaN:
# log p(x | theta) = sum_i log f(x_i | theta) -- NaN if any x_i is NaN
# NaN-masked log-prob:
# mask_i = 1 if x_i is not NaN, 0 otherwise
# log p(x_obs | theta) = sum_i mask_i * log f(x_i | theta)
# where x_i is replaced by a dummy value (e.g., 0) when mask_i = 0
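The masking rule can be illustrated with a small library-free sketch. It assumes a Normal likelihood and uses hypothetical helper names (`normal_logpdf`, `nan_masked_log_prob`); it is not Pyro's implementation. The dummy substitution matters because `0 * NaN` is still NaN, so multiplying by the mask alone would not remove missing entries.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def nan_masked_log_prob(xs, mu, sigma, dummy=0.0):
    """Sum log f(x_i) over observed entries; NaN entries contribute zero."""
    total = 0.0
    for x in xs:
        mask = 0.0 if math.isnan(x) else 1.0
        # Replace NaN by a dummy value *before* evaluating the density,
        # because mask * NaN would still be NaN.
        safe_x = x if mask else dummy
        total += mask * normal_logpdf(safe_x, mu, sigma)
    return total


data = [0.5, float("nan"), -1.2]
naive = sum(normal_logpdf(x, 0.0, 1.0) for x in data)   # NaN propagates
masked = nan_masked_log_prob(data, 0.0, 1.0)            # finite
```

Here `naive` is NaN, while `masked` equals the sum of the log-densities of the two observed entries.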
Improper uniform:
# p(theta) = constant for theta in support (possibly unbounded)
# log p(theta) = 0 for all theta in support
# This is valid as a prior when the posterior can be normalized:
# integral p(x | theta) * p(theta) dtheta < infinity
# i.e., the likelihood is integrable as a function of theta
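As a rough numerical check of the propriety condition, the sketch below combines a flat prior (log-density 0) with a Normal likelihood of known scale and normalizes the result on a finite grid. The function names, the grid, and the data are illustrative assumptions, not part of Pyro. Because the Normal likelihood is integrable in its mean, the posterior normalizes and its mean should land on the sample mean.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def improper_uniform_log_prob(theta):
    """Flat prior: log-density 0 for every theta."""
    return 0.0


def grid_posterior(data, sigma, grid):
    """Normalize exp(log prior + log likelihood) over a finite grid of theta."""
    log_post = [
        improper_uniform_log_prob(t) + sum(normal_logpdf(x, t, sigma) for x in data)
        for t in grid
    ]
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


data = [1.0, 2.0, 3.0]
grid = [-10.0 + 0.01 * i for i in range(2001)]  # theta in [-10, 10]
post = grid_posterior(data, sigma=1.0, grid=grid)
post_mean = sum(t * w for t, w in zip(grid, post))
```

With these values `post_mean` is close to the sample mean of 2.0, which is what the flat prior predicts for a Normal mean.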
Folded distribution:
# Given base distribution with density f(x):
# Y = |X| has density:
# g(y) = f(y) + f(-y) for y >= 0
# log g(y) = log(exp(log f(y)) + exp(log f(-y)))
# = logsumexp(log f(y), log f(-y))
# Example: Folded Normal (base = Normal(0, sigma)):
# g(y) = 2 * Normal(y | 0, sigma) for y >= 0 (half-normal)
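The logsumexp form of the folded density can be sketched without any library, assuming a Normal base distribution. The helper names `logsumexp2` and `folded_normal_logpdf` are hypothetical, chosen for illustration only.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logsumexp2(a, b):
    """Stable log(exp(a) + exp(b)) for two scalars."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def folded_normal_logpdf(y, mu, sigma):
    """log g(y) = logsumexp(log f(y), log f(-y)) for y >= 0, else -inf."""
    if y < 0.0:
        return -math.inf
    return logsumexp2(normal_logpdf(y, mu, sigma), normal_logpdf(-y, mu, sigma))


# With a zero-mean base, folding reduces to the half-normal: log 2 + log f(y).
y = 0.7
folded = folded_normal_logpdf(y, 0.0, 1.5)
half_normal = math.log(2.0) + normal_logpdf(y, 0.0, 1.5)
```

For a base with nonzero mean the two terms differ, which is why the general formula needs logsumexp rather than a constant factor of 2.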
Ordered logistic (cumulative model):
# Latent continuous variable eta (predictor)
# Cutpoints c_1 < c_2 < ... < c_{K-1} for K ordered categories
# P(Y = k | eta) = sigmoid(c_k - eta) - sigmoid(c_{k-1} - eta)
# where sigmoid(x) = 1 / (1 + exp(-x))
# c_0 = -infinity, c_K = +infinity
# This is equivalent to:
# P(Y <= k | eta) = sigmoid(c_k - eta)
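The cumulative model translates directly into category probabilities. The sketch below is a plain-Python illustration with a hypothetical function name (`ordered_logistic_probs`). It returns the probabilities for categories 1..K as a list, so the entry at list position k-1 corresponds to category k in the formulas above.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordered_logistic_probs(eta, cutpoints):
    """Return [P(Y = 1), ..., P(Y = K)] for K = len(cutpoints) + 1 categories."""
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= k), padded with the limits for
    # c_0 = -infinity (0.0) and c_K = +infinity (1.0).
    cdf = [0.0] + [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


probs = ordered_logistic_probs(eta=0.3, cutpoints=[-1.0, 0.0, 1.5])
```

Three cutpoints give four categories. Increasing `eta` shifts probability mass toward the higher categories, since every cumulative probability `sigmoid(c_k - eta)` decreases.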
Rejection sampling:
# Target density: p(x) on domain D
# Proposal density: q(x) with p(x) <= M * q(x) for all x
# Algorithm:
# 1. Sample x ~ q(x)
# 2. Sample u ~ Uniform(0, 1)
# 3. If u <= p(x) / (M * q(x)): accept x
# Else: reject and go to step 1
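# Acceptance probability = 1/M when p and q are both normalized
# If step 3 uses an unnormalized density p_unnormalized(x) = Z * p(x)
# (with p_unnormalized(x) <= M * q(x)), the acceptance probability is Z/M and
# log p(x) = log p_unnormalized(x) - log Z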
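The accept/reject loop can be sketched on a toy problem. As an assumption for illustration, the target is p(x) = 2x on [0, 1] and the proposal is Uniform(0, 1), so the bound holds with M = 2. None of the names below come from Pyro's Rejector.

```python
import random

M = 2.0  # bound such that target_pdf(x) <= M * proposal_pdf(x) on [0, 1]


def target_pdf(x):
    """Normalized target density p(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def proposal_pdf(x):
    """Density of the Uniform(0, 1) proposal."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def rejection_sample(rng):
    """Return (accepted sample, number of proposals drawn to obtain it)."""
    n_proposals = 0
    while True:
        x = rng.random()  # step 1: x ~ q
        u = rng.random()  # step 2: u ~ Uniform(0, 1)
        n_proposals += 1
        if u <= target_pdf(x) / (M * proposal_pdf(x)):  # step 3: accept
            return x, n_proposals


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rejection_sample(rng) for _ in range(20000)]
samples = [x for x, _ in draws]
accept_rate = len(draws) / sum(n for _, n in draws)
sample_mean = sum(samples) / len(samples)
```

Since both densities are normalized, `accept_rate` should be near 1/M = 0.5, and `sample_mean` should be near the target mean of 2/3. A tighter bound M raises the acceptance rate, which is the main tuning lever in this scheme.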
Related Pages
- Implementation:Pyro_ppl_Pyro_NanMaskedDistributions
- Implementation:Pyro_ppl_Pyro_ImproperUniform
- Implementation:Pyro_ppl_Pyro_FoldedDistribution
- Implementation:Pyro_ppl_Pyro_Unit
- Implementation:Pyro_ppl_Pyro_OrderedLogistic
- Implementation:Pyro_ppl_Pyro_TruncatedPolyaGamma
- Implementation:Pyro_ppl_Pyro_Rejector