Principle:Pyro ppl Pyro Robust Likelihood Functions

Knowledge Sources	Robust Statistics, Improper Priors and Improper Posteriors, Folded and Log-Folded Distributions
Domains	Robust Statistics, Missing Data, Bayesian Inference
Last Updated	2026-02-09 09:00 GMT

Overview

Robust likelihood functions provide specialized distribution objects for handling missing data, improper priors, data transformations, and non-standard likelihood specifications that arise in practical Bayesian modeling.

Description

Real-world data rarely conforms to the clean assumptions of standard probability distributions. Practical Bayesian modeling requires tools for handling:

NaN-masked distributions: When data contains missing values (NaN entries), standard log-probability computation fails because NaN propagates through arithmetic. NaN-masked distributions automatically detect missing entries and exclude them from the log-likelihood computation, contributing zero log-probability for missing observations while correctly handling present ones.

Improper uniform distribution: Sometimes a prior should express complete ignorance over an unbounded domain. The improper uniform assigns constant density everywhere, which is not a proper probability distribution (it does not integrate to 1). Despite being improper, it is useful as a prior when the likelihood ensures a proper posterior. The implementation returns zero log-probability for any value in the support.

Folded distributions: Given a base distribution on the real line, the folded version maps all values to their absolute values: if X ~ Base, then |X| ~ Folded(Base). This is useful for modeling nonnegative quantities from distributions on the real line (e.g., a folded Normal for modeling magnitudes).

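Unit distribution: A trivial, non-normalized distribution over the unit type, whose single value carries no data. Its log-probability is a caller-supplied log-factor (zero in the simplest case) rather than a normalized density, so it acts as a placeholder site that can carry a standalone log-probability term within a probabilistic program. In Pyro it underlies pyro.factor statements.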

Ordered logistic distribution: An ordinal regression likelihood where a continuous latent variable is mapped to ordered categories through a set of cutpoints. This is the standard model for ordinal data (e.g., Likert scale responses).

Truncated Polya-Gamma distribution: A truncated version of the Polya-Gamma distribution used in data augmentation schemes for logistic regression and related models.

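Rejector distribution: Implements rejection sampling. Draws from a proposal distribution are accepted with a value-dependent probability, which yields samples from the proposal restricted to a subset of its support or, more generally, from a target whose density is bounded by a scaled proposal density.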

Usage

Use these robust likelihood functions when:

Data contains missing values that should be gracefully ignored in likelihood computation.
You need a noninformative (improper) prior for a parameter.
You are modeling nonnegative quantities using folded versions of symmetric distributions.
You are working with ordinal outcome data (ordered logistic).
You are implementing custom distributions via rejection sampling.
You need a data-free placeholder site in a probabilistic trace (unit distribution).

Theoretical Basis

NaN-masked log-probability:

# Standard log-prob fails with NaN:
# log p(x | theta) = sum_i log f(x_i | theta)  -- NaN if any x_i is NaN

# NaN-masked log-prob:
# mask_i = 1 if x_i is not NaN, 0 otherwise
# log p(x_obs | theta) = sum_i mask_i * log f(x_i | theta)
# where x_i is replaced by a dummy value (e.g., 0) when mask_i = 0
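
The masking rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Pyro's implementation: the helper names normal_log_prob and nan_masked_log_prob are hypothetical, and a Normal likelihood is assumed purely for concreteness.

```python
import math


def normal_log_prob(x, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def nan_masked_log_prob(xs, mu, sigma):
    """Sum per-observation log-densities, giving NaN entries zero weight."""
    total = 0.0
    for x in xs:
        mask = 0.0 if math.isnan(x) else 1.0
        # Substitute a dummy value so the masked term stays finite:
        # 0 * NaN would still be NaN, whereas 0 * finite is 0.
        safe_x = x if mask else 0.0
        total += mask * normal_log_prob(safe_x, mu, sigma)
    return total


data = [0.3, float("nan"), -1.2, float("nan"), 0.8]

# Naive sum: a single NaN poisons the whole log-likelihood.
naive = sum(normal_log_prob(x, 0.0, 1.0) for x in data)

# Masked sum: equals the log-likelihood of the observed entries only.
masked = nan_masked_log_prob(data, 0.0, 1.0)
observed_only = sum(
    normal_log_prob(x, 0.0, 1.0) for x in data if not math.isnan(x)
)
```

The dummy substitution matters: multiplying the mask into an already-NaN term would not remove the NaN, so the missing value has to be replaced before the density is evaluated.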

Improper uniform:

# p(theta) = constant  for theta in support (possibly unbounded)
# log p(theta) = 0  for all theta in support

# This is valid as a prior when the posterior can be normalized:
# integral p(x | theta) * p(theta) dtheta < infinity
# i.e., the likelihood is integrable as a function of theta
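
As a rough numerical illustration of a flat prior paired with an integrable likelihood, the sketch below evaluates an unnormalized posterior on a grid. It assumes a Normal likelihood with known sigma; the function names, the data, and the grid bounds are all hypothetical choices. A finite grid cannot prove integrability over the whole real line; it only shows that here the mass is concentrated and normalizable.

```python
import math


def normal_log_lik(data, theta, sigma=1.0):
    """Log-likelihood of i.i.d. Normal(theta, sigma) observations."""
    return sum(
        -0.5 * ((x - theta) / sigma) ** 2
        - math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi)
        for x in data
    )


def flat_log_prior(theta):
    """Improper uniform prior: constant log-density, taken to be 0."""
    return 0.0


data = [1.8, 2.4, 2.1, 1.6, 2.6]

# Grid over theta in [-10, 14]; wide enough that the tails are negligible.
step = 0.01
grid = [-10.0 + step * i for i in range(2401)]

log_post = [normal_log_lik(data, t) + flat_log_prior(t) for t in grid]

# Subtract the maximum before exponentiating for numerical stability.
shift = max(log_post)
weights = [math.exp(lp - shift) for lp in log_post]

# Finite normalizer (in shifted units): the posterior can be normalized.
evidence = sum(weights) * step

post_mean = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
sample_mean = sum(data) / len(data)
```

With a flat prior the posterior is proportional to the likelihood, so in this Normal example the posterior mean coincides with the sample mean.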

Folded distribution:

# Given base distribution with density f(x):
# Y = |X| has density:
# g(y) = f(y) + f(-y)  for y >= 0

# log g(y) = log(exp(log f(y)) + exp(log f(-y)))
#          = logsumexp(log f(y), log f(-y))

# Example: Folded Normal (base = Normal(0, sigma)):
# g(y) = 2 * Normal(y | 0, sigma)  for y >= 0  (half-normal)
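
The logsumexp form of the folded log-density can be checked with a short sketch. The helpers logsumexp2 and folded_log_prob are hypothetical names used for illustration, and a Normal base distribution is assumed; this is not Pyro's implementation.

```python
import math


def normal_log_prob(x, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def folded_log_prob(base_log_prob, y):
    """Log-density of |X| at y, where X has log-density base_log_prob."""
    if y < 0:
        return -math.inf  # outside the support of a folded distribution
    return logsumexp2(base_log_prob(y), base_log_prob(-y))


def centered_base(x):
    """Base distribution Normal(0, 2), symmetric about zero."""
    return normal_log_prob(x, 0.0, 2.0)


# For a zero-mean base the folded density is the half-normal: 2 * f(y).
half_normal = folded_log_prob(centered_base, 1.5)
expected = math.log(2.0) + centered_base(1.5)
```

The same folded_log_prob function also covers a base with nonzero mean, where the two terms f(y) and f(-y) differ and the simple factor of 2 no longer applies.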

Ordered logistic (cumulative model):

# Latent continuous variable eta (predictor)
# Cutpoints c_1 < c_2 < ... < c_{K-1} for K ordered categories

# P(Y = k | eta) = sigmoid(c_k - eta) - sigmoid(c_{k-1} - eta)
# where sigmoid(x) = 1 / (1 + exp(-x))
# c_0 = -infinity, c_K = +infinity

# This is equivalent to:
# P(Y <= k | eta) = sigmoid(c_k - eta)
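
The cumulative model can be turned into category probabilities by differencing the cumulative values. The sketch below is a minimal illustration; ordered_logistic_probs is a hypothetical helper, and the predictor and cutpoint values are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordered_logistic_probs(eta, cutpoints):
    """Return [P(Y = 1), ..., P(Y = K)] for K - 1 increasing cutpoints."""
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # P(Y <= k) = sigmoid(c_k - eta), with c_0 = -inf and c_K = +inf
    # contributing cumulative values 0 and 1 respectively.
    cdf = [0.0] + [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Three cutpoints define four ordered categories.
probs = ordered_logistic_probs(eta=0.5, cutpoints=[-1.0, 0.0, 1.5])
```

Because the cutpoints are strictly increasing, every difference is positive and the probabilities sum to one. Increasing eta shifts mass toward the higher categories.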

Rejection sampling:

# Target density: p(x) on domain D
# Proposal density: q(x) with p(x) <= M * q(x) for all x

# Algorithm:
# 1. Sample x ~ q(x)
# 2. Sample u ~ Uniform(0, 1)
# 3. If u <= p(x) / (M * q(x)): accept x
#    Else: reject and go to step 1

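# Acceptance probability = 1/M when p and q are both normalized;
# if p is only known up to a normalizing constant Z, it is Z/M
# Log-prob correction for an unnormalized target:
# log p(x) = log p_unnormalized(x) - log Z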
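
The accept/reject loop can be sketched with the standard library alone. The target, proposal, and bound here are assumptions chosen for illustration: the target is the normalized density p(x) = 6x(1 - x) on [0, 1], the proposal is Uniform(0, 1) with q(x) = 1, and M = 1.5 is the maximum of p. This is not Pyro's Rejector implementation.

```python
import random


def target_pdf(x):
    """Normalized target density on [0, 1]: p(x) = 6 x (1 - x)."""
    return 6.0 * x * (1.0 - x)


PROPOSAL_PDF = 1.0  # Uniform(0, 1) proposal density q(x)
M = 1.5             # p(x) <= M * q(x), since max p = p(0.5) = 1.5


def rejection_sample(rng):
    """Return one accepted draw and the number of proposals it took."""
    proposals = 0
    while True:
        proposals += 1
        x = rng.random()  # step 1: x ~ q
        u = rng.random()  # step 2: u ~ Uniform(0, 1)
        if u <= target_pdf(x) / (M * PROPOSAL_PDF):  # step 3: accept test
            return x, proposals


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [rejection_sample(rng) for _ in range(20000)]

samples = [x for x, _ in draws]
accept_rate = len(draws) / sum(n for _, n in draws)
sample_mean = sum(samples) / len(samples)
```

Since both densities are normalized, the empirical acceptance rate should be close to 1/M, and the sample mean should be close to 0.5 by the symmetry of the target.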
