Principle:Pyro ppl Pyro Heavy Tailed Distributions

Knowledge Sources	Stable Distributions: Models for Heavy Tailed Data; Reparameterized Gradient Estimators for Stable Distributions; A Multivariate Generalization of Student's t-Distribution
Domains	Probability Theory, Robust Statistics, Financial Modeling
Last Updated	2026-02-09 09:00 GMT

Overview

Heavy-tailed distributions assign substantially more probability to extreme values than Gaussian distributions, making them essential for modeling phenomena with outliers, power-law behavior, or infinite variance.

Description

In many real-world settings, data exhibits extreme values far more frequently than a Gaussian model would predict. Heavy-tailed distributions capture this behavior and are characterized by tails that decay slower than exponentially.
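To make the gap concrete, the sketch below compares two-sided tail probabilities of a standard Gaussian and a standard Cauchy using their closed-form CDFs. The function names are illustrative only and are not part of any library.

```python
import math

def gaussian_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))

def cauchy_two_sided_tail(k: float) -> float:
    """P(|X| > k) for a standard Cauchy X."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

# The Gaussian tail collapses super-exponentially; the Cauchy tail shrinks like 1/k.
for k in (2.0, 5.0, 10.0):
    print(f"k={k:>4}: gaussian={gaussian_two_sided_tail(k):.3e}  "
          f"cauchy={cauchy_two_sided_tail(k):.3e}")
```

At k = 5 the Gaussian probability is already below one in a million, while the Cauchy probability is still above ten percent.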

Stable distributions form a broad family of heavy-tailed distributions. By the generalized central limit theorem, they are the only possible non-degenerate limiting distributions of normalized sums of i.i.d. random variables. A stable distribution is parameterized by:

alpha (stability index, 0 < alpha <= 2): controls tail heaviness. alpha=2 gives Gaussian; alpha=1 with beta=0 gives Cauchy; smaller alpha means heavier tails.
beta (skewness, -1 <= beta <= 1): controls asymmetry.
mu (location) and sigma (scale).

For alpha < 2, stable distributions have infinite variance; for alpha <= 1, the mean is also infinite or undefined. Despite lacking closed-form densities in general (the Gaussian, Cauchy, and Levy cases are the exceptions), they are important in finance, physics, and signal processing.
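Because the density is generally unavailable, stable variables are usually handled through sampling. The sketch below uses the Chambers-Mallows-Stuck transform for the symmetric case (beta = 0, mu = 0, sigma = 1) in plain Python. It is an illustration of the technique under those assumptions, not Pyro's implementation, and the function names are hypothetical.

```python
import math
import random

def symmetric_stable_from_uniform_exp(alpha: float, u: float, w: float) -> float:
    """Chambers-Mallows-Stuck map for beta = 0.

    u is a point in (-pi/2, pi/2) and w > 0 is an Exponential(1) draw.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    return (
        math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
        * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )

def sample_symmetric_stable(alpha: float, n: int, seed: int = 0) -> list:
    """Draw n standard symmetric stable samples with a seeded generator."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = math.pi * (rng.random() - 0.5)   # Uniform(-pi/2, pi/2)
        w = rng.expovariate(1.0)             # Exponential(1)
        draws.append(symmetric_stable_from_uniform_exp(alpha, u, w))
    return draws

samples = sample_symmetric_stable(alpha=1.5, n=5)
print(samples)
```

For alpha = 1 the transform reduces to tan(u), a standard Cauchy draw, and for alpha = 2 it reduces to 2*sin(u)*sqrt(w), a Gaussian with variance 2.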

Multivariate Student's t-distribution generalizes the univariate t-distribution to multiple dimensions. With nu degrees of freedom, its density has polynomial tail decay proportional to |x|^{-(nu+d)}, where d is the dimension. As nu approaches infinity, it converges to a multivariate Gaussian.

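Asymmetric Laplace distribution has exponential tails with different rates on each side. Its tails are heavier than Gaussian tails, though not heavy in the strict slower-than-exponential sense, and it is useful for modeling asymmetric phenomena such as financial returns.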

Soft Laplace distribution is a smooth density with Laplace-like (exponential) tails. Unlike the Laplace density, which has a kink at its mode, it is differentiable everywhere.
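The sketch below evaluates one density with these properties, p(x) = 1 / (pi * scale * cosh(z)) with z = (x - loc) / scale. Treating this as the Soft Laplace density is an assumption made here for illustration; the function name is hypothetical and the code is plain Python, not a library call.

```python
import math

def soft_laplace_log_pdf(x: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Log density of the assumed form 1 / (pi * scale * cosh(z))."""
    z = abs(x - loc) / scale
    # log(exp(z) + exp(-z)), computed stably as z + log1p(exp(-2z)) for z >= 0
    log_two_cosh = z + math.log1p(math.exp(-2.0 * z))
    return math.log(2.0 / math.pi) - math.log(scale) - log_two_cosh

# Far from the mode, the log density drops by about 1 per unit of z, like a Laplace.
print(soft_laplace_log_pdf(31.0) - soft_laplace_log_pdf(30.0))
# Near the mode the density is smooth and symmetric, with no kink.
print(soft_laplace_log_pdf(-0.01), soft_laplace_log_pdf(0.0), soft_laplace_log_pdf(0.01))
```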

Affine Beta distribution is a Beta distribution mapped to an arbitrary interval [a, b]. Because its support is bounded it is not heavy-tailed, but it is useful as a bounded prior with flexible shape.
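The affine map can be sketched in a few lines: draw from a standard Beta on [0, 1], then rescale and shift. The parameter names `low` and `high` are illustrative stand-ins for a and b, and the function is a hypothetical helper rather than a library API.

```python
import random

def sample_affine_beta(concentration1: float, concentration0: float,
                       low: float, high: float, n: int, seed: int = 0) -> list:
    """Draw n samples of low + (high - low) * Beta(concentration1, concentration0)."""
    rng = random.Random(seed)
    width = high - low
    return [low + width * rng.betavariate(concentration1, concentration0)
            for _ in range(n)]

draws = sample_affine_beta(2.0, 5.0, low=-1.0, high=3.0, n=20000)
sample_mean = sum(draws) / len(draws)
# The exact mean is low + (high - low) * c1 / (c1 + c0).
print(sample_mean, -1.0 + 4.0 * 2.0 / 7.0)
```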

Usage

Use heavy-tailed distributions when:

Handling data with outliers that would be implausible under Gaussian assumptions.
Modeling financial returns, insurance claims, or natural catastrophe magnitudes.
Building robust regression models where the likelihood should tolerate occasional extreme residuals.
Working with signal processing data that follows power-law or alpha-stable behavior.
Needing a prior that is more diffuse in the tails than a Gaussian (e.g., Student-t priors for robust Bayesian inference).
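The robustness argument in the list above can be illustrated by comparing how much a single extreme residual costs under each likelihood. The sketch below is plain Python with hypothetical function names; it evaluates the negative log-likelihood of a univariate Gaussian and a univariate Student-t at the same residuals.

```python
import math

def gaussian_nll(residual: float, scale: float = 1.0) -> float:
    """Negative log-density of Normal(0, scale) at the residual."""
    z = residual / scale
    return 0.5 * z * z + math.log(scale) + 0.5 * math.log(2.0 * math.pi)

def student_t_nll(residual: float, nu: float, scale: float = 1.0) -> float:
    """Negative log-density of a Student-t with nu degrees of freedom."""
    z = residual / scale
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return -log_norm + 0.5 * (nu + 1.0) * math.log1p(z * z / nu)

# The Gaussian penalty grows quadratically, the Student-t penalty only logarithmically.
for r in (1.0, 5.0, 10.0):
    print(f"residual={r:>4}: gaussian={gaussian_nll(r):.2f}  student_t(nu=3)={student_t_nll(r, 3.0):.2f}")
```

Because the Student-t penalty grows so slowly, one outlier cannot dominate the fit the way it does under a Gaussian likelihood.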

Theoretical Basis

Stable distributions are defined by their characteristic function:

# Characteristic function of a stable distribution S(alpha, beta, mu, sigma)
# For alpha != 1:
log E[exp(i*t*X)] = i*mu*t - sigma^alpha * |t|^alpha * (1 - i*beta*sign(t)*tan(pi*alpha/2))

# For alpha = 1:
log E[exp(i*t*X)] = i*mu*t - sigma*|t| * (1 + i*beta*(2/pi)*sign(t)*log(|t|))

# Properties:
# alpha = 2: Gaussian (variance = 2*sigma^2)
# alpha = 1, beta = 0: Cauchy
# alpha = 0.5, beta = 1: Levy
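The two branches of the characteristic function can be transcribed directly into plain Python. This is a sketch of the formulas as written above, with hypothetical function names; it is not how any particular library evaluates stable distributions.

```python
import cmath
import math

def stable_log_cf(t: float, alpha: float, beta: float,
                  mu: float = 0.0, sigma: float = 1.0) -> complex:
    """log E[exp(i*t*X)] for X ~ S(alpha, beta, mu, sigma)."""
    if t == 0.0:
        return 0j  # every characteristic function equals 1 at t = 0
    sign_t = math.copysign(1.0, t)
    abs_t = abs(t)
    if alpha != 1.0:
        skew = 1.0 - 1j * beta * sign_t * math.tan(math.pi * alpha / 2.0)
        return 1j * mu * t - (sigma * abs_t) ** alpha * skew
    skew = 1.0 + 1j * beta * (2.0 / math.pi) * sign_t * math.log(abs_t)
    return 1j * mu * t - sigma * abs_t * skew

def stable_cf(t: float, alpha: float, beta: float,
              mu: float = 0.0, sigma: float = 1.0) -> complex:
    """Characteristic function E[exp(i*t*X)]."""
    return cmath.exp(stable_log_cf(t, alpha, beta, mu, sigma))

# alpha = 2 reproduces exp(-t^2), the characteristic function of Normal(0, variance 2).
print(stable_cf(1.5, 2.0, 0.0), math.exp(-1.5 ** 2))
# alpha = 1, beta = 0 reproduces exp(-|t|), the Cauchy characteristic function.
print(stable_cf(-0.7, 1.0, 0.0), math.exp(-0.7))
```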

Generalized Central Limit Theorem:

# If X_1, X_2, ... are i.i.d. with:
# P(X > x) ~ c_1 * x^{-alpha}  as x -> infinity
# P(X < -x) ~ c_2 * x^{-alpha}  as x -> infinity
# for some 0 < alpha < 2

# Then: (X_1 + ... + X_n - a_n) / b_n -> S(alpha, beta)  in distribution
# where a_n, b_n are normalizing sequences,
# b_n is proportional to n^{1/alpha}, and beta = (c_1 - c_2) / (c_1 + c_2)
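A small simulation illustrates the scaling for the Cauchy case, where alpha = 1, so b_n is proportional to n and the sample mean of n draws is again standard Cauchy. The sketch below, in plain Python with hypothetical helper names and an arbitrary seed, compares the typical size of the sample mean for Cauchy and Gaussian draws.

```python
import math
import random
import statistics

def draw_cauchy(rng: random.Random) -> float:
    """Standard Cauchy draw by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_gaussian(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)

def median_abs_sample_mean(draw, n: int, trials: int, rng: random.Random) -> float:
    """Median of |mean of n draws| over repeated trials."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    return statistics.median(abs(m) for m in means)

rng = random.Random(0)
cauchy_spread = median_abs_sample_mean(draw_cauchy, n=200, trials=1001, rng=rng)
gaussian_spread = median_abs_sample_mean(draw_gaussian, n=200, trials=1001, rng=rng)
# Averaging does not shrink the Cauchy mean; the Gaussian mean shrinks like 1/sqrt(n).
print(f"cauchy: {cauchy_spread:.3f}  gaussian: {gaussian_spread:.3f}")
```

The Cauchy spread stays near 1, the median of |X| for a standard Cauchy, regardless of n, while the Gaussian spread shrinks as n grows.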

Multivariate Student's t:

# Density of multivariate t with nu df, location mu, scale matrix Sigma:
p(x | nu, mu, Sigma) =
    Gamma((nu + d)/2) / (Gamma(nu/2) * nu^{d/2} * pi^{d/2} * |Sigma|^{1/2})
    * (1 + (x-mu)^T Sigma^{-1} (x-mu) / nu)^{-(nu+d)/2}

# Tail behavior: p(x) ~ |x|^{-(nu+d)} for |x| -> infinity
# Covariance: nu/(nu-2) * Sigma  (exists only for nu > 2)
# As nu -> infinity: converges to Normal(mu, Sigma)
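The density above can be evaluated in log space with `math.lgamma`. To stay dependency-free, the sketch below assumes a diagonal scale matrix, a simplification introduced here and not required by the formula; the function name is hypothetical.

```python
import math

def mvt_log_pdf_diag(x, mu, diag_sigma, nu: float) -> float:
    """Log density of a multivariate t with diagonal scale matrix diag(diag_sigma)."""
    d = len(x)
    # (x - mu)^T Sigma^{-1} (x - mu) for diagonal Sigma
    maha = sum((xi - mi) ** 2 / si for xi, mi, si in zip(x, mu, diag_sigma))
    log_det = sum(math.log(si) for si in diag_sigma)
    log_norm = (math.lgamma((nu + d) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * d * math.log(nu * math.pi) - 0.5 * log_det)
    return log_norm - 0.5 * (nu + d) * math.log1p(maha / nu)

# Empirical tail exponent for d = 2, nu = 3: scaling |x| by 10 should
# lower the log density by about (nu + d) * log(10).
near = mvt_log_pdf_diag([1e4, 0.0], [0.0, 0.0], [1.0, 1.0], nu=3.0)
far = mvt_log_pdf_diag([1e5, 0.0], [0.0, 0.0], [1.0, 1.0], nu=3.0)
print((near - far) / math.log(10.0))
```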

Asymmetric Laplace:

# Asymmetric Laplace with location mu, scale b, asymmetry kappa:
p(x | mu, b, kappa) =
    (kappa / (1 + kappa^2)) * (1/b) *
    exp(-|x - mu| / b * (kappa if x >= mu else 1/kappa))

# Left tail (x -> -infinity) decays as exp(-|x - mu| / (b*kappa))
# Right tail (x -> +infinity) decays as exp(-|x - mu| * kappa / b)
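As a sanity check on the normalizing constant, the sketch below implements the density exactly as written above and integrates it numerically. The trapezoid helper and the chosen parameter values are illustrative assumptions.

```python
import math

def asymmetric_laplace_pdf(x: float, mu: float, b: float, kappa: float) -> float:
    """Density with decay rate kappa/b to the right of mu and 1/(b*kappa) to the left."""
    rate = kappa if x >= mu else 1.0 / kappa
    return (kappa / (1.0 + kappa * kappa)) / b * math.exp(-abs(x - mu) / b * rate)

def trapezoid(f, lo: float, hi: float, steps: int) -> float:
    """Composite trapezoid rule on a uniform grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h

def pdf(x: float) -> float:
    return asymmetric_laplace_pdf(x, mu=0.0, b=1.0, kappa=2.0)

total_mass = trapezoid(pdf, -60.0, 60.0, 120000)
right_mass = trapezoid(pdf, 0.0, 60.0, 60000)
# Analytically, the mass to the right of mu is 1 / (1 + kappa^2).
print(total_mass, right_mass)
```

With kappa = 2 the right tail decays four times faster than the left, so only about one fifth of the mass lies to the right of mu.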
