Principle:Pyro ppl Pyro Coalescent Process
| Knowledge Sources | |
|---|---|
| Domains | Population Genetics, Phylogenetics, Bayesian Inference |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Kingman's coalescent is a stochastic process that models the genealogical history of a sample of individuals by tracing lineages backward in time until they coalesce into a common ancestor.
Description
The coalescent process is the foundational model in population genetics for describing how a sample of gene copies relates to a common ancestor. Working backward in time from the present, pairs of lineages merge (coalesce) at random times whose rate depends on the number of remaining lineages and the effective population size.
Given a sample of n lineages from a population of effective size N_e, the process operates as follows:
- Start with n lineages at the present.
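- While k > 1 lineages remain, the time until the next coalescence event is exponentially distributed with rate C(k,2) / N_e, where C(k,2) = k(k-1)/2 is the number of possible pairs. Time is measured in generations and N_e counts haploid gene copies; under a diploid convention the rate is written C(k,2) / (2 N_e).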
- At each coalescence, a uniformly random pair of lineages merges into one ancestral lineage.
- The process terminates when a single lineage (the most recent common ancestor) remains.
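The steps above can be sketched as a small simulation. This is a minimal illustration using only Python's standard library, not Pyro's implementation; the function name `simulate_genealogy` and the representation of a lineage as a tuple of sample indices are assumptions made for this example.

```python
import random


def simulate_genealogy(n, n_e, rng):
    """Simulate one Kingman coalescent genealogy (hypothetical helper).

    Returns (events, root) where events is a list of
    (time, lineage_a, lineage_b) merges ordered backward in time and
    root is the final lineage containing all n samples.
    """
    # Each lineage is the sorted tuple of sample indices it is ancestral to.
    lineages = [(i,) for i in range(n)]
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / (2.0 * n_e)  # C(k, 2) / N_e
        time += rng.expovariate(rate)  # exponential waiting time
        i, j = rng.sample(range(k), 2)  # uniformly random pair
        a, b = lineages[i], lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(tuple(sorted(a + b)))  # merged ancestral lineage
        events.append((time, a, b))
    return events, lineages[0]


rng = random.Random(0)
events, root = simulate_genealogy(5, 1000.0, rng)
```

A sample of n lineages always produces exactly n - 1 merge events, and the time of the last event is the time to the most recent common ancestor.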
The coalescent is useful because it assigns a probability density to genealogies given population parameters; combined with a mutation model, this yields a likelihood for observed genetic data. By modeling the tree of coalescent times, one can perform Bayesian inference over:
- Effective population size (constant or time-varying)
- Migration rates between subpopulations
- Selection coefficients acting on genetic variants
- Demographic history (population bottlenecks, expansions)
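As a small illustration of how the exponential waiting times carry information about the first item in this list, the sketch below computes a maximum-likelihood point estimate of a constant N_e from observed waiting times. It is a simplification: a Bayesian treatment would place a prior on N_e instead, and real data rarely provide the waiting times directly. The helper names `pairs` and `mle_effective_size` are hypothetical. Setting the derivative of the log density with respect to N_e to zero gives N_hat = (sum over k of C(k,2) * t_k) / (n - 1).

```python
import random


def pairs(k):
    """Number of lineage pairs C(k, 2)."""
    return k * (k - 1) / 2.0


def mle_effective_size(waits):
    """Maximum-likelihood N_e from waits = {k: t_k} for k = 2..n."""
    scaled = sum(pairs(k) * t for k, t in waits.items())
    return scaled / len(waits)  # len(waits) == n - 1 coalescent events


# Check on simulated data with a known, assumed true value.
rng = random.Random(1)
true_n_e = 500.0
n = 20
estimates = []
for _ in range(2000):
    waits = {k: rng.expovariate(pairs(k) / true_n_e) for k in range(2, n + 1)}
    estimates.append(mle_effective_size(waits))
mean_estimate = sum(estimates) / len(estimates)
```

Each term C(k,2) * t_k has expectation N_e, so the estimator is unbiased and `mean_estimate` should land close to `true_n_e`.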
In Pyro, the coalescent is represented as a distribution over coalescent times (for example `pyro.distributions.CoalescentTimes`), which can be used as a prior in hierarchical Bayesian models of genetic data.
Usage
Use the coalescent process when:
- Modeling genealogical relationships among sampled individuals from a population.
- Inferring effective population size or demographic parameters from genetic sequence data.
- Building phylogenetic models where branch lengths are governed by population-genetic processes.
- Combining coalescent priors with mutation models for full Bayesian phylogenetics.
Theoretical Basis
The waiting time between coalescent events follows an exponential distribution:
# Coalescent waiting times
# Given k lineages and effective population size N_e:
# Rate of coalescence (any pair):
lambda_k = C(k, 2) / N_e = k * (k - 1) / (2 * N_e)
# Waiting time until next coalescence:
T_k ~ Exponential(rate=lambda_k)
# Expected waiting time:
E[T_k] = 2 * N_e / (k * (k - 1))
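The formulas above translate directly into code. The following sketch, with the hypothetical helper names `coalescent_rate` and `expected_waiting_time`, tabulates the rate and expected waiting time for a few lineage counts under an assumed N_e of 1000.

```python
def coalescent_rate(k, n_e):
    """Total coalescence rate lambda_k = k (k - 1) / (2 N_e)."""
    return k * (k - 1) / (2.0 * n_e)


def expected_waiting_time(k, n_e):
    """Mean of Exponential(rate=lambda_k), i.e. 1 / lambda_k."""
    return 1.0 / coalescent_rate(k, n_e)


n_e = 1000.0  # assumed value for illustration
table = [(k, coalescent_rate(k, n_e), expected_waiting_time(k, n_e))
         for k in (10, 5, 2)]
for k, rate, mean_wait in table:
    print(f"k={k:2d}  rate={rate:.4f}  E[T_k]={mean_wait:.1f}")
```

Because the rate grows quadratically in k, early coalescences are fast and the final one dominates: with N_e = 1000 the expected wait is 100 time units with five lineages but 1000 with two.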
The coalescent times of the full tree for n samples are given by the independent waiting times (T_n, T_{n-1}, ..., T_2), so their joint density is a product of exponential densities:
# Joint density of coalescent times
# t = (t_n, t_{n-1}, ..., t_2) where t_k is the waiting time with k lineages
log p(t | N_e) = sum over k=2 to n of:
    [ log(lambda_k) - lambda_k * t_k ]
# where lambda_k = k*(k-1) / (2*N_e)
# Total tree height (time to MRCA):
T_MRCA = sum over k=2 to n: t_k
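A minimal sketch of the joint log density and tree height follows, assuming waiting times are stored in a dict keyed by the lineage count k. The function names are hypothetical and this is plain Python rather than Pyro's implementation. The expected tree height uses the telescoping sum of 2 N_e / (k (k - 1)) over k = 2..n, which equals 2 N_e (1 - 1/n).

```python
import math


def coalescent_log_density(waits, n_e):
    """Sum of log(lambda_k) - lambda_k * t_k over waits = {k: t_k}."""
    total = 0.0
    for k, t in waits.items():
        lam = k * (k - 1) / (2.0 * n_e)
        total += math.log(lam) - lam * t
    return total


def tree_height(waits):
    """Time to the most recent common ancestor: sum of all waiting times."""
    return sum(waits.values())


def expected_tree_height(n, n_e):
    """Closed form of sum_{k=2}^{n} 2 N_e / (k (k - 1))."""
    return 2.0 * n_e * (1.0 - 1.0 / n)


waits = {4: 50.0, 3: 120.0, 2: 900.0}  # made-up waiting times
log_p = coalescent_log_density(waits, n_e=1000.0)
height = tree_height(waits)
```

The expected height approaches 2 N_e as n grows, so adding samples beyond a modest number extends the tree very little.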
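For a variable population size N_e(t), the coalescent rate for k lineages becomes time-dependent, C(k,2) / N_e(t), and the waiting time is the first event time of an inhomogeneous Poisson process: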
# Variable population size coalescent
# Cumulative intensity while k lineages remain:
Lambda_k(s, t) = integral from s to t of: C(k,2) / N_e(u) du
# Probability of no coalescence among k lineages in [s, t]:
P(no event in [s,t]) = exp(-Lambda_k(s, t))
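The intensity integral can be approximated numerically for any positive population-size function. The sketch below uses a midpoint rule and compares it with two cases that have closed forms. The function names, the step count, and the exponential trajectory N_e(u) = N_0 * exp(-g * u), with u measured backward in time from the present, are assumptions chosen for illustration.

```python
import math


def intensity(k, n_e_fn, s, t, steps=10_000):
    """Midpoint-rule approximation of the integral of C(k,2) / N_e(u) on [s, t]."""
    n_pairs = k * (k - 1) / 2.0
    h = (t - s) / steps
    return sum(n_pairs / n_e_fn(s + (i + 0.5) * h) for i in range(steps)) * h


def prob_no_coalescence(k, n_e_fn, s, t):
    """Probability that k lineages do not coalesce during [s, t]."""
    return math.exp(-intensity(k, n_e_fn, s, t))


def constant(u):
    """Constant population size."""
    return 1000.0


N0, G = 1000.0, 0.01  # assumed present-day size and growth rate


def shrinking(u):
    """Population that grew forward in time, so it shrinks going backward."""
    return N0 * math.exp(-G * u)


p_const = prob_no_coalescence(3, constant, 0.0, 100.0)
p_shrink = prob_no_coalescence(3, shrinking, 0.0, 100.0)
```

With a constant size the result reduces to exp(-lambda_k * (t - s)). A population that was smaller in the past has a larger intensity over the same interval, so coalescence is more likely there.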