Principle:Pyro ppl Pyro Algebraic Semirings
| Knowledge Sources | |
|---|---|
| Domains | Abstract Algebra, Dynamic Programming, Probabilistic Inference |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Algebraic semirings provide a unified framework for sum-product algorithms by abstracting the "addition" and "multiplication" operations, enabling the same algorithmic structure to compute probabilities, gradients, Viterbi paths, and other quantities.
Description
Many inference algorithms in probabilistic models share the same algorithmic structure: they compute a quantity by combining local factors through sum and product operations over a graphical structure. The key insight of the semiring approach is that different computations correspond to different choices of what "sum" and "product" mean.
A semiring (S, +, *, 0, 1) is an algebraic structure with:
- A set S of values
- An "addition" operation + (commutative, associative, with identity 0)
- A "multiplication" operation * (associative, with identity 1)
- Multiplication distributes over addition: a * (b + c) = a*b + a*c
By choosing different semirings, the same algorithm computes different quantities:
- Real semiring (R, +, *, 0, 1): Standard sum-product computes marginal probabilities.
- Log semiring (R union {-inf}, logsumexp, +, -inf, 0): Numerically stable version of the real semiring, operating in log-space.
- Max-plus (tropical) semiring (R union {-inf}, max, +, -inf, 0): Computes the Viterbi (most likely) path.
- Boolean semiring ({0,1}, OR, AND, 0, 1): Computes reachability/satisfiability.
- Expectation semiring: Computes expected values (useful for computing gradients of log-partition functions).
This abstraction is particularly powerful for:
- Hidden Markov Models: Forward-backward (marginals) and Viterbi (MAP) use the same algorithm with different semirings.
- Context-free grammars: Inside-outside (marginals) and CYK (best parse) are semiring-parameterized.
- Factor graphs: Sum-product and max-product belief propagation differ only in the semiring.
Usage
Use algebraic semirings when:
- Implementing dynamic programming algorithms that need to compute different quantities (marginals, MAP, gradients).
- Building generic sum-product engines for graphical model inference.
- You want a single implementation that can be reused for multiple inference tasks.
- Computing gradients of partition functions via the expectation semiring.
- Working with HMMs, PCFGs, or other structured probabilistic models.
Theoretical Basis
Semiring definition:
# A semiring (S, oplus, otimes, zero, one):
# oplus: S x S -> S (addition)
# otimes: S x S -> S (multiplication)
# zero: identity for oplus
# one: identity for otimes
# Axioms:
# (S, oplus, zero) is a commutative monoid
# (S, otimes, one) is a monoid
# otimes distributes over oplus:
# a otimes (b oplus c) = (a otimes b) oplus (a otimes c)
# zero is absorbing for otimes:
# a otimes zero = zero otimes a = zero
Sum-product algorithm parameterized by semiring:
# Generic variable elimination:
# Given factors f_1(x_1, x_2), f_2(x_2, x_3), ..., f_n(x_n, x_1)
# To eliminate variable x_j:
# Replace all factors involving x_j with a single new factor:
# f_new(neighbors(x_j)) = OPLUS_{x_j} OTIMES_{f involving x_j} f(...)
# Real semiring: OPLUS = sum, OTIMES = product -> marginals
# Tropical semiring: OPLUS = max, OTIMES = + -> MAP/Viterbi
# Log semiring: OPLUS = logsumexp, OTIMES = + -> log-marginals
HMM example:
# Forward algorithm for HMM with states s, emissions x:
# alpha_t(s) = OPLUS_{s'} alpha_{t-1}(s') OTIMES transition(s', s) OTIMES emission(s, x_t)
# Real semiring: alpha_t(s) = sum_{s'} alpha_{t-1}(s') * p(s|s') * p(x_t|s)
# -> forward probabilities for marginal inference
# Tropical semiring: alpha_t(s) = max_{s'} alpha_{t-1}(s') + log p(s|s') + log p(x_t|s)
# -> Viterbi scores for MAP inference
Expectation semiring (for gradient computation):
# Values are pairs (p, r) where p is probability, r is expected value
# (p1, r1) oplus (p2, r2) = (p1 + p2, r1 + r2)
# (p1, r1) otimes (p2, r2) = (p1 * p2, p1 * r2 + p2 * r1)
# zero = (0, 0), one = (1, 0)
# Running forward-backward in the expectation semiring computes
# both the partition function and its gradient in a single pass