Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Pyro ppl Pyro Combinatorial Distributions

From Leeroopedia


Knowledge Sources
Domains Combinatorics, Discrete Optimization, Probabilistic Modeling
Last Updated 2026-02-09 09:00 GMT

Overview

Combinatorial distributions define probability distributions over discrete combinatorial structures such as permutations, matchings, and partitions, enabling Bayesian reasoning about structured discrete objects.

Description

Many problems in machine learning and statistics involve reasoning about combinatorial structures: matchings between entities, orderings of items, partitions of sets, or assignments of labels. Standard continuous distributions are inadequate for these settings because the sample space is inherently discrete and structured.

Combinatorial distributions assign probabilities to elements of a combinatorial set while respecting the constraints that define valid structures. A key challenge is that the number of valid structures grows combinatorially (often factorially) with problem size, making exact enumeration infeasible for large problems.

The one-two matching distribution is a distribution over matchings between two sets of items. In a one-two matching, each element of a set of n items is matched to exactly one or two partners. This arises in:

  • Record linkage: Matching records from two databases where one record may correspond to one or two records in the other database.
  • Bioinformatics: Matching genetic loci where duplications allow one-to-two correspondences.
  • Entity resolution: Determining which mentions in text refer to the same real-world entity.

The distribution over one-two matchings can be parameterized by a score matrix that assigns affinities to potential matches. The probability of a matching is proportional to the product (or sum) of scores of its constituent edges. Computing the normalizing constant (permanent-like quantity) is generally #P-hard, so approximate methods are needed.

Usage

Use combinatorial distributions when:

  • The latent variable has a combinatorial structure (matching, permutation, partition).
  • You need to perform Bayesian inference over structured discrete objects.
  • Modeling record linkage, entity resolution, or data fusion problems.
  • The problem involves assignment constraints (each item matched to exactly one or two others).

Theoretical Basis

Matching distribution:

# A matching M on a bipartite graph (U, V, E) is a set of edges
# such that each node is incident to at most one edge (one-matching)
# or at most two edges (one-two matching).

# Score-based distribution:
p(M | S) = (1/Z) * product_{(i,j) in M} exp(S_{ij})

# where S is an n x m score matrix
# Z = sum over all valid matchings M': product_{(i,j) in M'} exp(S_{ij})

One-two matching specifics:

# In a one-two matching between sets U (size n) and V (size m):
# - Each u in U is matched to exactly 1 or 2 elements of V
# - Each v in V is matched to at most 1 element of U

# The number of valid one-two matchings grows super-exponentially:
# |M| >> n!  for large n, m

# Parameterization:
# S: n x m score matrix (log-affinities)
# For matching M, the unnormalized probability:
# weight(M) = product_{(u,v) in M} exp(S[u, v])

# Sampling uses MCMC or dynamic programming on structured subproblems

Approximate inference strategies:

# 1. Belief propagation on the matching polytope
# 2. MCMC with local move proposals (swap, add, remove edges)
# 3. Variational relaxation to continuous assignment matrices
#    with Sinkhorn normalization for doubly-stochastic constraints

# Variational relaxation:
# Instead of discrete M, optimize continuous P in Birkhoff polytope:
# min KL(q(P) || p(M | S))
# subject to: P is doubly stochastic (rows and columns sum to 1)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment