Principle:Dotnet Machinelearning Alias Method Sampling

Knowledge Sources	Fast Generation of Discrete Random Variables A Linear Algorithm For Generating Random Numbers With a Given Distribution
Domains	Sampling, Random_Number_Generation, Topic_Modeling, Algorithms
Last Updated	2026-02-09 12:00 GMT

Overview

The alias method (Marsaglia/Vose) enables O(1) sampling from an arbitrary discrete distribution over K outcomes after O(K) preprocessing, by decomposing the distribution into K equal-probability bins each containing at most two outcomes.

Description

Sampling from a categorical distribution over K outcomes with arbitrary probabilities is a fundamental operation in probabilistic algorithms. The naive approach (computing the CDF and performing binary search) requires O(log K) per sample. The alias method achieves O(1) per sample at the cost of O(K) preprocessing.

The key insight is that any discrete distribution over K outcomes can be represented as a mixture of K uniform distributions, each covering exactly 1/K of the total probability mass. Each bin is split between at most two outcomes: the "resident" outcome and an "alias" outcome. To sample, one draws a random bin index uniformly and then flips a biased coin to choose between the resident and the alias.

Historical context:

Walker (1977) introduced the alias method with O(K^2) construction.
Vose (1991) improved construction to O(K) using two-queue partitioning.
Marsaglia (2004) described an efficient integer-arithmetic variant, which is the version implemented in ML.NET's AliasMultinomialRNGInt.

Why it matters for LDA: In LightLDA-style Gibbs sampling, each token requires sampling a topic from a word-specific distribution P(k|w). With K topics and V words, building and querying these distributions millions of times per iteration makes O(1) sampling essential. The alias method reduces sampling cost from O(K) (direct multinomial) or O(log K) (binary search on CDF) to O(1).

Usage

Use the alias method when:

You need to draw many samples from the same discrete distribution
The distribution is static or changes infrequently relative to the number of samples
K is large enough that O(log K) per sample is a bottleneck
The O(K) preprocessing cost is amortized across many samples

In the ML.NET LDA implementation, alias tables are rebuilt once per iteration (O(K) per word, O(V*K) total) and then sampled O(num_tokens * mh_steps) times.

Theoretical Basis

The Alias Table Structure

Given a discrete distribution $P = (p_{0}, p_{1}, \dots, p_{K - 1})$ with $\sum_{i} p_{i} = 1$ , the alias table consists of K entries, each with:

Resident index: i (the bin's "owner")
Alias index: a_i (alternative outcome)
Boundary value: v_i (threshold for choosing resident vs. alias)

The table satisfies: $\sum_{i = 0}^{K - 1} P (bin i selects j) = p_{j}$ for all j.

Construction Algorithm (Vose)

Algorithm: Vose Alias Table Construction

Input: probabilities p[0..K-1] with sum = 1
Output: alias table (alias[0..K-1], boundary[0..K-1])

1. Set average = 1/K
2. Create two queues: L (light) and H (heavy)
3. For each i: if p[i] < average, add i to L; else add i to H
4. While L and H are both non-empty:
   a. Remove i from L, j from H
   b. Set alias[i] = j
   c. Set boundary[i] = K * p[i]
   d. Set p[j] = p[j] + p[i] - average   (redistribute excess)
   e. If p[j] < average: add j to L; else add j to H
5. For remaining items in L or H:
   Set alias[i] = i, boundary[i] = 1  (self-referencing, numerically safe)

Sampling Algorithm

Algorithm: Alias Table Sampling

Input: alias table (alias[0..K-1], boundary[0..K-1])
Output: sampled index in [0, K)

1. Draw uniform random u in [0, K)
2. Let i = floor(u), f = frac(u)
3. If f < boundary[i]: return i
4. Else: return alias[i]

Both steps use only a single random number and O(1) table lookups.

Integer Arithmetic Variant (Marsaglia)

The ML.NET implementation uses integer arithmetic for improved numerical stability:

Integer Alias Sampling:
1. Set mass_int = 0x7FFFFFFF (max 31-bit integer)
2. Set a_int = mass_int / K  (uniform height per bin)
3. Scale all proportions to integers summing to mass_int
4. Build alias table using Vose algorithm on integer proportions
5. To sample: draw random 32-bit integer r
   - bin = r / a_int
   - Look up (alias_k, boundary_v) for bin
   - If r < boundary_v: return bin
   - Else: return alias_k

The integer version avoids floating-point division during sampling, using only integer division and comparison.

Branchless Sampling

The implementation uses a branchless technique to avoid branch misprediction:

// Standard: if (sample < v) return idx; else return k;
// Branchless:
int32_t m = -(sample < v);      // m = 0xFFFFFFFF if true, 0 if false
return (idx & m) | (k & ~m);    // select idx if true, k if false

This eliminates branch prediction overhead in tight sampling loops.

Complexity Analysis

Operation	Naive (CDF)	Binary Search	Alias Method
Preprocessing	O(K)	O(K)	O(K)
Per sample	O(K)	O(log K)	O(1)
Memory	O(K)	O(K)	O(K) (2K entries)
N samples total	O(K + NK)	O(K + N log K)	O(K + N)

Mixture Extension for Sparse Distributions

The ML.NET LDA implementation extends the basic alias method for sparse word-topic distributions. Rather than building a K-entry alias table where most entries are zero, it uses a two-component mixture:

$P (k | w) = \frac{n_{k w}}{n_{k w, sum} + β_{mass}} \cdot q_{sparse} (k) + \frac{β_{mass}}{n_{k w, sum} + β_{mass}} \cdot q_{β} (k)$

Where:

$q_{sparse}$ is a small alias table over only the nonzero-count topics (size = number of active topics for word w)
$q_{β}$ is a shared global alias table over all K topics for the beta smoothing term

This reduces memory from O(K) per word to O(nnz_w) per word while maintaining O(1) sampling.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment