Principle:Dotnet Machinelearning Alias Method Sampling
| Knowledge Sources | |
|---|---|
| Domains | Sampling, Random_Number_Generation, Topic_Modeling, Algorithms |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
The alias method (Marsaglia/Vose) enables O(1) sampling from an arbitrary discrete distribution over K outcomes after O(K) preprocessing, by decomposing the distribution into K equal-probability bins each containing at most two outcomes.
Description
Sampling from a categorical distribution over K outcomes with arbitrary probabilities is a fundamental operation in probabilistic algorithms. The naive approach (computing the CDF and performing binary search) requires O(log K) per sample. The alias method achieves O(1) per sample at the cost of O(K) preprocessing.
The key insight is that any discrete distribution over K outcomes can be represented as a mixture of K uniform distributions, each covering exactly 1/K of the total probability mass. Each bin is split between at most two outcomes: the "resident" outcome and an "alias" outcome. To sample, one draws a random bin index uniformly and then flips a biased coin to choose between the resident and the alias.
Historical context:
- Walker (1977) introduced the alias method with O(K^2) construction.
- Vose (1991) improved construction to O(K) using two-queue partitioning.
- Marsaglia (2004) described an efficient integer-arithmetic variant, which is the version implemented in ML.NET's AliasMultinomialRNGInt.
Why it matters for LDA: In LightLDA-style Gibbs sampling, each token requires sampling a topic from a word-specific distribution P(k|w). With K topics and V words, building and querying these distributions millions of times per iteration makes O(1) sampling essential. The alias method reduces sampling cost from O(K) (direct multinomial) or O(log K) (binary search on CDF) to O(1).
Usage
Use the alias method when:
- You need to draw many samples from the same discrete distribution
- The distribution is static or changes infrequently relative to the number of samples
- K is large enough that O(log K) per sample is a bottleneck
- The O(K) preprocessing cost is amortized across many samples
In the ML.NET LDA implementation, alias tables are rebuilt once per iteration (O(K) per word, O(V*K) total) and then sampled O(num_tokens * mh_steps) times.
Theoretical Basis
The Alias Table Structure
Given a discrete distribution with , the alias table consists of K entries, each with:
- Resident index: i (the bin's "owner")
- Alias index: a_i (alternative outcome)
- Boundary value: v_i (threshold for choosing resident vs. alias)
The table satisfies: for all j.
Construction Algorithm (Vose)
Algorithm: Vose Alias Table Construction
Input: probabilities p[0..K-1] with sum = 1
Output: alias table (alias[0..K-1], boundary[0..K-1])
1. Set average = 1/K
2. Create two queues: L (light) and H (heavy)
3. For each i: if p[i] < average, add i to L; else add i to H
4. While L and H are both non-empty:
a. Remove i from L, j from H
b. Set alias[i] = j
c. Set boundary[i] = K * p[i]
d. Set p[j] = p[j] + p[i] - average (redistribute excess)
e. If p[j] < average: add j to L; else add j to H
5. For remaining items in L or H:
Set alias[i] = i, boundary[i] = 1 (self-referencing, numerically safe)
Sampling Algorithm
Algorithm: Alias Table Sampling
Input: alias table (alias[0..K-1], boundary[0..K-1])
Output: sampled index in [0, K)
1. Draw uniform random u in [0, K)
2. Let i = floor(u), f = frac(u)
3. If f < boundary[i]: return i
4. Else: return alias[i]
Both steps use only a single random number and O(1) table lookups.
Integer Arithmetic Variant (Marsaglia)
The ML.NET implementation uses integer arithmetic for improved numerical stability:
Integer Alias Sampling:
1. Set mass_int = 0x7FFFFFFF (max 31-bit integer)
2. Set a_int = mass_int / K (uniform height per bin)
3. Scale all proportions to integers summing to mass_int
4. Build alias table using Vose algorithm on integer proportions
5. To sample: draw random 32-bit integer r
- bin = r / a_int
- Look up (alias_k, boundary_v) for bin
- If r < boundary_v: return bin
- Else: return alias_k
The integer version avoids floating-point division during sampling, using only integer division and comparison.
Branchless Sampling
The implementation uses a branchless technique to avoid branch misprediction:
// Standard: if (sample < v) return idx; else return k;
// Branchless:
int32_t m = -(sample < v); // m = 0xFFFFFFFF if true, 0 if false
return (idx & m) | (k & ~m); // select idx if true, k if false
This eliminates branch prediction overhead in tight sampling loops.
Complexity Analysis
| Operation | Naive (CDF) | Binary Search | Alias Method |
|---|---|---|---|
| Preprocessing | O(K) | O(K) | O(K) |
| Per sample | O(K) | O(log K) | O(1) |
| Memory | O(K) | O(K) | O(K) (2K entries) |
| N samples total | O(K + NK) | O(K + N log K) | O(K + N) |
Mixture Extension for Sparse Distributions
The ML.NET LDA implementation extends the basic alias method for sparse word-topic distributions. Rather than building a K-entry alias table where most entries are zero, it uses a two-component mixture:
Where:
- is a small alias table over only the nonzero-count topics (size = number of active topics for word w)
- is a shared global alias table over all K topics for the beta smoothing term
This reduces memory from O(K) per word to O(nnz_w) per word while maintaining O(1) sampling.