Implementation:Dotnet Machinelearning AliasMultinomialRng
| Knowledge Sources | |
|---|---|
| Domains | Sampling, Topic_Modeling, Random_Number_Generation |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
AliasMultinomialRNGInt implements O(1) sampling from arbitrary discrete distributions using the Marsaglia/Vose alias method with integer arithmetic for numerical stability.
Description
The AliasMultinomialRNGInt class (in the wood namespace) provides an efficient mechanism for drawing random samples from a categorical distribution over K outcomes. The alias method preprocesses the distribution in O(K) time, constructing an alias table that enables each subsequent sample in O(1) time. This is critical for LDA performance because the Gibbs sampler must repeatedly draw topic proposals from word-topic and document-topic distributions.
Key design decisions:
- Integer arithmetic: All proportions are scaled to 32-bit integers (mass_int_ = 0x7fffffff normalized to be divisible by K), avoiding floating-point precision issues. The uniform "height" per bin is a_int_ = mass_int_ / n_.
- Three SetProportionMass overloads:
- Overload 1 (vector<alias_k_v>& output): Builds the alias table into a vector of alias_k_v structs (each containing alias index k_ and boundary value v_). Used for the global beta smoothing alias table.
- Overload 2 (int32_t* memory output): Builds the alias table into a flat memory block (pairs of [k, v] as int32_t). Used for per-word dense alias tables.
- Overload 3 (int32_t* memory, int32_t size): Same as overload 2 but takes an explicit size parameter for sparse word-topic distributions where not all K topics are present.
- Mass correction: After integer conversion, the total mass may not exactly equal mass_int_ due to rounding. The code adjusts by incrementing or decrementing individual proportions in round-robin fashion until the totals match.
- L/H queue construction: Uses two queues (L_ for "light" bins below average, H_ for "heavy" bins above) to pair underfull with overfull categories, the standard Vose algorithm.
Sampling (Next method):
- Draw a uniform random integer sample from the xorshift_rng.
- Compute bin index: idx = sample / height_.
- Look up k_ (alias) and v_ (boundary) for that bin.
- If sample < v_, return idx; otherwise return k_. This is done branchlessly: m = -(sample < v); return (idx & m) | (k & ~m).
Supporting struct:
struct alias_k_v {
int32_t k_; // alias index
int32_t v_; // boundary value
};
Usage
Used by LdaEngine to build the global beta smoothing alias table (shared across all words), and by hybrid_alias_map::build_table() to construct per-word alias tables for the word-topic proposal distribution. Also used internally by LightDocSampler for topic proposal generation.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/LdaNative/alias_multinomial_rng_int.hpp (459 lines)
Signature
namespace wood {
struct alias_k_v {
int32_t k_;
int32_t v_;
};
class AliasMultinomialRNGInt {
public:
AliasMultinomialRNGInt();
~AliasMultinomialRNGInt();
void Init(int K);
// Overload 1: Build alias table into vector<alias_k_v>
void SetProportionMass(std::vector<float>& proportion, float mass,
std::vector<alias_k_v>& alias_kv, int32_t* height, xorshift_rng& rng);
// Overload 2: Build alias table into flat int32_t* memory
void SetProportionMass(std::vector<float>& proportion, float mass,
int32_t* memory, int32_t* height, xorshift_rng& rng);
// Overload 3: Build alias table for subset of size entries
void SetProportionMass(std::vector<float>& proportion, int32_t size,
float mass, int32_t* memory, int32_t* height,
xorshift_rng& rng, int32_t word_id);
int32_t Next(xorshift_rng& rng, std::vector<alias_k_v>& alias_kv);
public:
std::vector<int32_t> proportion_int_;
int32_t* internal_memory_;
int32_t n_;
int32_t a_int_; // uniform height per bin
int32_t mass_int_; // total integer mass
std::vector<std::pair<int32_t, int32_t>> L_; // light queue
std::vector<std::pair<int32_t, int32_t>> H_; // heavy queue
};
}
Import
// AliasMultinomialRNGInt is an internal C++ class; not directly exposed via P/Invoke.
// It is used internally by LdaEngine and LightDocSampler for alias table construction.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| K | int | Yes (Init) | Number of categories (topics). Determines internal buffer sizes. |
| proportion | vector<float>& | Yes | Unnormalized probability weights for each category. Modified in-place (divided by mass). |
| mass | float | Yes | Total sum of the proportion vector before normalization. |
| alias_kv | vector<alias_k_v>& | Yes (Overload 1) | Output vector for alias table entries. |
| memory | int32_t* | Yes (Overloads 2,3) | Output flat memory block for alias table (2 * n entries). |
| height | int32_t* | Yes | Output: uniform bin height a_int_ used for sampling. |
| rng | xorshift_rng& | Yes | Random number generator (currently unused in construction, passed for API consistency). |
| size | int32_t | Yes (Overload 3) | Number of active entries (less than or equal to K) for sparse distributions. |
Outputs
| Name | Type | Description |
|---|---|---|
| alias_kv / memory | alias_k_v[] / int32_t* | Constructed alias table: for each bin i, stores (alias_index, boundary_value). |
| height | int32_t | The uniform height a_int_ = mass_int_ / n_ needed for the sampling step. |
| Next() return | int32_t | Sampled category index in [0, K). |
Algorithm Detail
The Marsaglia/Vose alias method works by decomposing an arbitrary discrete distribution over K outcomes into K equal-width bins, each containing at most two outcomes:
- Scale: Convert float proportions to integers, total = mass_int_ = a_int_ * K.
- Classify: Partition categories into "light" (weight < a_int_) and "heavy" (weight >= a_int_).
- Pair:' For each light category i with weight p_i: assign the remaining space (a_int_ - p_i) to a heavy category j. Set alias_kv[i] = {k=j, v=i*a_int_+p_i}. Reduce js weight by (a_int_ - p_i) and re-classify.
- Sample: Generate uniform random u in [0, mass_int_). Compute bin = u / a_int_. If u < alias_kv[bin].v_, return bin; else return alias_kv[bin].k_.
Usage Examples
// Building the global beta alias table in LdaEngine:
std::vector<float> proportion(K_);
beta_mass_ = 0;
for (int k = 0; k < K_; ++k) {
proportion[k] = beta_ / (global_summary_row_[k] + beta_sum_);
beta_mass_ += proportion[k];
}
alias_rng_int_.SetProportionMass(proportion, beta_mass_,
beta_k_v_, &beta_height_, sampler_.rng());
// Sampling from the alias table:
int32_t topic = alias_rng_int_.Next(rng, beta_k_v_);