Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Dotnet Machinelearning AliasMultinomialRng

From Leeroopedia
Revision as of 14:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Dotnet_Machinelearning_AliasMultinomialRng.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Sampling, Topic_Modeling, Random_Number_Generation
Last Updated 2026-02-09 12:00 GMT

Overview

AliasMultinomialRNGInt implements O(1) sampling from arbitrary discrete distributions using the Marsaglia/Vose alias method with integer arithmetic for numerical stability.

Description

The AliasMultinomialRNGInt class (in the wood namespace) provides an efficient mechanism for drawing random samples from a categorical distribution over K outcomes. The alias method preprocesses the distribution in O(K) time, constructing an alias table that enables each subsequent sample in O(1) time. This is critical for LDA performance because the Gibbs sampler must repeatedly draw topic proposals from word-topic and document-topic distributions.

Key design decisions:

  • Integer arithmetic: All proportions are scaled to 32-bit integers (mass_int_ = 0x7fffffff normalized to be divisible by K), avoiding floating-point precision issues. The uniform "height" per bin is a_int_ = mass_int_ / n_.
  • Three SetProportionMass overloads:
    • Overload 1 (vector<alias_k_v>& output): Builds the alias table into a vector of alias_k_v structs (each containing alias index k_ and boundary value v_). Used for the global beta smoothing alias table.
    • Overload 2 (int32_t* memory output): Builds the alias table into a flat memory block (pairs of [k, v] as int32_t). Used for per-word dense alias tables.
    • Overload 3 (int32_t* memory, int32_t size): Same as overload 2 but takes an explicit size parameter for sparse word-topic distributions where not all K topics are present.
  • Mass correction: After integer conversion, the total mass may not exactly equal mass_int_ due to rounding. The code adjusts by incrementing or decrementing individual proportions in round-robin fashion until the totals match.
  • L/H queue construction: Uses two queues (L_ for "light" bins below average, H_ for "heavy" bins above) to pair underfull with overfull categories, the standard Vose algorithm.

Sampling (Next method):

  • Draw a uniform random integer sample from the xorshift_rng.
  • Compute bin index: idx = sample / height_.
  • Look up k_ (alias) and v_ (boundary) for that bin.
  • If sample < v_, return idx; otherwise return k_. This is done branchlessly: m = -(sample < v); return (idx & m) | (k & ~m).

Supporting struct:

struct alias_k_v {
    int32_t k_;  // alias index
    int32_t v_;  // boundary value
};

Usage

Used by LdaEngine to build the global beta smoothing alias table (shared across all words), and by hybrid_alias_map::build_table() to construct per-word alias tables for the word-topic proposal distribution. Also used internally by LightDocSampler for topic proposal generation.

Code Reference

Source Location

Signature

namespace wood {
    struct alias_k_v {
        int32_t k_;
        int32_t v_;
    };

    class AliasMultinomialRNGInt {
    public:
        AliasMultinomialRNGInt();
        ~AliasMultinomialRNGInt();

        void Init(int K);

        // Overload 1: Build alias table into vector<alias_k_v>
        void SetProportionMass(std::vector<float>& proportion, float mass,
            std::vector<alias_k_v>& alias_kv, int32_t* height, xorshift_rng& rng);

        // Overload 2: Build alias table into flat int32_t* memory
        void SetProportionMass(std::vector<float>& proportion, float mass,
            int32_t* memory, int32_t* height, xorshift_rng& rng);

        // Overload 3: Build alias table for subset of size entries
        void SetProportionMass(std::vector<float>& proportion, int32_t size,
            float mass, int32_t* memory, int32_t* height,
            xorshift_rng& rng, int32_t word_id);

        int32_t Next(xorshift_rng& rng, std::vector<alias_k_v>& alias_kv);

    public:
        std::vector<int32_t> proportion_int_;
        int32_t* internal_memory_;
        int32_t n_;
        int32_t a_int_;       // uniform height per bin
        int32_t mass_int_;    // total integer mass

        std::vector<std::pair<int32_t, int32_t>> L_;  // light queue
        std::vector<std::pair<int32_t, int32_t>> H_;  // heavy queue
    };
}

Import

// AliasMultinomialRNGInt is an internal C++ class; not directly exposed via P/Invoke.
// It is used internally by LdaEngine and LightDocSampler for alias table construction.

I/O Contract

Inputs

Name Type Required Description
K int Yes (Init) Number of categories (topics). Determines internal buffer sizes.
proportion vector<float>& Yes Unnormalized probability weights for each category. Modified in-place (divided by mass).
mass float Yes Total sum of the proportion vector before normalization.
alias_kv vector<alias_k_v>& Yes (Overload 1) Output vector for alias table entries.
memory int32_t* Yes (Overloads 2,3) Output flat memory block for alias table (2 * n entries).
height int32_t* Yes Output: uniform bin height a_int_ used for sampling.
rng xorshift_rng& Yes Random number generator (currently unused in construction, passed for API consistency).
size int32_t Yes (Overload 3) Number of active entries (less than or equal to K) for sparse distributions.

Outputs

Name Type Description
alias_kv / memory alias_k_v[] / int32_t* Constructed alias table: for each bin i, stores (alias_index, boundary_value).
height int32_t The uniform height a_int_ = mass_int_ / n_ needed for the sampling step.
Next() return int32_t Sampled category index in [0, K).

Algorithm Detail

The Marsaglia/Vose alias method works by decomposing an arbitrary discrete distribution over K outcomes into K equal-width bins, each containing at most two outcomes:

  1. Scale: Convert float proportions to integers, total = mass_int_ = a_int_ * K.
  2. Classify: Partition categories into "light" (weight < a_int_) and "heavy" (weight >= a_int_).
  3. Pair:' For each light category i with weight p_i: assign the remaining space (a_int_ - p_i) to a heavy category j. Set alias_kv[i] = {k=j, v=i*a_int_+p_i}. Reduce js weight by (a_int_ - p_i) and re-classify.
  4. Sample: Generate uniform random u in [0, mass_int_). Compute bin = u / a_int_. If u < alias_kv[bin].v_, return bin; else return alias_kv[bin].k_.

Usage Examples

// Building the global beta alias table in LdaEngine:
std::vector<float> proportion(K_);
beta_mass_ = 0;
for (int k = 0; k < K_; ++k) {
    proportion[k] = beta_ / (global_summary_row_[k] + beta_sum_);
    beta_mass_ += proportion[k];
}
alias_rng_int_.SetProportionMass(proportion, beta_mass_,
    beta_k_v_, &beta_height_, sampler_.rng());

// Sampling from the alias table:
int32_t topic = alias_rng_int_.Next(rng, beta_k_v_);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment