Implementation:Dotnet Machinelearning LdaHybridAliasMap

From Leeroopedia
Revision as of 14:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Dotnet_Machinelearning_LdaHybridAliasMap.md)


Knowledge Sources
Domains Topic_Modeling, Sampling, Data_Structures
Last Updated 2026-02-09 12:00 GMT

Overview

hybrid_alias_map builds and stores alias tables for word-topic distributions, using adaptive dense or sparse memory layouts depending on the word's term frequency.

Description

The hybrid_alias_map class (in the lda namespace) wraps an alias table for a single word's topic distribution. Each word in the vocabulary has one hybrid_alias_map instance, stored in the LdaEngine::global_alias_k_v_ vector. The class operates in two modes determined by is_dense_:

Dense mode (is_dense_ = 1):

  • The alias table covers all K topics.
  • Memory layout: 2 * K int32_t values (K pairs of [alias_index, boundary_value]).
  • build_table() computes proportions as (n_kw + beta) / (n_k + beta_sum) for every topic k, then calls AliasMultinomialRNGInt::SetProportionMass() overload 2 to fill the memory.
  • next() samples in O(1): compute idx = sample / height_, look up (k, v) at position 2*idx, then branchlessly return idx if sample < v and k otherwise.
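The dense-mode steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the free functions `dense_proportions` and `dense_next` are hypothetical names, the sample is passed in rather than drawn from `xorshift_rng`, and the boundary value is assumed to be stored as an absolute threshold in the sample range. Only the proportion formula and the [alias_index, boundary_value] layout come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: proportion for every topic k,
// computed as (n_kw + beta) / (n_k + beta_sum).
std::vector<float> dense_proportions(const std::vector<int32_t>& n_kw,
                                     const std::vector<int64_t>& n_k,
                                     float beta, float beta_sum) {
    std::vector<float> q(n_kw.size());
    for (std::size_t k = 0; k < n_kw.size(); ++k) {
        q[k] = (n_kw[k] + beta) / (n_k[k] + beta_sum);
    }
    return q;
}

// Hypothetical helper: O(1) lookup in a table laid out as K pairs
// [alias_index, boundary_value]. `sample` is assumed uniform in [0, K * height).
int32_t dense_next(const std::vector<int32_t>& kv, int32_t height, int32_t sample) {
    int32_t idx = sample / height;   // which bin the sample falls into
    int32_t k = kv[2 * idx];         // alias topic for this bin
    int32_t v = kv[2 * idx + 1];     // boundary (assumed absolute threshold)
    int32_t m = -static_cast<int32_t>(sample < v);  // all ones if sample < v, else 0
    return (idx & m) | (k & ~m);     // idx below the boundary, alias k otherwise
}
```

The mask trick replaces a conditional branch with two bitwise operations, which is one common way to make the selection branch-free.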

Sparse mode (is_dense_ = 0):

  • The alias table only covers topics with nonzero n_kw counts.
  • Memory layout: 2 * size_ int32_t values for the alias table, followed by size_ int32_t values for the idx_ indirection array that maps alias indices back to actual topic IDs.
  • build_table() iterates only over nonzero word-topic entries (from either a dense or sparse hybrid_map row), computing proportions as n_kw / (n_k + beta_sum) (without the beta numerator -- the beta component is handled separately via the shared beta alias table).
  • next() uses a two-component mixture: with probability n_kw_mass_ / (n_kw_mass_ + beta_mass), sample from the sparse alias table and map back through idx_; otherwise, sample from the global beta_k_v_ alias table.
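The sparse layout and the two-component mixture described above can be sketched as below. Everything named here is an assumption for illustration: `SparseView`, `make_sparse_view`, `alias_lookup`, and `sparse_next` are hypothetical, the beta table is simplified to flat int32_t pairs instead of `std::vector<wood::alias_k_v>`, and the random draws (`u`, `s_word`, `s_beta`) are passed in as arguments so the behaviour is deterministic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view over one word's sparse block:
// [ 2 * size alias values | size idx values ].
struct SparseView {
    const int32_t* kv;   // alias (k, v) pairs
    const int32_t* idx;  // local alias index -> global topic ID
    int32_t size;
};

SparseView make_sparse_view(const int32_t* memory, int32_t size) {
    return SparseView{memory, memory + 2 * size, size};
}

// Alias lookup used by both mixture components
// (boundary assumed to be an absolute threshold).
int32_t alias_lookup(const int32_t* kv, int32_t height, int32_t sample) {
    int32_t i = sample / height;
    int32_t k = kv[2 * i];
    int32_t v = kv[2 * i + 1];
    return sample < v ? i : k;
}

// u in [0, 1) picks the component; s_word and s_beta are integer samples
// for the word table and the beta table respectively.
int32_t sparse_next(const SparseView& view, int32_t height, float n_kw_mass,
                    const int32_t* beta_kv, int32_t beta_height, float beta_mass,
                    float u, int32_t s_word, int32_t s_beta) {
    if (u * (n_kw_mass + beta_mass) < n_kw_mass) {
        // Word component: sample a local index, then map it to a topic ID.
        int32_t local = alias_lookup(view.kv, height, s_word);
        return view.idx[local];
    }
    // Beta component: the shared table already indexes topics directly.
    return alias_lookup(beta_kv, beta_height, s_beta);
}
```

Keeping the beta term in a shared table means each word's table only needs entries for its nonzero counts, which is where the memory saving of sparse mode comes from.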

Key fields:

| Field | Type | Description |
|---|---|---|
| memory_ | int32_t* | Pointer into alias_mem_block_ managed by LDAModelBlock |
| is_dense_ | int32_t | 1 = dense (all K topics), 0 = sparse (nonzero only) |
| kv_ | int32_t* | Points to alias (k, v) pairs in memory_ |
| idx_ | int32_t* | Sparse-mode indirection: maps local index to global topic ID |
| height_ | int32_t | Uniform bin height from alias construction |
| capacity_ | int32_t | Max entries this alias map can hold |
| size_ | int32_t | Current number of entries in the alias table |
| n_kw_mass_ | float | Total mass of the sparse n_kw component |
| beta_mass_ | float | Cached beta mass for the mixture sampling |
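To make the roles of height_ and kv_ concrete, here is a textbook alias construction over integer masses. It is a sketch only and is not the library's `SetProportionMass()`: the function name `build_alias_kv` is hypothetical, the masses are assumed to be non-negative and to sum to exactly K * height, and the boundary is assumed to be stored as an absolute threshold.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Textbook alias construction (assumption: NOT the library's implementation).
// Returns K pairs laid out as [alias_index, boundary_value], where bin i covers
// samples in [i * height, (i + 1) * height) and the boundary splits the bin
// between topic i (below) and its alias (at or above).
std::vector<int32_t> build_alias_kv(std::vector<int32_t> mass, int32_t height) {
    const int32_t K = static_cast<int32_t>(mass.size());
    std::vector<int32_t> kv(2 * K);
    std::vector<int32_t> small, large;
    for (int32_t i = 0; i < K; ++i) {
        (mass[i] < height ? small : large).push_back(i);
    }
    while (!small.empty() && !large.empty()) {
        int32_t s = small.back();
        small.pop_back();
        int32_t l = large.back();
        kv[2 * s] = l;                          // alias fills the rest of bin s
        kv[2 * s + 1] = s * height + mass[s];   // topic s owns the part below this
        mass[l] -= height - mass[s];            // mass donated by the large topic
        if (mass[l] < height) {
            large.pop_back();
            small.push_back(l);
        }
    }
    // Remaining bins are exactly full: the topic owns its whole bin.
    for (int32_t i : large) { kv[2 * i] = i; kv[2 * i + 1] = (i + 1) * height; }
    for (int32_t i : small) { kv[2 * i] = i; kv[2 * i + 1] = (i + 1) * height; }
    return kv;
}
```

For example, masses {2, 10, 18} with height 10 give topic 0 the samples 0 and 1, topic 1 the samples 10 to 19, and topic 2 the remaining 18 samples, so each topic is drawn in proportion to its mass.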

Usage

hybrid_alias_map is used in every training and inference iteration. LightDocSampler::build_alias_table() calls GenerateAliasTableforWord(), which delegates to hybrid_alias_map::build_table() for each word in the thread's range. During sampling, Sample2WordFirst() and Sample2WordFirstInfer() call hybrid_alias_map::next() to propose topic assignments.

Code Reference

Source Location

Signature

namespace lda {
    class hybrid_alias_map {
    public:
        hybrid_alias_map();
        hybrid_alias_map(int32_t* memory, int32_t is_dense, int32_t capacity);
        hybrid_alias_map(const hybrid_alias_map& other);
        hybrid_alias_map& operator=(const hybrid_alias_map& other);

        void clear();
        int32_t size() const;
        std::string DebugString();

        void build_table(
            wood::AliasMultinomialRNGInt& alias_rng,
            const hybrid_map& word_topic_row,
            const std::vector<int64_t>& summary_row,
            std::vector<float>& q_w_proportion,
            float beta, float beta_sum,
            int word_id, wood::xorshift_rng& rng);

        int32_t next(wood::xorshift_rng& rng, int32_t beta_height,
                     float beta_mass, std::vector<wood::alias_k_v>& beta_k_v,
                     bool debug);
    };
}

Import

// hybrid_alias_map is an internal C++ class; not directly exposed via P/Invoke.
// It is used internally by LdaEngine and LightDocSampler.

I/O Contract

Inputs (build_table)

| Name | Type | Required | Description |
|---|---|---|---|
| alias_rng | AliasMultinomialRNGInt& | Yes | Alias table builder (reused across words) |
| word_topic_row | const hybrid_map& | Yes | The word's row in the word-topic count matrix |
| summary_row | const vector<int64_t>& | Yes | Global topic count summary (K entries) |
| q_w_proportion | vector<float>& | Yes | Scratch buffer (K entries) for computing proportions |
| beta | float | Yes | Dirichlet word-topic smoothing parameter |
| beta_sum | float | Yes | beta * V |
| word_id | int | Yes | Word index (used by sparse overload 3 of SetProportionMass) |
| rng | xorshift_rng& | Yes | Random number generator |

Inputs (next)

| Name | Type | Required | Description |
|---|---|---|---|
| rng | xorshift_rng& | Yes | Random number generator for sampling |
| beta_height | int32_t | Yes | Bin height for the global beta alias table |
| beta_mass | float | Yes | Total mass of the beta smoothing distribution |
| beta_k_v | vector<alias_k_v>& | Yes | Global beta alias table entries |
| debug | bool | No | Debug flag (unused in release) |

Outputs

| Name | Type | Description |
|---|---|---|
| next() return | int32_t | Sampled topic index in [0, K) |

Usage Examples

// Building alias table for a single word (inside LightDocSampler):
void LightDocSampler::GenerateAliasTableforWord(int32_t word) {
    alias_k_v_[word].build_table(alias_rng_, word_topic_table_[word],
        summary_row_, q_w_proportion_, beta_, beta_sum_, word, rng_);
}

// Sampling a topic proposal for word w:
int32_t t = alias_k_v_[w].next(rng_, beta_height_, beta_mass_, beta_k_v_, false);
