Implementation:Dotnet Machinelearning LdaHybridAliasMap
| Knowledge Sources | |
|---|---|
| Domains | Topic_Modeling, Sampling, Data_Structures |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
hybrid_alias_map builds and stores alias tables for word-topic distributions, using adaptive dense or sparse memory layouts depending on the word's term frequency.
Description
The hybrid_alias_map class (in the lda namespace) wraps an alias table for a single word's topic distribution. Each word in the vocabulary has one hybrid_alias_map instance, stored in the LdaEngine::global_alias_k_v_ vector. The class operates in two modes determined by is_dense_:
Dense mode (is_dense_ = 1):
- The alias table covers all K topics.
- Memory layout: 2 * K int32_t values (K pairs of [alias_index, boundary_value]).
- build_table() computes proportions as (n_kw + beta) / (n_k + beta_sum) for every topic k, then calls AliasMultinomialRNGInt::SetProportionMass() overload 2 to fill the memory.
- next() samples in O(1): compute idx = sample / height_, read the pair (k, v) at offset 2*idx, and branchlessly return idx if sample < v, otherwise k.
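The dense-mode steps above can be sketched as follows. This is a minimal illustration, not the code from hybrid_alias_map.cpp: the names dense_proportions and dense_alias_lookup are hypothetical, the table is passed as a std::vector instead of a raw pointer, and the sketch assumes that v is an absolute boundary inside bin idx, so a draw below v keeps idx and any other draw falls through to the alias k.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: dense proportions (n_kw + beta) / (n_k + beta_sum),
// one entry per topic k.
inline std::vector<float> dense_proportions(const std::vector<int32_t>& n_kw,
                                            const std::vector<int64_t>& n_k,
                                            float beta, float beta_sum) {
    assert(n_kw.size() == n_k.size());
    std::vector<float> q(n_kw.size());
    for (std::size_t k = 0; k < n_kw.size(); ++k) {
        q[k] = (n_kw[k] + beta) / (n_k[k] + beta_sum);
    }
    return q;
}

// Hypothetical helper: O(1) lookup in a flattened table where entry i
// occupies kv[2*i] (alias topic k) and kv[2*i + 1] (boundary value v).
// `sample` stands in for a raw integer RNG draw.
inline int32_t dense_alias_lookup(const std::vector<int32_t>& kv,
                                  int32_t height, int32_t sample) {
    assert(height > 0 && kv.size() >= 2 && kv.size() % 2 == 0);
    const int32_t size = static_cast<int32_t>(kv.size() / 2);
    int32_t idx = sample / height;
    if (idx >= size) idx = size - 1;  // clamp a draw that lands past the last bin
    const int32_t k = kv[2 * idx];
    const int32_t v = kv[2 * idx + 1];
    // m is all ones (-1) when sample < v and 0 otherwise, so the
    // expression below picks idx or k without a branch.
    const int32_t m = -static_cast<int32_t>(sample < v);
    return (idx & m) | (k & ~m);
}
```

For example, with height 100 and the table {0, 100, 0, 150} (topic 0 holding 150 of 200 units of mass), a draw of 120 returns topic 1 and a draw of 170 returns topic 0.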
Sparse mode (is_dense_ = 0):
- The alias table only covers topics with nonzero n_kw counts.
- Memory layout: 2 * size_ int32_t values for the alias table, followed by size_ int32_t values for the idx_ indirection array that maps alias indices back to actual topic IDs.
- build_table() iterates only over nonzero word-topic entries (from either a dense or sparse hybrid_map row), computing proportions as n_kw / (n_k + beta_sum). The beta term is omitted from the numerator because the beta component is handled separately via the shared beta alias table.
- next() uses a two-component mixture: with probability n_kw_mass_ / (n_kw_mass_ + beta_mass), sample from the sparse alias table and map back through idx_; otherwise, sample from the global beta_k_v_ alias table.
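The sparse-mode mixture above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: SparseAliasSketch, alias_lookup, and sparse_next are hypothetical names, the shared beta table is passed as a flattened int32_t vector instead of std::vector<wood::alias_k_v>, and the random draws u (uniform in [0, 1)) and raw (an integer draw) are passed in as parameters so the behaviour is deterministic, whereas the real method draws from the xorshift_rng.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical O(1) alias lookup over a flattened (k, v) table,
// assuming v is an absolute boundary inside bin idx.
inline int32_t alias_lookup(const std::vector<int32_t>& kv,
                            int32_t height, int32_t sample) {
    assert(height > 0 && kv.size() >= 2 && kv.size() % 2 == 0);
    const int32_t size = static_cast<int32_t>(kv.size() / 2);
    int32_t idx = sample / height;
    if (idx >= size) idx = size - 1;
    const int32_t k = kv[2 * idx];
    const int32_t v = kv[2 * idx + 1];
    const int32_t m = -static_cast<int32_t>(sample < v);
    return (idx & m) | (k & ~m);
}

// Hypothetical stand-in for the sparse-mode state of one word.
struct SparseAliasSketch {
    std::vector<int32_t> kv;   // alias pairs over the nonzero topics only
    std::vector<int32_t> idx;  // local alias index -> actual topic ID
    int32_t height;            // bin height of the sparse table
    float n_kw_mass;           // total mass of the n_kw component
};

// Two-component mixture: pick the sparse n_kw table with probability
// n_kw_mass / (n_kw_mass + beta_mass), otherwise the shared beta table.
inline int32_t sparse_next(const SparseAliasSketch& word,
                           const std::vector<int32_t>& beta_kv,
                           int32_t beta_height, float beta_mass,
                           float u, int32_t raw) {
    const float pick = u * (word.n_kw_mass + beta_mass);
    if (pick < word.n_kw_mass) {
        const int32_t local = alias_lookup(word.kv, word.height, raw);
        assert(static_cast<std::size_t>(local) < word.idx.size());
        return word.idx[local];  // map the local index back to a topic ID
    }
    return alias_lookup(beta_kv, beta_height, raw);  // already a topic ID
}
```

The indirection through idx is only needed on the sparse branch: the beta table spans all K topics, so its lookup result is already a topic ID.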
Key fields:
| Field | Type | Description |
|---|---|---|
| memory_ | int32_t* | Pointer into alias_mem_block_ managed by LDAModelBlock |
| is_dense_ | int32_t | 1 = dense (all K topics), 0 = sparse (nonzero only) |
| kv_ | int32_t* | Points to alias (k, v) pairs in memory_ |
| idx_ | int32_t* | Sparse-mode indirection: maps local index to global topic ID |
| height_ | int32_t | Uniform bin height from alias construction |
| capacity_ | int32_t | Max entries this alias map can hold |
| size_ | int32_t | Current number of entries in the alias table |
| n_kw_mass_ | float | Total mass of the sparse n_kw component |
| beta_mass_ | float | Cached beta mass for the mixture sampling |
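To make the two memory layouts concrete, the sketch below computes where the (k, v) pairs and the indirection array would sit inside a word's slice of memory_, going only by the layout described on this page. AliasLayoutSketch and alias_layout are hypothetical names, and entries stands for the number of table slots in sparse mode; the real class may derive the idx_ offset differently (for example from capacity_), so treat the numbers as illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one word's slice, in units of int32_t.
struct AliasLayoutSketch {
    int32_t kv_offset;    // start of the (k, v) pairs
    int32_t idx_offset;   // start of the indirection array, -1 if absent
    int32_t total_int32;  // number of int32_t values used
};

inline AliasLayoutSketch alias_layout(bool is_dense, int32_t num_topics,
                                      int32_t entries) {
    assert(num_topics > 0 && entries >= 0 && entries <= num_topics);
    if (is_dense) {
        // Dense: K pairs, no indirection array.
        return {0, -1, 2 * num_topics};
    }
    // Sparse: `entries` pairs followed by `entries` topic IDs.
    return {0, 2 * entries, 3 * entries};
}
```

With K = 8, a dense word uses 16 int32_t values, while a sparse word with 3 nonzero topics uses 9, with the indirection array starting at offset 6.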
Usage
Used during each training/inference iteration: LightDocSampler::build_alias_table() calls GenerateAliasTableforWord() which delegates to hybrid_alias_map::build_table() for each word in the thread's range. During sampling, hybrid_alias_map::next() is called by Sample2WordFirst() and Sample2WordFirstInfer() to propose topic assignments.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/LdaNative/hybrid_alias_map.cpp (198 lines)
- File: src/Native/LdaNative/hybrid_alias_map.h (128 lines)
Signature
    namespace lda {
    class hybrid_alias_map {
    public:
        hybrid_alias_map();
        hybrid_alias_map(int32_t* memory, int32_t is_dense, int32_t capacity);
        hybrid_alias_map(const hybrid_alias_map& other);
        hybrid_alias_map& operator=(const hybrid_alias_map& other);

        void clear();
        int32_t size() const;
        std::string DebugString();

        void build_table(
            wood::AliasMultinomialRNGInt& alias_rng,
            const hybrid_map& word_topic_row,
            const std::vector<int64_t>& summary_row,
            std::vector<float>& q_w_proportion,
            float beta, float beta_sum,
            int word_id, wood::xorshift_rng& rng);

        int32_t next(wood::xorshift_rng& rng, int32_t beta_height,
                     float beta_mass, std::vector<wood::alias_k_v>& beta_k_v,
                     bool debug);
    };
    }  // namespace lda
Import
    // hybrid_alias_map is an internal C++ class; not directly exposed via P/Invoke.
    // It is used internally by LdaEngine and LightDocSampler.
I/O Contract
Inputs (build_table)
| Name | Type | Required | Description |
|---|---|---|---|
| alias_rng | AliasMultinomialRNGInt& | Yes | Alias table builder (reused across words) |
| word_topic_row | const hybrid_map& | Yes | The word's row in the word-topic count matrix |
| summary_row | const vector<int64_t>& | Yes | Global topic count summary (K entries) |
| q_w_proportion | vector<float>& | Yes | Scratch buffer (K entries) for computing proportions |
| beta | float | Yes | Dirichlet word-topic smoothing parameter |
| beta_sum | float | Yes | beta * V, where V is the vocabulary size |
| word_id | int | Yes | Word index (used by sparse overload 3 of SetProportionMass) |
| rng | xorshift_rng& | Yes | Random number generator |
Inputs (next)
| Name | Type | Required | Description |
|---|---|---|---|
| rng | xorshift_rng& | Yes | Random number generator for sampling |
| beta_height | int32_t | Yes | Bin height for the global beta alias table |
| beta_mass | float | Yes | Total mass of the beta smoothing distribution |
| beta_k_v | vector<alias_k_v>& | Yes | Global beta alias table entries |
| debug | bool | No | Debug flag (unused in release) |
Outputs
| Name | Type | Description |
|---|---|---|
| next() return | int32_t | Sampled topic index in [0, K) |
Usage Examples
    // Building the alias table for a single word (inside LightDocSampler):
    void LightDocSampler::GenerateAliasTableforWord(int32_t word) {
        alias_k_v_[word].build_table(alias_rng_, word_topic_table_[word],
            summary_row_, q_w_proportion_, beta_, beta_sum_, word, rng_);
    }

    // Sampling a topic proposal for word w:
    int32_t t = alias_k_v_[w].next(rng_, beta_height_, beta_mass_, beta_k_v_, false);
Related Pages
- Principle:Dotnet_Machinelearning_Alias_Method_Sampling
- Principle:Dotnet_Machinelearning_Hybrid_Dense_Sparse_Storage
- Implementation:Dotnet_Machinelearning_LdaEngine
- Implementation:Dotnet_Machinelearning_LdaDocumentSampler
- Implementation:Dotnet_Machinelearning_AliasMultinomialRng
- Implementation:Dotnet_Machinelearning_LdaHybridMap
- Environment:Dotnet_Machinelearning_Native_Build_Toolchain