Principle:Pyro ppl Pyro Topic Modeling

From Leeroopedia


Knowledge Sources
Domains: Topic Modeling, Natural Language Processing, Variational Inference
Last Updated: 2026-02-09 09:00 GMT

Overview

Amortized Latent Dirichlet Allocation combines the classical LDA generative model for document collections with neural network-based amortized inference, enabling scalable topic discovery without per-document variational optimization.

Description

Latent Dirichlet Allocation (LDA) is the foundational probabilistic topic model. It models a collection of documents as mixtures of latent "topics," where each topic is a distribution over words. The generative process is:

  1. For each topic k, draw a word distribution: beta_k ~ Dirichlet(eta).
  2. For each document d:
    1. Draw a topic mixture: theta_d ~ Dirichlet(alpha).
    2. For each word position n in document d:
      1. Draw a topic assignment: z_{dn} ~ Categorical(theta_d).
      2. Draw a word: w_{dn} ~ Categorical(beta_{z_{dn}}).
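
The generative process above can be simulated directly. The following is a minimal sketch using only the Python standard library; the function names (`sample_dirichlet`, `generate_corpus`), the symmetric hyperparameter values, and the toy sizes are illustrative assumptions, not part of any library API.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [g / total for g in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample topics, topic mixtures, and documents (lists of word ids)."""
    rng = random.Random(seed)
    # Step 1: one word distribution beta_k per topic.
    beta = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    thetas, docs = [], []
    for _ in range(num_docs):
        # Step 2.1: topic mixture theta_d for this document.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Step 2.2: draw a topic assignment, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=beta[z])[0]
            words.append(w)
        thetas.append(theta)
        docs.append(words)
    return beta, thetas, docs


beta, thetas, docs = generate_corpus(num_docs=3, doc_len=8,
                                     num_topics=2, vocab_size=5)
```

Each entry of `docs` is a list of word ids. Counting the ids gives the bag-of-words vector that an amortized encoder would consume.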

The key latent variables are:

  • Topics (beta): Each topic is a probability distribution over the vocabulary, capturing a coherent theme.
  • Topic proportions (theta): Each document has a mixture over topics, representing what the document is "about."
  • Topic assignments (z): Each word is assigned to a specific topic.

Classical variational inference for LDA is mean-field: every document gets its own variational parameters, fitted by an iterative inner optimization loop. The cost therefore grows with the size of the corpus, and the optimization has to be repeated for each unseen document. Amortized LDA replaces this with a neural network encoder that maps a document's bag-of-words representation directly to approximate posterior parameters, enabling:

  • Fast inference: A single forward pass through the encoder, instead of iterative optimization per document.
  • Scalability: Mini-batch training with stochastic gradient descent.
  • Flexibility: The encoder can be any differentiable architecture (for example, an MLP or a transformer).
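
To make the "single forward pass" concrete, here is a small sketch of an encoder with one hidden layer, written with the standard library only. The architecture, the random initialization, the output convention (a mean and a log-variance per topic), and the names `make_encoder` and `encode` are all assumptions chosen for illustration; a real model would use a deep learning framework and trained weights.

```python
import math
import random


def make_encoder(vocab_size, hidden, num_topics, seed=0):
    """Create randomly initialized weights; these play the role of phi."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(hidden, vocab_size),
        "W_mu": mat(num_topics, hidden),
        "W_logvar": mat(num_topics, hidden),
    }


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def encode(bow, params):
    """Map a bag-of-words count vector to (mu, logvar) in one forward pass."""
    # Normalize counts so documents of different lengths share a scale.
    total = sum(bow) or 1
    x = [c / total for c in bow]
    h = [math.tanh(v) for v in matvec(params["W1"], x)]
    mu = matvec(params["W_mu"], h)
    logvar = matvec(params["W_logvar"], h)
    return mu, logvar


params = make_encoder(vocab_size=6, hidden=4, num_topics=3)
mu, logvar = encode([2, 0, 1, 0, 0, 3], params)
```

The same `params` are reused for every document, which is what amortization means here: no parameters are fitted per document at inference time.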

This is a key example of how deep learning and probabilistic programming complement each other: the generative model provides interpretability (topics are meaningful), while the neural encoder provides scalability.

Usage

Use amortized LDA when you need to:

  • Discover latent topics in large document collections.
  • Infer topics for new documents quickly (amortization avoids per-document optimization).
  • Build interpretable text representations in which topics carry semantic meaning.
  • Combine topic modeling with downstream tasks such as classification or retrieval.
  • Scale inference to large vocabularies and document collections.

Theoretical Basis

LDA generative model:

# Hyperparameters: alpha (topic prior), eta (word prior), K (num topics)
# For k = 1, ..., K:
#     beta_k ~ Dirichlet(eta)           # topic-word distributions
# For d = 1, ..., D:
#     theta_d ~ Dirichlet(alpha)         # document-topic proportions
#     For n = 1, ..., N_d:
#         z_{dn} ~ Categorical(theta_d)  # topic assignment
#         w_{dn} ~ Categorical(beta_{z_{dn}})  # word

Collapsed representation (summing out the discrete topic assignments z):

# Marginalizing over topic assignments z:
# p(w_d | theta_d, beta) = product_n sum_k theta_{dk} * beta_{k, w_{dn}}

# In bag-of-words form:
# p(w_d | theta_d, beta) = product_v (sum_k theta_{dk} * beta_{kv})^{count(v, d)}

# Log-likelihood per document:
# log p(w_d | theta_d, beta) = sum_v count(v, d) * log(sum_k theta_{dk} * beta_{kv})
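
As a quick check that the token-level and bag-of-words forms agree, the sketch below evaluates both on a toy document. The values of `theta`, `beta`, and `words` are made up for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_lik_tokens(words, theta, beta):
    """Sum over word positions n of log(sum_k theta_k * beta[k][w_n])."""
    total = 0.0
    for w in words:
        total += math.log(sum(t * row[w] for t, row in zip(theta, beta)))
    return total


def log_lik_bow(counts, theta, beta):
    """Sum over vocabulary ids v of count(v) * log(sum_k theta_k * beta[k][v])."""
    total = 0.0
    for v, c in counts.items():
        total += c * math.log(sum(t * row[v] for t, row in zip(theta, beta)))
    return total


theta = [0.7, 0.3]               # document-topic proportions (K = 2)
beta = [[0.5, 0.3, 0.2],         # topic 0 over a 3-word vocabulary
        [0.1, 0.1, 0.8]]         # topic 1
words = [0, 2, 2, 1, 0, 2]       # word ids of one document
counts = Counter(words)          # bag-of-words view of the same document

ll_tokens = log_lik_tokens(words, theta, beta)
ll_bow = log_lik_bow(counts, theta, beta)
```

Both functions return the same value because word order does not matter once z is summed out. This is why the encoder and the likelihood only need the count vector.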

Amortized variational inference:

# Standard VI: for each document d, optimize q(theta_d | lambda_d)
# lambda_d = argmax_{lambda} ELBO_d(lambda)  -- per-document optimization

# Amortized VI: learn an encoder network
# lambda_d = encoder(bow_d; phi)  -- single forward pass

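# Encoder: maps the bag-of-words vector to the parameters lambda_d of q(theta_d)
# (Dirichlet concentrations, or (mu_d, sigma_d) for a logistic-normal q)
# bow_d: V-dimensional count vector
# phi: encoder neural network weights, shared across all documents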

# ELBO:
# L = sum_d [E_{q(theta_d)}[log p(w_d | theta_d, beta)] - KL(q(theta_d) || p(theta_d))]

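# Reparameterization:
# The Dirichlet has no simple location-scale reparameterization, so a common
# choice is a logistic-normal q(theta_d), with the Dirichlet prior replaced by
# its Laplace approximation in softmax space:
# theta_d = softmax(mu_d + sigma_d * epsilon), epsilon ~ Normal(0, I)
# Here lambda_d = (mu_d, sigma_d) is the encoder output.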
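
The ELBO and the reparameterized sample can be sketched for a single document as follows. This is an illustration under stated assumptions, not a reference implementation: it uses a diagonal-Gaussian q in pre-softmax space, approximates the Dirichlet prior with the Laplace approximation commonly used in autoencoding topic models, and computes the KL term between the two Gaussians rather than on the simplex. All function names and numeric values are hypothetical.

```python
import math
import random


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def laplace_prior(alpha):
    """Diagonal Gaussian (mean, variance) approximating Dirichlet(alpha)."""
    num_topics = len(alpha)
    mean_log = sum(math.log(a) for a in alpha) / num_topics
    inv_sum = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log for a in alpha]
    var = [(1.0 - 2.0 / num_topics) / a + inv_sum / num_topics ** 2
           for a in alpha]
    return mu, var


def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def elbo_estimate(counts, mu_q, var_q, beta, alpha, rng):
    """Single-sample ELBO estimate for one document."""
    # Reparameterized sample: theta = softmax(mu + sigma * epsilon).
    eps = [rng.gauss(0.0, 1.0) for _ in mu_q]
    theta = softmax([m + math.sqrt(v) * e
                     for m, v, e in zip(mu_q, var_q, eps)])
    # Reconstruction term: bag-of-words log-likelihood with z summed out.
    recon = sum(
        c * math.log(sum(t * row[v] for t, row in zip(theta, beta)))
        for v, c in enumerate(counts) if c > 0
    )
    mu_p, var_p = laplace_prior(alpha)
    return recon - kl_diag_gauss(mu_q, var_q, mu_p, var_p), theta


beta = [[0.5, 0.3, 0.2],
        [0.1, 0.1, 0.8]]
counts = [2, 1, 3]  # bag-of-words counts over a 3-word vocabulary
elbo, theta = elbo_estimate(counts, mu_q=[0.3, -0.3], var_q=[0.4, 0.4],
                            beta=beta, alpha=[1.0, 1.0],
                            rng=random.Random(0))
```

Because the noise `eps` is drawn independently of `mu_q` and `var_q`, gradients of the estimate with respect to the encoder outputs are well defined. That property is what lets an automatic differentiation framework train the encoder end to end.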

Training procedure:

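# Parameters: beta (topics, learned here as point estimates), phi (encoder weights)
# For each mini-batch of documents:
#   1. Encode: lambda_d = encoder(bow_d; phi) for d in batch
#   2. Sample: theta_d ~ q(theta | lambda_d)  (reparameterized)
#   3. Reconstruct: log p(w_d | theta_d, beta)
#   4. ELBO_d = log p(w_d | theta_d, beta) - KL(q(theta_d) || p(theta_d))
#   5. Update beta, phi by a gradient ascent step on the batch ELBO,
#      keeping each beta_k on the simplex (e.g. beta_k = softmax of free logits)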

# After training:
# - beta gives K interpretable topics (word distributions)
# - encoder provides instant topic inference for new documents
# - No iterative optimization needed at test time
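
Once training has produced a topic matrix, inspecting topics amounts to ranking vocabulary entries by probability within each row. The sketch below uses a made-up vocabulary and a hand-written `beta` purely for illustration; `top_words` is a hypothetical helper.

```python
def top_words(beta, vocab, n=3):
    """Return the n highest-probability words for each topic (row of beta)."""
    topics = []
    for row in beta:
        ranked = sorted(range(len(row)), key=lambda v: row[v], reverse=True)
        topics.append([vocab[v] for v in ranked[:n]])
    return topics


# Illustrative values only; a trained model would supply beta.
vocab = ["goal", "match", "team", "vote", "party", "law"]
beta = [[0.40, 0.30, 0.20, 0.05, 0.03, 0.02],
        [0.02, 0.03, 0.05, 0.35, 0.30, 0.25]]

topics = top_words(beta, vocab, n=3)
```

With these toy values, the first topic groups sports terms and the second groups politics terms. This kind of reading is what makes beta interpretable.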
