Principle:Pyro ppl Pyro Topic Modeling

Knowledge Sources	Latent Dirichlet Allocation; Auto-Encoding Variational Bayes; Neural Variational Inference for Topic Models
Domains	Topic Modeling, Natural Language Processing, Variational Inference
Last Updated	2026-02-09 09:00 GMT

Overview

Amortized Latent Dirichlet Allocation combines the classical LDA generative model for document collections with neural network-based amortized inference, enabling scalable topic discovery without per-document variational optimization.

Description

Latent Dirichlet Allocation (LDA) is the foundational probabilistic topic model. It models a collection of documents as mixtures of latent "topics," where each topic is a distribution over words. The generative process is:

For each topic k, draw a word distribution: beta_k ~ Dirichlet(eta).
For each document d:
1. Draw a topic mixture: theta_d ~ Dirichlet(alpha).
2. For each word position n in document d:
  1. Draw a topic assignment: z_{dn} ~ Categorical(theta_d).
  2. Draw a word: w_{dn} ~ Categorical(beta_{z_{dn}}).
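The generative process above can be simulated directly, which is a convenient way to build synthetic corpora for testing an inference method. The sketch below is framework-free Python that assumes symmetric scalar priors alpha and eta. The function names are illustrative and not part of Pyro. Dirichlet draws use the standard construction of normalizing independent Gamma(alpha_i, 1) variates.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_topics, vocab_size, doc_lengths, alpha, eta, seed=0):
    """Sample topics, topic mixtures, topic assignments and words from the LDA generative process.

    Assumes symmetric priors: alpha and eta are scalars repeated K and V times.
    Returns (beta, thetas, assignments, docs); each doc is a list of word ids.
    """
    rng = random.Random(seed)
    # For each topic k: beta_k ~ Dirichlet(eta), a distribution over the vocabulary.
    beta = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    thetas, assignments, docs = [], [], []
    for n_d in doc_lengths:
        # theta_d ~ Dirichlet(alpha): this document's topic proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        # z_dn ~ Categorical(theta_d): one topic per word position.
        z = [sample_categorical(theta, rng) for _ in range(n_d)]
        # w_dn ~ Categorical(beta_{z_dn}): the word is drawn from its assigned topic.
        w = [sample_categorical(beta[k], rng) for k in z]
        thetas.append(theta)
        assignments.append(z)
        docs.append(w)
    return beta, thetas, assignments, docs
```

For example, `generate_corpus(num_topics=3, vocab_size=10, doc_lengths=[5, 8], alpha=0.5, eta=0.1)` returns three topics over a ten-word vocabulary and two documents of five and eight word ids. Keeping the sampled z and theta lets you check how well an inference method recovers them (up to a relabeling of topics).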

The key latent variables are:

Topics (beta): Each topic is a probability distribution over the vocabulary, capturing a coherent theme.
Topic proportions (theta): Each document has a mixture over topics, representing what the document is "about."
Topic assignments (z): Each word token is assigned to a single topic, so the same vocabulary word can be assigned to different topics at different positions.

Traditional inference for LDA uses mean-field variational inference, in which each document's variational parameters are optimized iteratively; this is expensive on large corpora and must be rerun for every new document. Amortized LDA replaces this with a neural network encoder that maps a document's bag-of-words representation directly to approximate posterior parameters, enabling:

Fast inference: A single forward pass through the encoder, instead of iterative optimization per document.
Scalability: Mini-batch training with stochastic gradient descent.
Flexibility: The encoder can be any differentiable architecture (MLP, transformer).
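To make "a single forward pass" concrete, here is a hypothetical encoder written without a deep-learning framework. It assumes a logistic-normal variational family, theta_d = softmax(mu_d + sigma_d * epsilon), so it outputs a mean and a positive scale per topic. The class name, layer sizes, random initialization, and the log1p input transform are illustrative assumptions. In Pyro this would normally be a `torch.nn.Module` registered with `pyro.module`.

```python
import math
import random


class TinyEncoder:
    """Hypothetical one-hidden-layer MLP: bag-of-words counts -> (mu_d, sigma_d).

    The weights play the role of phi; here they are only randomly initialized,
    whereas in amortized LDA they are learned jointly with beta.
    """

    def __init__(self, vocab_size, hidden_size, num_topics, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_hidden, self.b_hidden = init(hidden_size, vocab_size), [0.0] * hidden_size
        self.w_mu, self.b_mu = init(num_topics, hidden_size), [0.0] * num_topics
        self.w_logsig, self.b_logsig = init(num_topics, hidden_size), [0.0] * num_topics

    @staticmethod
    def _affine(w, b, x):
        # Computes W x + b for a weight matrix stored as a list of rows.
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

    @staticmethod
    def _softplus(a):
        # Numerically stable log(1 + exp(a)).
        return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

    def __call__(self, bow):
        x = [math.log1p(c) for c in bow]  # dampen large counts (an illustrative choice)
        h = [self._softplus(a) for a in self._affine(self.w_hidden, self.b_hidden, x)]
        mu = self._affine(self.w_mu, self.b_mu, h)
        # Predict log-scales and exponentiate so every sigma is strictly positive.
        sigma = [math.exp(s) for s in self._affine(self.w_logsig, self.b_logsig, h)]
        return mu, sigma
```

Because the same weights serve every document, inferring topic proportions for a new document costs one call such as `TinyEncoder(6, 4, 3)([0, 2, 0, 1, 0, 5])`, with no per-document optimization loop.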

This is a key example of how deep learning and probabilistic programming complement each other: the generative model provides interpretability (topics are meaningful), while the neural encoder provides scalability.

Usage

Use amortized LDA when:

Discovering latent topics in large document collections.
You need fast inference for new documents (amortization avoids per-document optimization).
Building interpretable text representations where topics have semantic meaning.
Combining topic modeling with downstream tasks (classification, retrieval).
Working with large vocabularies and document collections that require scalable inference.

Theoretical Basis

LDA generative model:

# Hyperparameters: alpha (topic prior), eta (word prior), K (num topics)
# For k = 1, ..., K:
#     beta_k ~ Dirichlet(eta)           # topic-word distributions
# For d = 1, ..., D:
#     theta_d ~ Dirichlet(alpha)         # document-topic proportions
#     For n = 1, ..., N_d:
#         z_{dn} ~ Categorical(theta_d)  # topic assignment
#         w_{dn} ~ Categorical(beta_{z_{dn}})  # word

Collapsed representation (integrating out z):

# Marginalizing over topic assignments z:
# p(w_d | theta_d, beta) = product_n sum_k theta_{dk} * beta_{k, w_{dn}}

# In bag-of-words form:
# p(w_d | theta_d, beta) = product_v (sum_k theta_{dk} * beta_{kv})^{count(v, d)}

# Log-likelihood per document:
# log p(w_d | theta_d, beta) = sum_v count(v, d) * log(sum_k theta_{dk} * beta_{kv})
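A small numeric check of the equivalence: the sketch below evaluates the token-level and bag-of-words forms of the log-likelihood on a made-up two-topic, three-word example. The toy values of beta, theta, and the document are assumptions for illustration only.

```python
import math
from collections import Counter


def token_log_likelihood(words, theta, beta):
    """sum_n log sum_k theta_k * beta[k][w_n], looping over every token."""
    return sum(
        math.log(sum(theta[k] * beta[k][w] for k in range(len(theta))))
        for w in words
    )


def bow_log_likelihood(counts, theta, beta):
    """sum_v count(v, d) * log sum_k theta_k * beta[k][v], looping over distinct words."""
    return sum(
        c * math.log(sum(theta[k] * beta[k][v] for k in range(len(theta))))
        for v, c in counts.items()
    )


# Two topics over a three-word vocabulary (toy numbers).
beta = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]
theta = [0.25, 0.75]
words = [0, 2, 2, 1, 2]
counts = Counter(words)  # word 2 appears three times, words 0 and 1 once each
```

Here the per-word mixture probabilities are 0.25, 0.275 and 0.475, so both functions give log(0.25) + log(0.275) + 3 * log(0.475). The bag-of-words form only loops over distinct words, which is why a count vector is a convenient input for both the likelihood and the encoder.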

Amortized variational inference:

# Standard VI: for each document d, optimize q(theta_d | lambda_d)
# lambda_d = argmax_{lambda} ELBO_d(lambda)  -- per-document optimization

# Amortized VI: learn an encoder network
# lambda_d = encoder(bow_d; phi)  -- single forward pass

# Encoder: maps bag-of-words vector to the parameters of q(theta_d)
# (Dirichlet concentrations, or mu_d and sigma_d under the logistic-normal approximation)
# bow_d: V-dimensional count vector
# phi: encoder neural network weights

# ELBO:
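# L = sum_d [E_{q(theta_d)}[log p(w_d | theta_d, beta)] - KL(q(theta_d) || p(theta_d))]
# beta is treated as a point-estimated parameter here, so its Dirichlet(eta) prior adds
# no KL term; a fully Bayesian variant would give beta its own q and a matching KL term.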

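# Reparameterization for Dirichlet:
# Replace the Dirichlet with a logistic-normal approximation (its parameters can be
# matched to a Dirichlet via a Laplace approximation in the softmax basis), so theta_d
# becomes a differentiable function of parameter-free noise:
# theta_d = softmax(mu_d + sigma_d * epsilon), epsilon ~ Normal(0, I)
# Approximating the prior p(theta_d) the same way keeps the KL term in closed form
# (a KL between diagonal Gaussians over the pre-softmax vector).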
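The sketch below shows two pieces of this step in plain Python: matching a Dirichlet(alpha) prior with a diagonal logistic normal using the softmax-basis Laplace approximation (the form used by ProdLDA, Srivastava and Sutton, 2017), and the reparameterized draw theta_d = softmax(mu_d + sigma_d * epsilon). Function names are illustrative; in PyTorch or Pyro the draw would be a `Normal(...).rsample()` followed by a softmax.

```python
import math
import random


def laplace_logistic_normal(alpha):
    """Softmax-basis Laplace approximation of Dirichlet(alpha).

    Returns (mu, var), the mean and diagonal variance of the pre-softmax Gaussian:
      mu_k  = log alpha_k - (1/K) sum_i log alpha_i
      var_k = (1/alpha_k)(1 - 2/K) + (1/K^2) sum_i 1/alpha_i
    """
    K = len(alpha)
    mean_log_alpha = sum(math.log(a) for a in alpha) / K
    sum_inv_alpha = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log_alpha for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / K) + sum_inv_alpha / K**2 for a in alpha]
    return mu, var


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def sample_theta(mu, sigma, rng):
    """Reparameterized draw: theta = softmax(mu + sigma * eps), eps ~ Normal(0, I).

    All randomness is in eps, so theta is a deterministic, differentiable
    function of (mu, sigma) once eps is fixed.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return softmax([m + s * e for m, s, e in zip(mu, sigma, eps)])
```

For a symmetric Dirichlet(1) prior over 50 topics this gives mu_k = 0 and var_k = 0.96 + 0.02 = 0.98. Setting sigma to zero collapses the draw to softmax(mu), which for the Laplace-matched parameters equals alpha / sum(alpha), the Dirichlet mean. The logistic normal is one option rather than a necessity: PyTorch's `torch.distributions.Dirichlet` also supports `rsample` through implicit reparameterization gradients.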

Training procedure:

# Parameters: beta (topics), phi (encoder weights)
# For each mini-batch of documents:
#   1. Encode: lambda_d = encoder(bow_d; phi) for d in batch
#   2. Sample: theta_d ~ q(theta | lambda_d)  (reparameterized)
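#   3. Reconstruct: evaluate log p(w_d | theta_d, beta) from the bag-of-words counts
#   4. ELBO = reconstruction - KL
#   5. Update beta, phi via gradient ascent on the ELBO (one common way to keep each
#      beta_k on the simplex is to parameterize it as a softmax over unconstrained logits)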

# After training:
# - beta gives K interpretable topics (word distributions)
# - encoder provides instant topic inference for new documents
# - No iterative optimization needed at test time
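Steps 2 to 4 of the training procedure can be sketched as a single-sample Monte Carlo ELBO estimate over a mini-batch. This framework-free illustration rests on stated assumptions: a logistic-normal q in which the pre-softmax Gaussian vector is treated as the latent variable (so the KL is between diagonal Gaussians), a diagonal Gaussian prior such as the Laplace-matched one, and encoder outputs (step 1) supplied by the caller. Step 5 is not shown. In practice the same computation is written in PyTorch so that autodiff supplies gradients for beta and phi, for example through Pyro's `SVI` with a `Trace_ELBO` loss.

```python
import math
import random


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def kl_diag_normal(mu_q, sig_q, mu_p, sig_p):
    """KL(N(mu_q, diag sig_q^2) || N(mu_p, diag sig_p^2)), summed over dimensions."""
    return sum(
        math.log(sp / sq) + (sq**2 + (mq - mp) ** 2) / (2.0 * sp**2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sig_q, mu_p, sig_p)
    )


def bow_log_likelihood(counts, theta, beta):
    """sum_v count(v, d) * log sum_k theta_k * beta[k][v]."""
    return sum(
        c * math.log(sum(t * beta[k][v] for k, t in enumerate(theta)))
        for v, c in counts.items()
    )


def minibatch_elbo(batch, encoder_outputs, beta, prior_mu, prior_sigma, rng):
    """Single-sample ELBO estimate summed over a mini-batch.

    batch: list of {word_id: count} dicts.
    encoder_outputs: list of (mu_d, sigma_d) pairs, one per document (step 1).
    """
    total = 0.0
    for counts, (mu, sigma) in zip(batch, encoder_outputs):
        # Step 2: reparameterized sample theta_d = softmax(mu_d + sigma_d * eps).
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
        theta = softmax([m + s * e for m, s, e in zip(mu, sigma, eps)])
        # Step 3: reconstruction term log p(w_d | theta_d, beta).
        recon = bow_log_likelihood(counts, theta, beta)
        # Step 4: subtract the closed-form KL to the prior.
        total += recon - kl_diag_normal(mu, sigma, prior_mu, prior_sigma)
    return total
```

Since eps is the only source of randomness, the same computation written in an autodiff framework is differentiable with respect to mu_d and sigma_d (hence phi) and beta, which is what allows the stochastic gradient update in step 5.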

Related Pages

Implementation:Pyro_ppl_Pyro_LDA
