Implementation:Neuml Txtai Topic Modeling
| Knowledge Sources | |
|---|---|
| Domains | Topic Modeling, Community Detection, Text Mining |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
A concrete tool, provided by txtai, for topic modeling via community detection on graph networks.
Description
The Topics class implements topic modeling by leveraging community detection algorithms on graph structures. It detects communities within a graph, then uses BM25 (or a configurable scoring method) to identify the most representative terms for each community, producing human-readable topic labels. The process involves: (1) running community detection on the graph, (2) sorting communities by size (largest first), (3) computing graph centrality for node ranking, (4) tokenizing node text with stopword filtering, (5) building a scoring index per community to extract top-N terms based on inverse document frequency, and (6) merging duplicate topics that share the same term sets. Nodes within each community are ranked by their BM25 relevance score to the topic terms. For communities with no text content, a generic "topic_N" label is generated and nodes are ranked by centrality.
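The size ordering in step (2) and the fallback for communities without text can be illustrated with a minimal pure-Python sketch. It does not call txtai: the `communities` and `centrality` values and the `label_without_text` helper are hypothetical stand-ins, used only to show the behavior described above.

```python
# Hypothetical inputs: communities as sets of node ids, centrality per node id
communities = [{3}, {0, 1, 2}, {4, 5}]
centrality = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.1, 4: 0.4, 5: 0.3}

# Step 2: sort communities by size, largest first
ordered = sorted(communities, key=len, reverse=True)


def label_without_text(index, community, centrality):
    """Fallback for a community with no text: generic name, nodes ranked by centrality."""
    terms = ["topic", str(index)]
    nodes = sorted(community, key=lambda node: centrality[node], reverse=True)
    return "_".join(terms), nodes


# Apply the fallback to every community, as if none of them had text
fallback = [label_without_text(i, c, centrality) for i, c in enumerate(ordered)]
```

In this sketch the largest community receives the label `topic_0`, and its nodes are listed from highest to lowest centrality.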
Usage
Use Topics when you need to automatically discover and label topics within a txtai graph. It is invoked internally by the base Graph class's addtopics method during graph indexing. Direct usage is appropriate when you need custom topic extraction from a graph instance outside of the standard indexing pipeline.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/graph/topics.py
Signature
class Topics:
    def __init__(self, config)
    def __call__(self, graph)
    def score(self, graph, index, community, centrality)
    def tokenize(self, graph, node)
    def topn(self, terms, n)
    def merge(self, topics)
Import
from txtai.graph.topics import Topics
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Topic configuration dictionary. Supports keys: algorithm (community detection algorithm, e.g., "louvain", "greedy", "lpa"), labels (scoring method, default "bm25"), terms (number of top terms per topic, default 4), stopwords (list of additional stopwords to exclude from topic labels), categories (list of category labels for higher-level classification), resolution (community detection resolution parameter). |
| graph | Graph | Yes (for __call__) | A graph instance that implements communities(), centrality(), and attribute() methods. |
| index | int | Yes (for score) | Community index number, used for generating fallback topic names. |
| community | set/list | Yes (for score) | Set of node ids belonging to a single community. |
| centrality | dict | Yes (for score) | Dictionary of {node_id: centrality_score} for the full graph. |
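As an illustration of the `config` keys listed above, a configuration dictionary might look like the following. The specific values (the stopwords, the category labels, and a resolution of 1.0) are assumptions chosen for the example, not recommended settings.

```python
# Example topic configuration; all values are illustrative assumptions
topic_config = {
    "algorithm": "louvain",               # community detection algorithm
    "labels": "bm25",                     # scoring method used to pick topic terms
    "terms": 4,                           # number of top terms per topic label
    "stopwords": ["the", "and"],          # extra stopwords excluded from labels
    "categories": ["science", "sports"],  # hypothetical higher-level categories
    "resolution": 1.0,                    # hypothetical resolution value
}

# Reading optional keys with the defaults documented in the table
terms = topic_config.get("terms", 4)
labels = topic_config.get("labels", "bm25")
```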
Outputs
| Name | Type | Description |
|---|---|---|
| __call__() | dict | Dictionary of {topic_name: [node_ids]} where topic names are underscore-joined top terms (e.g., "machine_learning_deep_neural") and node ids are sorted by relevance score descending. Duplicate topics with identical term sets are merged. |
| score() | tuple | A 2-tuple of (top_terms_list, sorted_node_ids) for a single community. |
| tokenize() | list | List of string tokens extracted from a node's text attribute with stopword filtering applied. |
| topn() | list | List of up to N terms that pass tokenization and stopword rules. |
| merge() | dict | Merged dictionary of {topic_name: [node_ids]} sorted by community size descending. |
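The `topn()` and `merge()` rows can be read as a filter followed by a grouping step. The sketch below mimics that behavior in plain Python under simplifying assumptions: `passes_rules` is a hypothetical stand-in for the tokenizer check, and the input topics are made up. It is not the library's code.

```python
def passes_rules(term, stopwords):
    # Hypothetical stand-in for the tokenization and stopword rules
    return term.isalpha() and term not in stopwords


def topn(terms, n, stopwords=frozenset()):
    """Collect up to n terms that pass the rules, preserving input order."""
    selected = []
    for term in terms:
        if passes_rules(term, stopwords):
            selected.append(term)
        if len(selected) == n:
            break
    return selected


def merge(topics):
    """Merge (terms, node_ids) pairs whose term sets are identical."""
    merged, labels = {}, {}
    for terms, nodes in topics:
        key = frozenset(terms)
        if key not in merged:
            merged[key], labels[key] = [], terms
        # Append nodes not already present, keeping their order
        for node in nodes:
            if node not in merged[key]:
                merged[key].append(node)
    # Largest merged topic first
    ranked = sorted(merged.items(), key=lambda item: len(item[1]), reverse=True)
    return {"_".join(labels[key]): nodes for key, nodes in ranked}


picked = topn(["the", "learning", "42", "neural", "deep"], 2, stopwords={"the"})
result = merge([
    (["deep", "learning"], [0, 1]),
    (["sports"], [5]),
    (["learning", "deep"], [2]),
])
```

Keying on a `frozenset` of terms makes the comparison independent of term order, so `["deep", "learning"]` and `["learning", "deep"]` collapse into one topic that keeps the first label seen.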
Usage Examples
from txtai.graph.topics import Topics
from txtai.graph.networkx import NetworkX

# Build a graph
config = {"topics": {"algorithm": "louvain", "terms": 4}}
graph = NetworkX(config)
graph.initialize()

# Add nodes with text
graph.addnode(0, id="doc1", text="Machine learning algorithms")
graph.addnode(1, id="doc2", text="Deep learning neural networks")
graph.addnode(2, id="doc3", text="Natural language processing")

# Connect nodes with weighted edges
graph.addedge(0, 1, weight=0.9)
graph.addedge(1, 2, weight=0.7)

# Run topic modeling
topics = Topics({
    "algorithm": "louvain",
    "terms": 4,
    "stopwords": ["the", "and"]
})
result = topics(graph)

# result maps topic labels to node ids, for example
# {"machine_learning_deep_neural": [0, 1, 2]}; the actual labels and
# grouping depend on the communities detected
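Once topics are computed, the returned mapping can be traversed like any dictionary. The following sketch uses a hand-written `result` with made-up labels and node ids in place of real output. It shows how one might pick the most representative node per topic, assuming node ids are ordered by relevance as described in the I/O contract.

```python
# Hypothetical result in the documented {topic_name: [node_ids]} shape
result = {
    "machine_learning_deep_neural": [1, 0, 2],
    "topic_1": [7],
}

# First node id per topic is the most relevant under the documented ordering
top_nodes = {name: nodes[0] for name, nodes in result.items() if nodes}

# Reverse lookup: node id -> topic name
node_topics = {node: name for name, nodes in result.items() for node in nodes}

for name, nodes in result.items():
    print(f"{name}: {len(nodes)} nodes, top node {top_nodes[name]}")
```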