Implementation:Neuml Txtai Topic Modeling

Knowledge Sources	Neuml_Txtai
Domains	Topic Modeling, Community Detection, Text Mining
Last Updated	2026-02-10 01:00 GMT

Overview

A concrete tool, provided by txtai, for topic modeling via community detection on graphs.

Description

The Topics class implements topic modeling by leveraging community detection algorithms on graph structures. It detects communities within a graph, then uses BM25 (or a configurable scoring method) to identify the most representative terms for each community, producing human-readable topic labels. The process involves:

1. Running community detection on the graph.
2. Sorting communities by size (largest first).
3. Computing graph centrality for node ranking.
4. Tokenizing node text with stopword filtering.
5. Building a scoring index per community to extract the top-N terms based on inverse document frequency.
6. Merging duplicate topics that share the same term sets.

Nodes within each community are ranked by their BM25 relevance score to the topic terms. For communities with no text content, a generic "topic_N" label is generated and nodes are ranked by centrality.
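The term-selection step can be illustrated with a small, library-free sketch. It assumes that "based on inverse document frequency" means preferring the terms shared by the most nodes in a community (lowest IDF); the function name `top_terms` and the simplified IDF formula are hypothetical and are not taken from txtai, whose scoring index may compute weights differently.

```python
import math
from collections import Counter


def top_terms(documents, n=4, stopwords=()):
    """Pick up to n label terms for one community.

    documents: list of token lists, one per node (hypothetical input shape).
    """
    # Document frequency: number of nodes each term appears in
    df = Counter(term for tokens in documents for term in set(tokens))
    total = len(documents)

    # Simplified smoothed IDF (an assumption, not the library's formula)
    idf = {term: math.log((total + 1) / (count + 1)) + 1 for term, count in df.items()}

    # Lowest IDF first (most widely shared terms); ties broken alphabetically
    ranked = sorted(idf, key=lambda term: (idf[term], term))

    # Drop extra stopwords, then keep the first n terms
    return [term for term in ranked if term not in stopwords][:n]


community = [
    ["machine", "learning", "algorithms"],
    ["deep", "learning", "neural", "networks"],
    ["learning", "neural", "models"],
]

labels = top_terms(community, n=3, stopwords={"algorithms"})
print("_".join(labels))
```

Here "learning" appears in every node and "neural" in two, so they lead the label; "algorithms" would come next alphabetically among the remaining terms but is excluded as a stopword.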

Usage

Use Topics when you need to automatically discover and label topics within a txtai graph. It is invoked internally by the base Graph class's addtopics method during graph indexing. Direct usage is appropriate when you need custom topic extraction from a graph instance outside of the standard indexing pipeline.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/graph/topics.py

Signature

class Topics:
    def __init__(self, config)
    def __call__(self, graph)
    def score(self, graph, index, community, centrality)
    def tokenize(self, graph, node)
    def topn(self, terms, n)
    def merge(self, topics)

Import

from txtai.graph.topics import Topics

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	Topic configuration dictionary. Supports keys: `algorithm` (community detection algorithm, e.g., "louvain", "greedy", "lpa"), `labels` (scoring method, default "bm25"), `terms` (number of top terms per topic, default 4), `stopwords` (list of additional stopwords to exclude from topic labels), `categories` (list of category labels for higher-level classification), `resolution` (community detection resolution parameter).
graph	Graph	Yes (for __call__)	A graph instance that implements `communities()`, `centrality()`, and `attribute()` methods.
index	int	Yes (for score)	Community index number, used for generating fallback topic names.
community	set/list	Yes (for score)	Set of node ids belonging to a single community.
centrality	dict	Yes (for score)	Dictionary of {node_id: centrality_score} for the full graph.
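As a rough illustration of how the `index`, `community`, and `centrality` inputs fit together, the sketch below mimics the documented fallback for a community with no text: a generic "topic_N" label and nodes ordered by centrality. The helper name `fallback_topic` is hypothetical; this is not the library's code.

```python
def fallback_topic(index, community, centrality):
    """Return (terms, node_ids) for a community that has no text content."""
    # Generic label terms, later joined with "_" into "topic_<index>"
    terms = ["topic", str(index)]

    # Most central nodes first
    ranked = sorted(community, key=lambda node: centrality[node], reverse=True)
    return terms, ranked


centrality = {10: 0.2, 11: 0.9, 12: 0.5}
terms, nodes = fallback_topic(3, {10, 11, 12}, centrality)
print("_".join(terms), nodes)
```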

Outputs

Name	Type	Description
__call__()	dict	Dictionary of {topic_name: [node_ids]} where topic names are underscore-joined top terms (e.g., "machine_learning_deep_neural") and node ids are sorted by relevance score descending. Duplicate topics with identical term sets are merged.
score()	tuple	A 2-tuple of (top_terms_list, sorted_node_ids) for a single community.
tokenize()	list	List of string tokens extracted from a node's text attribute with stopword filtering applied.
topn()	list	List of up to N terms that pass tokenization and stopword rules.
merge()	dict	Merged dictionary of {topic_name: [node_ids]} sorted by community size descending.
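The merge behavior described above can be sketched without txtai. The example assumes each topic arrives as a (terms, node_ids) tuple, as `score()` is documented to return, and treats term lists containing the same terms in any order as duplicates. The function name `merge_topics` and the choice to keep first-seen node order are assumptions for illustration.

```python
def merge_topics(topics):
    """Merge (terms, node_ids) tuples whose term sets are identical."""
    merged, labels = {}, {}

    for terms, node_ids in topics:
        # Term order does not matter when detecting duplicates
        key = frozenset(terms)
        if key not in merged:
            merged[key], labels[key] = [], terms

        # Append ids not already present, preserving first-seen order
        for uid in node_ids:
            if uid not in merged[key]:
                merged[key].append(uid)

    # Re-sort by size, since merging can change which topic is largest
    ordered = sorted(merged.items(), key=lambda item: len(item[1]), reverse=True)
    return {"_".join(labels[key]): ids for key, ids in ordered}


topics = [
    (["topic", "0"], [3, 4]),
    (["learning", "neural"], [0, 1]),
    (["neural", "learning"], [2]),
]

result = merge_topics(topics)
print(result)
```

The two "learning"/"neural" communities collapse into one topic of three nodes, which then sorts ahead of the two-node fallback topic.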

Usage Examples

from txtai.graph.topics import Topics
from txtai.graph.networkx import NetworkX

# Build a graph
config = {"topics": {"algorithm": "louvain", "terms": 4}}
graph = NetworkX(config)
graph.initialize()

# Add nodes with text
graph.addnode(0, id="doc1", text="Machine learning algorithms")
graph.addnode(1, id="doc2", text="Deep learning neural networks")
graph.addnode(2, id="doc3", text="Natural language processing")
graph.addedge(0, 1, weight=0.9)
graph.addedge(1, 2, weight=0.7)

# Run topic modeling
topics = Topics({
    "algorithm": "louvain",
    "terms": 4,
    "stopwords": ["the", "and"]
})

result = topics(graph)
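# result maps each topic label (top terms joined with "_") to its node ids.
# Labels and grouping depend on the communities detected, so no fixed output is shown.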
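Once a topics dictionary is available, it can be consumed like any other mapping. The sketch below uses a hand-written dictionary in the documented {topic_name: [node_ids]} shape; the labels and ids are illustrative, not output produced by txtai.

```python
# Illustrative stand-in for the dictionary returned by Topics.__call__
result = {
    "learning_neural": [1, 0, 2],
    "topic_1": [3],
}

# Topic size per label
summary = {label: len(node_ids) for label, node_ids in result.items()}

for label, node_ids in result.items():
    # Labels are underscore-joined terms; the first id is the top-ranked node
    terms = label.split("_")
    print(f"{label}: terms={terms}, nodes={len(node_ids)}, top node={node_ids[0]}")
```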
