Implementation:Neuml Txtai Topic Modeling

From Leeroopedia


Knowledge Sources
Domains: Topic Modeling, Community Detection, Text Mining
Last Updated: 2026-02-10 01:00 GMT

Overview

A concrete tool, provided by txtai, for topic modeling via community detection on graphs.

Description

The Topics class implements topic modeling by running community detection algorithms on graph structures. It detects communities within a graph, then uses BM25 (or another configurable scoring method) to identify the most representative terms for each community, producing human-readable topic labels. The process involves:

  1. Running community detection on the graph.
  2. Sorting communities by size (largest first).
  3. Computing graph centrality for node ranking.
  4. Tokenizing node text with stopword filtering.
  5. Building a scoring index per community to extract the top-N terms based on inverse document frequency.
  6. Merging duplicate topics that share the same term sets.

Nodes within each community are ranked by their relevance score (BM25 by default) against the topic terms. For communities with no text content, a generic "topic_N" label is generated and nodes are ranked by centrality.
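The labeling step described above can be illustrated without txtai. The sketch below is a simplified stand-in, not the library's code: it replaces the BM25 scoring index with plain document-frequency counts, and the names `label_community`, `STOPWORDS`, and the whitespace tokenizer are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical stopword list; txtai's tokenizer applies its own rules
STOPWORDS = {"the", "and", "of", "for"}


def tokenize(text):
    """Lowercase whitespace tokenizer with stopword filtering (simplified)."""
    return [t for t in text.lower().split() if t not in STOPWORDS] if text else []


def label_community(index, community, texts, centrality, n=4):
    """Return (terms, ranked_nodes) for one community.

    Terms are ranked by how many nodes in the community contain them,
    a document-frequency stand-in for the scoring index described above.
    """
    tokens = {node: tokenize(texts.get(node)) for node in community}
    df = Counter(term for toks in tokens.values() for term in set(toks))

    if not df:
        # No text: fall back to a generic label and rank nodes by centrality
        ranked = sorted(community, key=lambda node: centrality[node], reverse=True)
        return ["topic", str(index)], ranked

    # Most widely shared terms first; ties broken alphabetically for determinism
    terms = [t for t, _ in sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))][:n]

    # Rank nodes by how many of the topic terms they contain
    ranked = sorted(community, key=lambda node: (-len(set(tokens[node]) & set(terms)), node))
    return terms, ranked


texts = {0: "deep learning models", 1: "learning deep networks", 2: "deep graphs"}
centrality = {0: 0.5, 1: 1.0, 2: 0.5, 3: 0.2, 4: 0.9}

terms, nodes = label_community(0, [0, 1, 2], texts, centrality, n=2)
print("_".join(terms), nodes)

# Community whose nodes carry no text
empty_terms, empty_nodes = label_community(1, [3, 4], texts, centrality)
print("_".join(empty_terms), empty_nodes)
```

The second call shows the fallback path: with no text to score, the label is built from the community index and the nodes are ordered by centrality.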

Usage

Use Topics when you need to automatically discover and label topics within a txtai graph. It is invoked internally by the base Graph class's addtopics method during graph indexing. Direct usage is appropriate when you need custom topic extraction from a graph instance outside of the standard indexing pipeline.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/graph/topics.py

Signature

class Topics:
    def __init__(self, config)
    def __call__(self, graph)
    def score(self, graph, index, community, centrality)
    def tokenize(self, graph, node)
    def topn(self, terms, n)
    def merge(self, topics)

Import

from txtai.graph.topics import Topics

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Topic configuration dictionary. Supports keys: algorithm (community detection algorithm, e.g., "louvain", "greedy", "lpa"), labels (scoring method, default "bm25"), terms (number of top terms per topic, default 4), stopwords (list of additional stopwords to exclude from topic labels), categories (list of category labels for higher-level classification), resolution (community detection resolution parameter).
graph | Graph | Yes (for __call__) | A graph instance that implements communities(), centrality(), and attribute() methods.
index | int | Yes (for score) | Community index number, used for generating fallback topic names.
community | set/list | Yes (for score) | Set of node ids belonging to a single community.
centrality | dict | Yes (for score) | Dictionary of {node_id: centrality_score} for the full graph.
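
The graph argument only needs the three methods named in the table. As a rough illustration, that contract can be written down as a `typing.Protocol` with a toy in-memory implementation. `GraphLike` and `ToyGraph` are hypothetical names, and the method signatures are inferred from the descriptions above rather than copied from txtai.

```python
from typing import Any, Dict, Iterable, List, Protocol, Set, runtime_checkable


@runtime_checkable
class GraphLike(Protocol):
    """Methods a graph is expected to provide (signatures are assumptions)."""

    def communities(self, config: Dict[str, Any]) -> Iterable[Set[int]]: ...
    def centrality(self) -> Dict[int, float]: ...
    def attribute(self, node: int, field: str) -> Any: ...


class ToyGraph:
    """Hypothetical in-memory graph with precomputed communities."""

    def __init__(self, nodes: Dict[int, Dict[str, Any]], groups: List[Set[int]]):
        self.nodes = nodes
        self.groups = groups

    def communities(self, config):
        # A real graph would run the configured detection algorithm here
        return self.groups

    def centrality(self):
        # Placeholder: every node gets the same score
        return {node: 1.0 for node in self.nodes}

    def attribute(self, node, field):
        # Returns None when the node has no such attribute
        return self.nodes[node].get(field)


graph = ToyGraph({0: {"text": "graph topics"}, 1: {}}, [{0, 1}])
print(isinstance(graph, GraphLike), graph.attribute(0, "text"), graph.attribute(1, "text"))
```

A stub like this can be handy for unit-testing code that consumes topics without building a full graph index.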

Outputs

Name | Type | Description
__call__() | dict | Dictionary of {topic_name: [node_ids]} where topic names are underscore-joined top terms (e.g., "machine_learning_deep_neural") and node ids are sorted by relevance score descending. Duplicate topics with identical term sets are merged.
score() | tuple | A 2-tuple of (top_terms_list, sorted_node_ids) for a single community.
tokenize() | list | List of string tokens extracted from a node's text attribute with stopword filtering applied.
topn() | list | List of up to N terms that pass tokenization and stopword rules.
merge() | dict | Merged dictionary of {topic_name: [node_ids]} sorted by topic size (number of node ids) descending.
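
To make the topn() and merge() contracts more concrete, here is a minimal sketch of both behaviors in plain Python. It assumes only what the table states (up to N terms after stopword filtering, and merging of topics with identical term sets, largest first); the function bodies are illustrative and are not taken from txtai.

```python
def topn(terms, n, stopwords=frozenset()):
    """Collect up to n terms, skipping configured stopwords."""
    selected = []
    for term in terms:
        if term and term not in stopwords:
            selected.append(term)
        if len(selected) == n:
            break
    return selected


def merge(topics):
    """Merge (terms, node_ids) pairs whose term sets match, regardless of order."""
    names, merged = {}, {}
    for terms, ids in topics:
        # The first ordering seen for a term set decides the topic name
        name = names.setdefault(frozenset(terms), "_".join(terms))
        merged.setdefault(name, []).extend(ids)

    # Largest topics first
    return dict(sorted(merged.items(), key=lambda item: len(item[1]), reverse=True))


topics = [
    (["graph", "search"], [0, 1]),
    (["vector", "index"], [2, 3, 4]),
    (["search", "graph"], [5, 6]),
]
result = merge(topics)
print(result)
print(topn(["the", "graph", "and", "search", "index"], 2, stopwords={"the", "and"}))
```

Keying on a `frozenset` of terms is what lets two communities with the same terms in a different order collapse into one topic.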

Usage Examples

from txtai.graph.topics import Topics
from txtai.graph.networkx import NetworkX

# Build a graph
config = {"topics": {"algorithm": "louvain", "terms": 4}}
graph = NetworkX(config)
graph.initialize()

# Add nodes with text
graph.addnode(0, id="doc1", text="Machine learning algorithms")
graph.addnode(1, id="doc2", text="Deep learning neural networks")
graph.addnode(2, id="doc3", text="Natural language processing")
graph.addedge(0, 1, weight=0.9)
graph.addedge(1, 2, weight=0.7)

# Run topic modeling
topics = Topics({
    "algorithm": "louvain",
    "terms": 4,
    "stopwords": ["the", "and"]
})

result = topics(graph)
# result is a dict of {topic_name: [node_ids]}; the exact labels and grouping
# depend on the communities detected and the terms scored for this graph
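
Because the result maps each topic name to a ranked list of node ids, it can be convenient to invert it into a per-node lookup. The sketch below assumes a result of the documented shape; `node_topics` is a hypothetical helper and the sample dictionary is made up for illustration, not output from txtai.

```python
def node_topics(result):
    """Map each node id to (topic_name, rank) from a {topic: [node_ids]} dict."""
    lookup = {}
    for topic, ids in result.items():
        for rank, node in enumerate(ids):
            lookup[node] = (topic, rank)
    return lookup


# Hypothetical result with the documented {topic_name: [node_ids]} shape
sample = {"learning_deep": [1, 0], "topic_1": [2]}
lookup = node_topics(sample)
print(lookup[0], lookup[2])
```

Rank 0 marks the node listed first for its topic, which per the output contract is the most relevant one.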
