Implementation: Neuml Txtai Graph Base
| Knowledge Sources | Details |
|---|---|
| Domains | Graph_Networks, Knowledge_Graph |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Graph is the abstract base class for graph network backends in txtai, providing node/edge management, relationship inference, topic modeling via community detection, and subgraph filtering.
Description
The Graph class defines the interface and shared logic for all graph backends in txtai. It manages a graph network where nodes represent indexed documents and edges represent relationships inferred from vector similarity scores or manually provided relationship data. The class supports topic modeling through community detection algorithms (delegated to the Topics helper), with optional category labeling via a similarity function. Concrete subclasses (e.g., NetworkX, igraph) must implement the abstract methods for node/edge operations, graph algorithms (centrality, pagerank, shortest path), graph queries, community detection, and persistence.
The class handles the full graph lifecycle: inserting document nodes with text/object data and optional custom attributes, building edges via batch similarity search, upserting new nodes into an existing graph, filtering subgraphs, and managing topic/category assignments on nodes.
Usage
Use Graph (through a concrete subclass) when you need to build and query a knowledge graph from your txtai embeddings index. It enables relationship discovery between documents, topic modeling, graph-based search, and visualization. Configure the topics key in the graph config to enable community detection with optional category labeling.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/graph/base.py
- Lines: 1-769
Signature
class Graph:
    """
    Base class for Graph instances. This class builds graph networks. Supports topic modeling
    and relationship traversal.
    """

    def __init__(self, config):
        """
        Creates a new Graph.

        Args:
            config: graph configuration
        """

        # Graph configuration
        self.config = config if config is not None else {}

        # Graph backend
        self.backend = None

        # Topic modeling
        self.categories = None
        self.topics = None

        # Transform columns (read from the normalized config so a None input is handled)
        columns = self.config.get("columns", {})
        self.text = columns.get("text", "text")
        self.object = columns.get("object", "object")

        # Attributes to copy
        self.copyattributes = self.config.get("copyattributes", False)

        # Relationships are manually-provided edges
        self.relationships = columns.get("relationships", "relationships")
        self.relations = {}
Import
from txtai.graph import Graph
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Graph configuration dictionary with optional keys: columns (text, object, relationships mappings), copyattributes (bool or list), topics (topic modeling config with optional categories), batchsize (int, default 256), limit (int, default 15), minscore (float, default 0.1), approximate (bool, default True) |
Outputs
| Name | Type | Description |
|---|---|---|
| self.backend | object | Graph backend instance (type depends on concrete subclass, e.g., NetworkX Graph) |
| self.topics | dict or None | Mapping of topic name to list of node ids belonging to that topic |
| self.categories | list or None | List of category labels corresponding to each topic (same order as topics) |
| self.relations | dict | Temporary storage for manually-provided relationships before they are resolved to edges |
Key Methods
insert(self, documents, index=0)
Inserts graph nodes for a batch of documents. Each document (uid, data, tags) produces a node with id and data attributes. For dict documents, the text/object field is extracted, custom attributes are copied based on copyattributes, and relationship data is stored for later resolution. The index parameter is the starting node id.
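The node-building step can be sketched as a standalone snippet. This is a simplified illustration of the documented behavior, not the actual txtai implementation: it only handles the text column and skips object data, copyattributes, and relationship storage.

```python
# Simplified sketch of how insert() turns (uid, data, tags) tuples into nodes.
# Illustrative only - the real method also handles object data, attribute
# copying and relationship columns.
def build_nodes(documents, text_field="text", index=0):
    nodes = []
    for uid, data, _tags in documents:
        # For dict documents, extract the configured text column;
        # plain strings are used as-is
        text = data.get(text_field) if isinstance(data, dict) else data
        if text is not None:
            # Node id is the running index; data holds the indexed text
            nodes.append((index, {"id": uid, "data": text}))
        index += 1
    return nodes

nodes = build_nodes([
    ("doc1", {"text": "Deep learning for image recognition"}, None),
    ("doc2", "Neural networks in computer vision", None),
])
```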
delete(self, ids)
Removes nodes and their edges from the graph. Also removes deleted nodes from topic lists and cleans up empty topics.
index(self, search, ids, similarity)
Builds the full graph network. Resolves manually-provided relationship edges, infers edges for all nodes using the batch search function, and optionally runs topic modeling with community detection and category labeling.
upsert(self, search, ids, similarity=None)
Incrementally updates the graph for new/modified nodes. Resolves relationships, infers edges only for nodes with the data attribute (new nodes), and either infers topics from neighboring nodes or rebuilds topics entirely.
filter(self, nodes, graph=None)
Creates a subgraph containing only the specified nodes and their interconnecting edges. Copies node attributes, adds optional score attributes, and filters topics/categories to match the selected nodes. Returns a new graph instance.
addrelations(self, node, relations)
Stores manually-provided relationships for a node. Each relation can be a string id or a dict with id and optional attributes like weight.
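A relation entry may therefore be a bare node id or a dict carrying edge attributes. The normalization can be sketched as follows; the function name and return shape are illustrative, not the actual txtai internals:

```python
# Normalize manually-provided relations into (target id, attributes) pairs.
# Illustrative sketch - Graph.addrelations stores these for later resolution
# when edges are built during index()/upsert().
def normalize_relations(relations):
    normalized = []
    for relation in relations:
        if isinstance(relation, dict):
            # Dict form: {"id": target, "weight": 0.8, ...}
            attributes = {k: v for k, v in relation.items() if k != "id"}
            normalized.append((relation["id"], attributes))
        else:
            # String form: just a target node id, no extra attributes
            normalized.append((relation, {}))
    return normalized

pairs = normalize_relations(["doc2", {"id": "doc3", "weight": 0.8}])
```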
inferedges(self, nodes, search, attributes=None)
Iterates through nodes in configurable batch sizes, runs the search function on node data to find similar nodes, and adds edges where similarity exceeds minscore. Nodes with existing edges are skipped in approximate mode.
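The batching and minscore logic described above can be sketched with a stub search function. The stub stands in for the batch similarity search txtai supplies; the function name and result shape here are assumptions for illustration:

```python
# Sketch of edge inference: batch node texts through a search function and
# keep matches above minscore. Self-matches are skipped.
def infer_edges(nodes, search, batchsize=256, minscore=0.1):
    edges = []
    for start in range(0, len(nodes), batchsize):
        batch = nodes[start:start + batchsize]
        # search returns, per input text, a list of (node id, score) results
        for (uid, _text), results in zip(batch, search([t for _, t in batch])):
            for target, score in results:
                if target != uid and score >= minscore:
                    edges.append((uid, target, score))
    return edges

# Stub search: every text matches node 0 with score 0.5 and node 1 with 0.05
stub = lambda texts: [[(0, 0.5), (1, 0.05)] for _ in texts]
edges = infer_edges([(0, "a"), (1, "b")], stub)
```

Only node 1's match against node 0 survives: the self-matches are dropped and the 0.05 score falls below the default minscore.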
addtopics(self, similarity=None)
Runs community detection via the Topics helper class, optionally labels each community with a category using the similarity function, and adds topic, topicrank, and category attributes to each node.
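The category-labeling step can be sketched independently of community detection. The similarity callable below follows the convention of returning ranked (index, score) pairs, but both the harness and the toy word-overlap scorer are illustrative assumptions, not txtai's implementation:

```python
# Sketch of category labeling: score each topic's text against the candidate
# categories and keep the best-ranked label per topic.
def label_topics(topics, categories, similarity):
    labels = []
    for topic in topics:
        # similarity returns (category index, score) pairs, best first
        results = similarity(topic, categories)
        labels.append(categories[results[0][0]])
    return labels

# Toy similarity: rank categories by shared words with the topic text
def toy_similarity(text, categories):
    words = set(text.split())
    scores = [(i, len(words & set(c.split()))) for i, c in enumerate(categories)]
    return sorted(scores, key=lambda x: x[1], reverse=True)

labels = label_topics(["machine learning models"], ["machine learning", "health"], toy_similarity)
```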
cleartopics(self)
Removes all topic-related attributes (topic, topicrank, category) from every node and resets the topics and categories to None.
infertopics(self)
Assigns topics to new nodes (marked with the updated attribute) by analyzing their neighbors' topics and categories using majority voting via Counter.most_common().
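The majority vote over neighbor topics can be sketched in a few lines; this mirrors the documented Counter.most_common behavior but is not the exact txtai code:

```python
from collections import Counter

# Sketch of neighbor-based topic inference for a new node: take the most
# common topic among already-labeled neighbors, ignoring unlabeled ones.
def infer_topic(neighbor_topics):
    counts = Counter(t for t in neighbor_topics if t is not None)
    return counts.most_common(1)[0][0] if counts else None

topic = infer_topic(["nlp", "vision", "nlp", None])
```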
Abstract Methods (Must Be Implemented by Subclasses)
| Method | Description |
|---|---|
| create() | Creates the graph network backend |
| count() | Returns total number of nodes |
| scan(attribute, data) | Iterates over nodes matching optional criteria |
| node(node) | Gets node attributes by id |
| addnode(node, **attrs) | Adds a single node |
| addnodes(nodes) | Adds multiple nodes |
| removenode(node) | Removes a node and its edges |
| hasnode(node) | Checks if a node exists |
| attribute(node, field) | Gets a node attribute value |
| addattribute(node, field, value) | Sets a node attribute |
| removeattribute(node, field) | Removes a node attribute |
| edgecount() | Returns total number of edges |
| edges(node) | Gets edges for a node |
| addedge(source, target, **attrs) | Adds a single edge |
| addedges(edges) | Adds multiple edges |
| hasedge(source, target) | Checks if an edge exists |
| centrality() | Runs centrality algorithm |
| pagerank() | Runs PageRank algorithm |
| showpath(source, target) | Finds shortest path |
| isquery(queries) | Validates graph queries |
| parse(query) | Parses a graph query |
| search(query, limit, graph) | Executes a graph search |
| communities(config) | Runs community detection |
| load(path) | Loads graph from file |
| save(path) | Saves graph to file |
| loaddict(data) | Loads graph from dictionary |
| savedict() | Saves graph to dictionary |
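To make the contract above concrete, here is a minimal dict-backed toy covering a handful of the node/edge methods. It deliberately does not inherit from txtai's Graph and skips algorithms, queries, and persistence; it only illustrates the shape of the interface a backend must provide:

```python
class ToyGraph:
    """Dict-backed illustration of part of the backend contract (not a real backend)."""

    def __init__(self):
        # Node id -> attribute dict, (source, target) -> edge attribute dict
        self.nodes, self.edgeset = {}, {}

    def addnode(self, node, **attrs):
        self.nodes.setdefault(node, {}).update(attrs)

    def hasnode(self, node):
        return node in self.nodes

    def count(self):
        return len(self.nodes)

    def addedge(self, source, target, **attrs):
        self.edgeset[(source, target)] = attrs

    def hasedge(self, source, target):
        return (source, target) in self.edgeset

    def edgecount(self):
        return len(self.edgeset)

graph = ToyGraph()
graph.addnode(0, data="Deep learning")
graph.addnode(1, data="Neural networks")
graph.addedge(0, 1, weight=0.9)
```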
Usage Examples
Basic Usage
from txtai import Embeddings

# Create embeddings with graph support
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "graph": {
        "limit": 15,
        "minscore": 0.2,
        "batchsize": 256,
        "approximate": True,
        "topics": {
            "categories": ["science", "technology", "health", "business"]
        }
    }
})

# Index documents - graph nodes and edges are built automatically
documents = [
    ("doc1", {"text": "Deep learning for image recognition"}, None),
    ("doc2", {"text": "Neural networks in computer vision"}, None),
    ("doc3", {"text": "Transformers for NLP tasks"}, None),
    ("doc4", {"text": "Stock market prediction models"}, None),
]
embeddings.index(documents)

# Access the graph
graph = embeddings.graph

# Get node count
print(graph.count())

# Search the graph
results = graph.search("deep learning", limit=5)

# Get topics
if graph.topics:
    for topic, node_ids in graph.topics.items():
        print(f"Topic: {topic}, Nodes: {len(node_ids)}")

# Filter to a subgraph
subgraph = graph.filter([0, 1, 2])
print(subgraph.count())  # 3