Implementation:Neuml Txtai RDBMS Database
| Knowledge Sources | |
|---|---|
| Domains | Database, Storage |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
RDBMS is the base relational database class that provides SQL-driven document storage, retrieval, and similarity search integration for txtai embeddings indexes.
Description
The RDBMS class inherits from Database and implements the core relational database logic for storing documents, objects, and text sections alongside vector indexes. It uses SQL to insert, update, delete, and query data across three primary tables: documents (JSON metadata), objects (binary-encoded data), and sections (indexed text with embeddings references). The class manages temporary batch and score tables for efficient similarity query processing, supports custom SQL functions, and handles reindexing operations that renumber sequential ids after deletions.
RDBMS is an abstract class with several methods that must be implemented by concrete subclasses (such as SQLite or PostgreSQL backends): connect, getcursor, jsonprefix, jsoncolumn, rows, and addfunctions. It provides the full query pipeline from SQL parsing to result mapping, including support for the similar() function that embeds vector similarity results into SQL WHERE clauses.
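The split between shared logic and backend hooks can be illustrated with a small sketch. This is a toy stand-in, not txtai's code: the class names `ToyRDBMS` and `ToySQLite` are hypothetical, only three of the six hooks are shown, and the hook signatures and the `d.data` column reference are assumptions made for illustration.

```python
import sqlite3


class ToyRDBMS:
    """Hypothetical base class: shared logic here, backend details in hooks."""

    def __init__(self):
        self.connection = None
        self.cursor = None

    def initialize(self):
        # Shared logic calls the backend-specific hooks
        if not self.connection:
            self.connection = self.connect()
            self.cursor = self.getcursor()

    def connect(self):
        raise NotImplementedError

    def getcursor(self):
        raise NotImplementedError

    def jsoncolumn(self, name):
        raise NotImplementedError


class ToySQLite(ToyRDBMS):
    """Hypothetical concrete backend built on the standard sqlite3 module."""

    def connect(self):
        return sqlite3.connect(":memory:")

    def getcursor(self):
        return self.connection.cursor()

    def jsoncolumn(self, name):
        # Assumed layout: JSON stored in a "data" column of a table aliased "d"
        return f"json_extract(d.data, '$.{name}')"


db = ToySQLite()
db.initialize()
```

Calling `initialize()` on the base class directly raises `NotImplementedError`, which is the behavior the sketch uses to mark a method as a required hook.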
Usage
Use RDBMS (through a concrete subclass) when you need content storage alongside your vector index. It powers the content=True mode in txtai embeddings, enabling SQL queries that combine text search, metadata filtering, and vector similarity in a single query. It is the foundation for all relational database backends in txtai.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/database/rdbms.py
- Lines: 1-569
Signature
class RDBMS(Database):
    """
    Base relational database class. A relational database uses SQL to insert, update, delete and select from a
    database instance.
    """

    def __init__(self, config):
        """
        Creates a new Database.

        Args:
            config: database configuration parameters
        """

        super().__init__(config)

        # Database connection
        self.connection = None
        self.cursor = None
Import
from txtai.database import RDBMS
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Database configuration dictionary containing content backend name, optional columns mapping (text, object), objects encoder config, functions list, and expressions list |
Outputs
| Name | Type | Description |
|---|---|---|
| self.connection | object | Database connection (type depends on subclass implementation) |
| self.cursor | object | Database cursor for executing SQL statements |
| self.config | dict | Configuration dictionary with database settings |
Key Methods
load(self, path)
Opens a database session at the given path, loading an existing database for continued use.
insert(self, documents, index=0)
Inserts a batch of documents into the database. Each document tuple (uid, document, tags) is processed: dict documents have their JSON stored in the documents table, text sections are stored in the sections table, and objects (when an encoder is configured) are binary-encoded and stored in the objects table. The index parameter serves as the starting indexid.
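The routing described above might look roughly like the following sketch, which uses the standard `sqlite3` module. The table layouts and the standalone `insert` function are simplified assumptions for illustration, not txtai's actual schema or code, and the objects table is left out for brevity.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed, simplified table layouts
con.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, data TEXT, tags TEXT)")
con.execute("CREATE TABLE sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT)")


def insert(documents, index=0):
    """Toy routing: dicts go to documents, their text (or a plain string) goes to sections."""
    for uid, document, tags in documents:
        if isinstance(document, dict):
            # Store the full dictionary as JSON metadata
            con.execute(
                "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
                (uid, json.dumps(document), tags),
            )
            document = document.get("text")

        if isinstance(document, str):
            # Store indexed text under the next sequential indexid
            con.execute("INSERT INTO sections VALUES (?, ?, ?, ?)", (index, uid, document, tags))
            index += 1

    return index


nextindex = insert([
    ("doc1", {"text": "Machine learning algorithms", "category": "AI"}, None),
    ("doc2", "Natural language processing", None),
])
```

Returning the next free index is a choice made for this sketch so that a later batch can continue the sequence.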
delete(self, ids)
Deletes all records matching the given ids from the documents, objects, and sections tables using temporary batch tables for efficient IN-clause processing.
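A minimal sketch of the temporary batch table idea, using `sqlite3`. The single-column `batch` table and the reduced `sections` layout are assumptions for illustration; only one of the three tables is shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT)")
con.executemany(
    "INSERT INTO sections VALUES (?, ?, ?)",
    [(0, "doc1", "a"), (1, "doc2", "b"), (2, "doc3", "c")],
)


def delete(ids):
    # Load ids into a temporary table instead of building a long IN (?, ?, ...) list
    con.execute("CREATE TEMP TABLE IF NOT EXISTS batch (id TEXT)")
    con.execute("DELETE FROM batch")
    con.executemany("INSERT INTO batch VALUES (?)", [(uid,) for uid in ids])

    # Delete every row whose id appears in the batch table
    con.execute("DELETE FROM sections WHERE id IN (SELECT id FROM batch)")


delete(["doc1", "doc3"])
remaining = con.execute("SELECT indexid, id FROM sections").fetchall()
```

Routing ids through a table keeps the statement text constant regardless of batch size and avoids limits on the number of bound parameters in a single statement.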
reindex(self, config)
Streams all existing sections into a new table with renumbered sequential indexids, yielding (uid, data, tags) tuples for re-embedding. Swaps the rebuilt table in place of the original sections table.
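The rebuild-and-swap pattern can be sketched as follows with `sqlite3`. The `rebuild` table name and the reduced column set are assumptions, and this toy version fetches all rows up front rather than streaming them from a live cursor.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT)")
# Gaps in indexid simulate earlier deletions
con.executemany(
    "INSERT INTO sections VALUES (?, ?, ?, ?)",
    [(0, "doc1", "a", None), (3, "doc2", "b", None), (7, "doc3", "c", None)],
)


def reindex():
    """Copies sections into a new table with sequential indexids, yielding rows to re-embed."""
    con.execute("CREATE TABLE rebuild (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT)")
    rows = con.execute("SELECT id, text, tags FROM sections ORDER BY indexid").fetchall()

    for indexid, (uid, text, tags) in enumerate(rows):
        con.execute("INSERT INTO rebuild VALUES (?, ?, ?, ?)", (indexid, uid, text, tags))
        yield uid, text, tags

    # Swap the rebuilt table in place of the original
    con.execute("DROP TABLE sections")
    con.execute("ALTER TABLE rebuild RENAME TO sections")


stream = list(reindex())
renumbered = con.execute("SELECT indexid, id FROM sections ORDER BY indexid").fetchall()
```

Because `reindex` is a generator, the swap only happens once the caller has consumed every yielded row.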
save(self, path)
Commits the current database transaction.
close(self)
Closes the database connection.
ids(self, ids)
Retrieves internal indexid mappings for a list of document ids using batch table lookups.
count(self)
Returns the total number of indexed sections.
resolve(self, name, alias=None)
Maps query column names to database column names. Standard columns (indexid, id, tags, entry) are prefixed with s., special columns (data, object, score, text) are used as-is, and all other names are resolved as JSON column expressions from the documents table. Optionally builds alias clauses.
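The resolution rules above can be sketched as a standalone function. This is an illustrative approximation rather than the library's implementation: the `jsoncolumn` helper, its SQLite-style `json_extract` output, and the alias clause format are assumptions.

```python
SECTIONS = ["indexid", "id", "tags", "entry"]
NOPREFIX = ["data", "object", "score", "text"]


def jsoncolumn(name):
    # Assumption: SQLite-style JSON access on a documents table aliased as "d"
    return f"json_extract(d.data, '$.{name}')"


def resolve(name, alias=None):
    # Alias mode: wrap an already resolved name in an alias clause (assumed format)
    if alias:
        return f'{name} as "{alias}"'

    # Standard columns live in the sections table and need the s. prefix
    if name.lower() in SECTIONS:
        return f"s.{name}"

    # Special columns are used as-is
    if name.lower() in NOPREFIX:
        return name

    # Everything else is looked up in the documents JSON
    return jsoncolumn(name)
```

For example, `resolve("id")` gives `s.id`, `resolve("score")` stays `score`, and `resolve("category")` becomes a JSON lookup expression.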
embed(self, similarity, batch)
Loads a batch of similarity results (indexid, score pairs) into temporary tables for use as SQL subquery filters. Returns the SQL clause placeholder.
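As a rough illustration of how similarity results can act as a SQL filter, the sketch below loads made-up (indexid, score) pairs into a temporary table and joins against it. The `scores` table layout, the query shape, and the score values are assumptions for demonstration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT)")
con.executemany(
    "INSERT INTO sections VALUES (?, ?, ?)",
    [
        (0, "doc1", "Machine learning algorithms"),
        (1, "doc2", "Natural language processing"),
        (2, "doc3", "Computer vision models"),
    ],
)

# Made-up (indexid, score) pairs standing in for vector index output
similarity = [(0, 0.71), (2, 0.42)]

# Load the pairs into a temporary table so SQL can filter and rank by them
con.execute("CREATE TEMP TABLE scores (indexid INTEGER PRIMARY KEY, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)", similarity)

rows = con.execute(
    "SELECT s.id, s.text, sc.score FROM sections s "
    "JOIN scores sc ON s.indexid = sc.indexid "
    "WHERE s.indexid IN (SELECT indexid FROM scores) "
    "ORDER BY sc.score DESC"
).fetchall()

# Map result rows to dictionaries keyed by column name
results = [dict(zip(("id", "text", "score"), row)) for row in rows]
```

Only sections whose indexid appears in the temporary table survive the filter, so the remaining SQL conditions run against the similarity candidates alone.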
query(self, query, limit, parameters, indexids)
Builds and executes a full SQL query from parsed components (SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET). Maps result rows to dictionaries keyed by column names. When indexids is True, returns [(indexid, score)] tuples instead.
initialize(self)
Creates the database connection and initial table schema (documents, objects, sections) if no connection exists.
Usage Examples
Basic Usage
from txtai import Embeddings
from txtai.database.rdbms import RDBMS

# RDBMS is abstract - use via a concrete subclass
# This example illustrates the interface contract
# Typical usage through txtai Embeddings with content storage:
embeddings = Embeddings({"content": True, "path": "sentence-transformers/all-MiniLM-L6-v2"})

# Insert documents as (uid, document, tags) tuples (internally uses RDBMS.insert)
documents = [
    ("doc1", {"text": "Machine learning algorithms", "category": "AI"}, None),
    ("doc2", {"text": "Natural language processing", "category": "NLP"}, None),
    ("doc3", {"text": "Computer vision models", "category": "CV"}, None),
]
embeddings.index(documents)

# SQL query combining similarity and metadata filtering
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('deep learning') AND category = 'AI' LIMIT 5"
)

# Count records
count = embeddings.count()