Implementation:Neuml Txtai RDBMS Database

Knowledge Sources	Neuml_Txtai
Domains	Database, Storage
Last Updated	2026-02-09 17:00 GMT

Overview

RDBMS is the base relational database class that provides SQL-driven document storage, retrieval, and similarity search integration for txtai embeddings indexes.

Description

The RDBMS class inherits from Database and implements the core relational database logic for storing documents, objects, and text sections alongside vector indexes. It uses SQL to insert, update, delete, and query data across three primary tables: documents (JSON metadata), objects (binary-encoded data), and sections (indexed text with embeddings references). The class manages temporary batch and score tables for efficient similarity query processing, supports custom SQL functions, and handles reindexing operations that renumber indexids sequentially after deletions.

RDBMS is an abstract class with several methods that must be implemented by concrete subclasses (such as SQLite or PostgreSQL backends): connect, getcursor, jsonprefix, jsoncolumn, rows, and addfunctions. It provides the full query pipeline from SQL parsing to result mapping, including support for the similar() function that embeds vector similarity results into SQL WHERE clauses.
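The subclass contract can be illustrated with a small standalone sketch in plain Python using the built-in sqlite3 module. SketchRDBMS and SketchSQLite are hypothetical stand-ins, not txtai classes: they reuse the hook names listed above, but the signatures and method bodies are assumptions. Shared logic such as count() is written once in the base class, and each backend only fills in the hooks.

```python
import sqlite3


class SketchRDBMS:
    """Hypothetical base class: shared SQL logic on top of backend-specific hooks."""

    def __init__(self, config):
        self.config = config
        self.connection = None
        self.cursor = None

    # Backend hooks (names mirror the list above; signatures are assumptions)
    def connect(self, path=None):
        raise NotImplementedError

    def getcursor(self):
        raise NotImplementedError

    def jsonprefix(self):
        raise NotImplementedError

    def jsoncolumn(self, name):
        raise NotImplementedError

    def rows(self):
        raise NotImplementedError

    def addfunctions(self):
        raise NotImplementedError

    # Shared logic written once, using only the hooks
    def session(self, path=None):
        self.connection = self.connect(path)
        self.cursor = self.getcursor()
        self.addfunctions()

    def count(self):
        self.cursor.execute("SELECT count(indexid) FROM sections")
        return self.cursor.fetchone()[0]


class SketchSQLite(SketchRDBMS):
    """Hypothetical SQLite backend filling in each hook."""

    def connect(self, path=None):
        return sqlite3.connect(path or ":memory:")

    def getcursor(self):
        return self.connection.cursor()

    def jsonprefix(self):
        return "json_extract(data"

    def jsoncolumn(self, name):
        return f"json_extract(data, '$.{name}')"

    def rows(self):
        return self.cursor.fetchall()

    def addfunctions(self):
        # Register an example custom SQL function on the connection
        self.connection.create_function("reverse_text", 1, lambda s: s[::-1] if s else s)
```

In this sketch, calling session() on the base class raises NotImplementedError, because the base class has no backend to connect to.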

Usage

Use RDBMS (through a concrete subclass) when you need content storage alongside your vector index. It powers the content=True mode in txtai embeddings, enabling SQL queries that combine text search, metadata filtering, and vector similarity in a single query. It is the foundation for all relational database backends in txtai.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/database/rdbms.py
Lines: 1-569

Signature

class RDBMS(Database):
    """
    Base relational database class. A relational database uses SQL to insert, update, delete and select from a
    database instance.
    """

    def __init__(self, config):
        """
        Creates a new Database.

        Args:
            config: database configuration parameters
        """

        super().__init__(config)

        # Database connection
        self.connection = None
        self.cursor = None

Import

from txtai.database import RDBMS

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	Database configuration dictionary containing `content` backend name, optional `columns` mapping (text, object), `objects` encoder config, `functions` list, and `expressions` list

Outputs

Name	Type	Description
self.connection	object	Database connection; None until a session is opened (type depends on subclass implementation)
self.cursor	object	Database cursor for executing SQL statements; None until a session is opened
self.config	dict	Configuration dictionary with database settings

Key Methods

load(self, path)

Opens a database session at the given path, loading an existing database for continued use.

insert(self, documents, index=0)

Inserts a batch of documents into the database. Each document tuple (uid, document, tags) is processed: dict documents have their JSON stored in the documents table, text sections are stored in the sections table, and objects (when an encoder is configured) are binary-encoded and stored in the objects table. The index parameter serves as the starting indexid.
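The insert flow can be sketched with sqlite3. This is a simplified illustration, not txtai's implementation: the table layouts, the create_tables and insert_batch helpers, the "object" dict key, and the encoder callable are all hypothetical. It shows the three routes a document can take and how indexids count up from index.

```python
import json
import sqlite3


def create_tables(cursor):
    # Hypothetical minimal layouts for the three primary tables
    cursor.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, data JSON)")
    cursor.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, object BLOB)")
    cursor.execute("CREATE TABLE sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT)")


def insert_batch(cursor, documents, index=0, encoder=None):
    """Inserts (uid, document, tags) tuples, assigning indexids starting at index."""
    for uid, document, tags in documents:
        text = document
        if isinstance(document, dict):
            # Binary-encode an "object" field, but only when an encoder is configured
            obj = document.get("object")
            if obj is not None and encoder:
                cursor.execute("INSERT INTO objects VALUES (?, ?)", (uid, encoder(obj)))

            # Store the remaining fields as JSON metadata and index the text field
            data = {key: value for key, value in document.items() if key != "object"}
            cursor.execute("INSERT INTO documents VALUES (?, ?)", (uid, json.dumps(data)))
            text = data.get("text")

        # Every document gets one section row with the next sequential indexid
        cursor.execute("INSERT INTO sections VALUES (?, ?, ?, ?)", (index, uid, text, tags))
        index += 1
```

Plain string documents go straight to the sections table in this sketch, while dict documents also produce a documents row.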

delete(self, ids)

Deletes all records matching the given ids from the documents, objects, and sections tables using temporary batch tables for efficient IN-clause processing.
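A minimal sketch of the batch-table delete pattern, assuming the three tables share an id column. The delete_ids helper and the temporary table name batch are illustrative assumptions. Staging ids in a table keeps each DELETE a short, fixed statement instead of one with a very long IN (...) parameter list, which matters because SQLite caps the number of host parameters per statement.

```python
import sqlite3


def delete_ids(connection, ids):
    """Deletes every row whose id is in ids from the three tables."""
    cursor = connection.cursor()

    # Stage the ids in a temporary batch table
    cursor.execute("CREATE TEMP TABLE IF NOT EXISTS batch (id TEXT)")
    cursor.execute("DELETE FROM batch")
    cursor.executemany("INSERT INTO batch VALUES (?)", [(uid,) for uid in ids])

    # Each DELETE filters with a subquery instead of a long parameter list
    for table in ("documents", "objects", "sections"):
        cursor.execute(f"DELETE FROM {table} WHERE id IN (SELECT id FROM batch)")
```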

reindex(self, config)

Streams all existing sections into a new table with renumbered sequential indexids, yielding (uid, data, tags) tuples for re-embedding. Swaps the rebuilt table in place of the original sections table.
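The reindex pattern can be sketched as a generator. This is an illustration under assumptions: reindex_sections and the rebuild table name are hypothetical, only the text column is carried over, and the config argument of the real method is omitted. Rows stream out in indexid order and receive new gap-free indexids.

```python
import sqlite3


def reindex_sections(connection):
    """Rebuilds sections with gap-free indexids, yielding (uid, text, tags) for re-embedding."""
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE rebuild (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT)")

    # Stream the existing rows in indexid order on a separate cursor
    rows = connection.execute("SELECT id, text, tags FROM sections ORDER BY indexid")
    for newid, (uid, text, tags) in enumerate(rows):
        cursor.execute("INSERT INTO rebuild VALUES (?, ?, ?, ?)", (newid, uid, text, tags))
        yield uid, text, tags
    rows.close()

    # Swap the rebuilt table in place of the original
    cursor.execute("DROP TABLE sections")
    cursor.execute("ALTER TABLE rebuild RENAME TO sections")
```

Because the swap runs after the loop, a caller must exhaust the generator (for example while re-embedding each yielded row) before the rebuilt table replaces the old one.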

save(self, path)

Commits the current database transaction.

close(self)

Closes the database connection.

ids(self, ids)

Retrieves internal indexid mappings for a list of document ids using batch table lookups.
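A sketch of the lookup, reusing the batch-table idea described for delete. lookup_indexids is a hypothetical helper and the (indexid, id) return shape is an assumption. In this sketch, an id inserted more than once maps to several indexids, so the result is a list of pairs rather than a one-to-one mapping.

```python
import sqlite3


def lookup_indexids(connection, ids):
    """Returns (indexid, id) pairs for every section whose id is in ids."""
    cursor = connection.cursor()

    # Stage the requested ids in a temporary batch table
    cursor.execute("CREATE TEMP TABLE IF NOT EXISTS batch (id TEXT)")
    cursor.execute("DELETE FROM batch")
    cursor.executemany("INSERT INTO batch VALUES (?)", [(uid,) for uid in ids])

    # One id can match several section rows, so return all of them in indexid order
    cursor.execute("SELECT indexid, id FROM sections WHERE id IN (SELECT id FROM batch) ORDER BY indexid")
    return cursor.fetchall()
```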

count(self)

Returns the total number of indexed sections.

resolve(self, name, alias=None)

Maps query column names to database column names. Standard columns (indexid, id, tags, entry) are prefixed with s., special columns (data, object, score, text) are used as-is, and all other names are resolved as JSON column expressions from the documents table. Optionally builds alias clauses.
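The resolution rules can be expressed as a small pure function. This sketch assumes an SQLite-style json_extract accessor over a documents table aliased as d; in the real class the JSON expression comes from each subclass's jsoncolumn() and may differ. The alias branch returns the expression unchanged when the alias would be redundant.

```python
SECTIONS = ["indexid", "id", "tags", "entry"]
NOPREFIX = ["data", "object", "score", "text"]


def jsoncolumn(name):
    # Assumed SQLite-style accessor for a field inside documents.data (aliased as d)
    return f"json_extract(d.data, '$.{name}')"


def resolve(name, alias=None):
    """Maps a query column name to a database expression, or builds an alias clause."""
    if alias:
        # Skip redundant aliases, otherwise emit "expression AS alias"
        return name if name == alias else f'{name} AS "{alias}"'

    # Section columns need the s. table prefix
    if name.lower() in SECTIONS:
        return f"s.{name}"

    # Special columns are used as-is
    if name.lower() in NOPREFIX:
        return name

    # Anything else is read from the JSON metadata
    return jsoncolumn(name)
```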

embed(self, similarity, batch)

Loads a batch of similarity results (indexid, score pairs) into temporary tables for use as SQL subquery filters. Returns the SQL clause that takes the place of the similar() call, restricting rows to the indexids loaded for that batch.
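The following sketch shows how staged similarity results can stand in for a similar() call. The embed helper, the batch and scores temporary table layouts, and the exact clause text are hypothetical. Here similarity is a list holding one list of (indexid, score) pairs per similar() call, and batch selects which one to stage.

```python
import sqlite3


def embed(connection, similarity, batch):
    """Stages one batch of (indexid, score) results and returns a WHERE-clause filter."""
    cursor = connection.cursor()
    cursor.execute("CREATE TEMP TABLE IF NOT EXISTS batch (indexid INTEGER, batchid INTEGER)")
    cursor.execute("CREATE TEMP TABLE IF NOT EXISTS scores (indexid INTEGER PRIMARY KEY, score REAL)")

    # Record which indexids belong to this batch, plus their scores
    results = similarity[batch]
    cursor.executemany("INSERT INTO batch VALUES (?, ?)", [(indexid, batch) for indexid, _ in results])
    cursor.executemany("INSERT OR REPLACE INTO scores VALUES (?, ?)", results)

    # This clause is substituted where similar(...) appeared in the WHERE clause
    return f"s.indexid IN (SELECT indexid FROM batch WHERE batchid = {int(batch)})"
```

Joining the scores table then exposes score as an ordinary column for SELECT and ORDER BY. In this sketch a repeated indexid simply keeps its latest score.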

query(self, query, limit, parameters, indexids)

Builds and executes a full SQL query from parsed components (SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET). Maps result rows to dictionaries keyed by column names. When indexids is True, returns [(indexid, score)] tuples instead.
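Two parts of query() are sketched here under assumptions: assembling clauses from a parsed dict (the keys select, where, groupby, having, orderby, limit and offset are hypothetical names), and mapping rows to dictionaries using the column names from cursor.description. With indexids=True the sketch returns (indexid, score) tuples, assuming those are the first two selected columns.

```python
import sqlite3


def build(parsed, limit):
    """Assembles a SELECT statement from hypothetical parsed clause components."""
    sql = f"SELECT {parsed['select']} FROM sections s"
    for key, keyword in (("where", "WHERE"), ("groupby", "GROUP BY"), ("having", "HAVING"), ("orderby", "ORDER BY")):
        if parsed.get(key):
            sql += f" {keyword} {parsed[key]}"

    # A LIMIT parsed from the query wins over the default limit argument
    sql += f" LIMIT {parsed.get('limit') or limit}"
    if parsed.get("offset"):
        sql += f" OFFSET {parsed['offset']}"
    return sql


def run_query(connection, sql, parameters=None, indexids=False):
    """Executes sql and maps rows to dicts keyed by column name."""
    cursor = connection.execute(sql, parameters or {})
    if indexids:
        # Assumes indexid and score are the first two selected columns
        return [(row[0], row[1]) for row in cursor]

    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```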

initialize(self)

Creates the database connection and initial table schema (documents, objects, sections) if no connection exists.
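A sketch of lazy initialization with sqlite3: nothing is created until initialize() is called, and repeat calls reuse the existing connection. The LazySchema class and the column lists are assumptions for illustration; the real schemas are defined by txtai and its subclasses.

```python
import sqlite3


class LazySchema:
    """Hypothetical holder that connects and creates tables only on first use."""

    def __init__(self):
        self.connection = None
        self.cursor = None

    def initialize(self):
        if self.connection is None:
            self.connection = sqlite3.connect(":memory:")
            self.cursor = self.connection.cursor()

            # Assumed minimal layouts for the three primary tables
            self.cursor.execute("CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, data JSON, tags TEXT, entry DATETIME)")
            self.cursor.execute("CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, object BLOB, tags TEXT, entry DATETIME)")
            self.cursor.execute("CREATE TABLE IF NOT EXISTS sections (indexid INTEGER PRIMARY KEY, id TEXT, text TEXT, tags TEXT, entry DATETIME)")
        return self.connection
```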

Usage Examples

Basic Usage

from txtai import Embeddings

# RDBMS is abstract and is used through a concrete subclass. With content=True,
# Embeddings stores content in the default SQLite backend, which builds on RDBMS.
embeddings = Embeddings({"content": True, "path": "sentence-transformers/all-MiniLM-L6-v2"})

# Index (uid, document, tags) tuples; content storage goes through RDBMS.insert
documents = [
    ("doc1", {"text": "Machine learning algorithms", "category": "AI"}, None),
    ("doc2", {"text": "Natural language processing", "category": "NLP"}, None),
    ("doc3", {"text": "Computer vision models", "category": "CV"}, None),
]
embeddings.index(documents)

# SQL query combining vector similarity and metadata filtering
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('deep learning') AND category = 'AI' LIMIT 5"
)
print(results)

# Count indexed records
print(embeddings.count())

Related Pages

Principle:Neuml_Txtai_Content_Storage
