Implementation:Neuml Txtai SQLite ANN
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, Database |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
SQLite is a lightweight ANN backend that stores vectors in a SQLite database and uses the sqlite-vec extension for embedded vector similarity search with cosine distance.
Description
The SQLite class inherits from ANN and provides a file-based vector index using SQLite with the sqlite-vec virtual table extension. It stores embeddings in a virtual table created with the vec0 module and supports three storage modes: standard 32-bit float (FLOAT), 8-bit integer quantization (INT8), and binary quantization (BIT). Similarity search is performed using cosine distance via the MATCH operator with results scored as 1 - distance.
The class manages database connections, table creation, and data persistence. Since SQLite is file-based, the save method handles copying from in-memory or temporary databases to persistent files using either the SQLite backup API or the slower iterdump method when uncommitted transactions exist.
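The two copy strategies described above can be sketched with the standard library alone. This is a minimal illustration, not the class's actual code: `copy_database` is a hypothetical helper, and it uses plain `sqlite3` connections without the sqlite-vec extension.

```python
import sqlite3

def copy_database(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Hypothetical helper: copy source into target using one of two strategies."""
    if source.in_transaction:
        # Uncommitted changes exist: replay the SQL dump statement by statement (slower)
        for statement in source.iterdump():
            target.execute(statement)
        target.commit()
    else:
        # No open transaction: use the SQLite backup API
        source.backup(target)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
source.execute("INSERT INTO items (label) VALUES ('a')")  # leaves a transaction open

target = sqlite3.connect(":memory:")
copy_database(source, target)
print(target.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

Checking `in_transaction` before choosing a strategy keeps the fast path for the common case and falls back to the dump only when pending changes are present.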
Usage
Use the SQLite backend when you need a file-based ANN index that does not require a separate database server; the only extra requirement is the sqlite-vec package, installed through the "ann" extra. It is well suited to single-process applications, prototyping, edge deployments, and scenarios where simplicity and portability matter more than concurrent access. The INT8 and BIT quantization modes can reduce file size for large collections.
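As a rough guide to the size reduction, the raw payload per vector can be estimated from the element width of each storage mode. The sketch below is back-of-the-envelope arithmetic only: `raw_vector_bytes` is a hypothetical helper, and it ignores any SQLite or sqlite-vec storage overhead.

```python
def raw_vector_bytes(dimensions, quantize=None):
    """Hypothetical estimate of the raw payload size of one stored vector."""
    if quantize == 1:
        # BIT: one bit per dimension, rounded up to whole bytes
        return (dimensions + 7) // 8
    if quantize == 8:
        # INT8: one byte per dimension
        return dimensions
    # FLOAT: 32-bit float, four bytes per dimension
    return dimensions * 4

for mode, quantize in (("FLOAT", None), ("INT8", 8), ("BIT", 1)):
    print(mode, raw_vector_bytes(384, quantize))
```

For 384 dimensions this gives 1536, 384, and 48 bytes of raw payload respectively; actual file sizes will be larger.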
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/ann/dense/sqlite.py
- Lines: 1-303
Signature
class SQLite(ANN):
    """
    Builds an ANN index backed by a SQLite database.
    """

    def __init__(self, config):
        super().__init__(config)

        if not SQLITEVEC:
            raise ImportError('sqlite-vec is not available - install "ann" extra to enable')

        # Database parameters
        self.connection, self.cursor, self.path = None, None, ""

        # Quantization setting
        self.quantize = self.setting("quantize")
        self.quantize = 8 if isinstance(self.quantize, bool) else int(self.quantize) if self.quantize else None
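The chained conditional on the last line is easier to read when pulled out on its own. The following sketch copies that expression into a hypothetical standalone function, `normalize_quantize`, to show how different setting values resolve.

```python
def normalize_quantize(value):
    """Hypothetical standalone copy of the quantize normalization expression."""
    # Parsed as: 8 if bool, else (int(value) if value is truthy, else None)
    return 8 if isinstance(value, bool) else int(value) if value else None

for value in (True, 8, 1, "8", 0, None):
    print(repr(value), "->", normalize_quantize(value))
```

Because the boolean check comes first, any boolean resolves to 8 as the expression is written, so leaving the setting unset (None) is the way to get FLOAT storage in this sketch.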
Import
from txtai.ann.dense import SQLite
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Index configuration dictionary containing backend settings, dimensions, and optional sqlite-specific keys such as table (str) and quantize (bool/int: 1 for binary, 8 for INT8, None for FLOAT32) |
Outputs
| Name | Type | Description |
|---|---|---|
| self.connection | sqlite3.Connection | Active SQLite database connection with sqlite-vec extension loaded |
| self.cursor | sqlite3.Cursor | Database cursor for executing queries |
| self.config | dict | Updated configuration with offset and build metadata including SQLite and sqlite-vec versions |
Key Methods
load(self, path)
Stores the database path for lazy connection. The actual connection is deferred until the first database operation via the database() method.
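The deferred-connection pattern can be sketched as follows. `LazyDatabase` is a hypothetical stand-in rather than the real class; it uses plain `sqlite3` without sqlite-vec, and the return value of `database()` here is an assumption made for illustration.

```python
import sqlite3

class LazyDatabase:
    """Hypothetical stand-in that defers connecting until first use."""

    def __init__(self):
        self.connection, self.cursor, self.path = None, None, ""

    def load(self, path):
        # Only remember the path; nothing is opened here
        self.path = path

    def database(self):
        # Connect on the first call, then reuse the same connection and cursor
        if self.connection is None:
            self.connection = sqlite3.connect(self.path)
            self.cursor = self.connection.cursor()
        return self.cursor

db = LazyDatabase()
db.load(":memory:")
opened_after_load = db.connection is not None
db.database().execute("SELECT 1")
opened_after_use = db.connection is not None
print(opened_after_load, opened_after_use)
```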
index(self, embeddings)
Initializes the virtual table (recreating if necessary), inserts all embeddings with sequential index ids, and records build metadata including SQLite and sqlite-vec version numbers.
append(self, embeddings)
Inserts new embeddings starting from the current offset using executemany for batch insertion. Updates the offset in config.
delete(self, ids)
Deletes rows from the virtual table by index id using batched executemany.
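The offset bookkeeping behind append and delete can be illustrated with an ordinary table standing in for the vec0 virtual table. Everything in this sketch is an assumption for illustration: the table layout, the module-level `config` dict, and the float packing via `struct`. In particular, leaving the offset unchanged on delete is a choice made for this sketch.

```python
import sqlite3
import struct

connection = sqlite3.connect(":memory:")
# Plain table used as a stand-in for the vec0 virtual table
connection.execute("CREATE TABLE vectors (indexid INTEGER PRIMARY KEY, embedding BLOB)")
config = {"offset": 0}

def append(embeddings):
    # New rows continue numbering from the current offset
    start = config["offset"]
    rows = [(start + i, struct.pack(f"{len(e)}f", *e)) for i, e in enumerate(embeddings)]
    connection.executemany("INSERT INTO vectors (indexid, embedding) VALUES (?, ?)", rows)
    config["offset"] = start + len(rows)

def delete(ids):
    # One parameter tuple per id, executed as a batch
    connection.executemany("DELETE FROM vectors WHERE indexid = ?", [(i,) for i in ids])

append([[0.1, 0.2], [0.3, 0.4]])
append([[0.5, 0.6]])
delete([0])
remaining = [row[0] for row in connection.execute("SELECT indexid FROM vectors ORDER BY indexid")]
print(remaining, config["offset"])
```

Keeping the offset monotonic means a deleted id is never handed out again by a later append in this sketch.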
search(self, queries, limit)
Executes a SELECT indexid, 1 - distance query per input query using the MATCH operator against the embedding column with k = limit. Returns [(id, score)] per query.
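To make the result shape and the 1 - distance scoring concrete, here is a brute-force equivalent in pure Python. It does not use sqlite-vec or the MATCH operator; the function names are hypothetical and ids are simply list positions.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, assuming neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def brute_force_search(vectors, queries, limit):
    """Hypothetical exhaustive search returning [(id, score)] per query."""
    results = []
    for query in queries:
        scored = [(indexid, 1.0 - cosine_distance(query, vector)) for indexid, vector in enumerate(vectors)]
        # Highest score (smallest distance) first, truncated to the k nearest
        scored.sort(key=lambda item: item[1], reverse=True)
        results.append(scored[:limit])
    return results

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = brute_force_search(vectors, [[1.0, 0.0]], limit=2)
print(results)
```

Each query yields its own list, so a single query still returns a list containing one list of (id, score) pairs.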
count(self)
Returns the count of rows in the vectors virtual table.
save(self, path)
Handles three cases: (1) a temporary database is copied to the target path using the SQLite backup API, (2) saving to the same path commits pending changes, and (3) saving to a different path copies the data to the new location while keeping the current connection.
close(self)
Closes the database connection and sets it to None.
initialize(self, recreate=False)
Creates the vec0 virtual table with the appropriate embedding type (FLOAT, INT8, or BIT) based on quantization settings. Optionally clears existing data when recreating.
connect(self, path)
Creates a new SQLite connection, loads the sqlite-vec extension, and returns the connection. For security, extension loading is switched on with enable_load_extension only for the duration of the sqlite-vec load and switched off again immediately afterward.
Usage Examples
Basic Usage
import numpy as np
from txtai.ann.dense import SQLite
# Configuration for SQLite backend
config = {
    "backend": "sqlite",
    "dimensions": 384,
    "sqlite": {
        "table": "vectors",
        "quantize": 8  # INT8 quantization
    }
}
# Create and build the index
ann = SQLite(config)
embeddings = np.random.rand(500, 384).astype(np.float32)
ann.index(embeddings)
# Search for similar vectors
queries = np.random.rand(1, 384).astype(np.float32)
results = ann.search(queries, limit=10)
# results: [[(id, score), ...]]
# Save to file
ann.save("/tmp/vectors.db")
# Reload from file
ann2 = SQLite(config)
ann2.load("/tmp/vectors.db")
# Append more data
new_data = np.random.rand(100, 384).astype(np.float32)
ann2.append(new_data)
print(ann2.count()) # 600
ann2.close()