
Implementation:Neuml Txtai SQLite ANN

From Leeroopedia


Knowledge Sources
Domains: Vector_Search, Database
Last Updated: 2026-02-09 17:00 GMT

Overview

The SQLite class is a lightweight ANN index backed by a SQLite database. It uses the sqlite-vec extension to run embedded vector similarity search with cosine distance.

Description

The SQLite class inherits from ANN and provides a file-based vector index using SQLite with the sqlite-vec virtual table extension. It stores embeddings in a virtual table created with the vec0 module and supports three storage modes: standard 32-bit float (FLOAT), 8-bit integer quantization (INT8), and binary quantization (BIT). Similarity search uses cosine distance through the MATCH operator, and each result is scored as 1 - distance, so higher scores indicate closer vectors.
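To make the three storage modes concrete, the following sketch packs one small vector as 32-bit floats, 8-bit integers, and single bits using only the standard library. The helper names and the quantization formulas (a linear map from [-1, 1] for INT8, a sign test for BIT) are illustrative assumptions and not necessarily what sqlite-vec does internally.

```python
import struct

def pack_float32(vector):
    """Pack values as little-endian 32-bit floats (4 bytes per dimension)."""
    return struct.pack(f"<{len(vector)}f", *vector)

def pack_int8(vector):
    """Assumed scheme: map [-1, 1] linearly onto [-128, 127] (1 byte per dimension)."""
    scaled = [max(-128, min(127, round((v + 1.0) / 2.0 * 255.0 - 128.0))) for v in vector]
    return struct.pack(f"<{len(vector)}b", *scaled)

def pack_bits(vector):
    """Assumed scheme: one bit per dimension, set when the value is positive."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, v in enumerate(vector):
        if v > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

vector = [0.5, -0.25, 0.0, 1.0, -1.0, 0.75, -0.5, 0.1]
sizes = {
    "FLOAT": len(pack_float32(vector)),
    "INT8": len(pack_int8(vector)),
    "BIT": len(pack_bits(vector)),
}
print(sizes)  # {'FLOAT': 32, 'INT8': 8, 'BIT': 1}
```

These are raw payload sizes only. The on-disk size of a real index also includes SQLite and virtual table overhead.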

The class manages database connections, table creation, and data persistence. Because SQLite is file-based, the save method copies in-memory or temporary databases to persistent files. It uses the SQLite backup API by default and falls back to the slower iterdump method when uncommitted transactions exist.
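The two copy strategies can be sketched with the standard sqlite3 module alone. The function name copy_database is hypothetical, and choosing the path by checking in_transaction is an assumption about how "uncommitted transactions exist" is detected. A plain table stands in for the vec0 virtual table so the example runs without sqlite-vec.

```python
import sqlite3

def copy_database(source, target):
    """Copy all data from source to target and report which path was used."""
    if source.in_transaction:
        # Uncommitted changes: replay the SQL text dump statement by statement
        for statement in source.iterdump():
            target.execute(statement)
        method = "iterdump"
    else:
        # Clean state: page-level copy through the SQLite backup API
        source.backup(target)
        method = "backup"
    target.commit()
    return method

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE vectors (indexid INTEGER PRIMARY KEY, payload TEXT)")
source.execute("INSERT INTO vectors VALUES (0, 'a')")
source.commit()

# Source has no pending changes at this point
clean_copy = sqlite3.connect(":memory:")
clean_method = copy_database(source, clean_copy)

# Leave an insert uncommitted on purpose to exercise the slower path
source.execute("INSERT INTO vectors VALUES (1, 'b')")
dirty_copy = sqlite3.connect(":memory:")
dirty_method = copy_database(source, dirty_copy)

print(clean_method, dirty_method)
```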

Usage

Use the SQLite backend when you need a file-based ANN index that does not require a separate database server. Its only extra dependency is the sqlite-vec extension, which is installed with the "ann" extra. It is well suited to single-process applications, prototyping, edge deployments, and scenarios where simplicity and portability matter more than concurrent access. The INT8 and BIT quantization modes can reduce file size for large collections.

Code Reference

Source Location

Signature

class SQLite(ANN):
    """
    Builds an ANN index backed by a SQLite database.
    """

    def __init__(self, config):
        super().__init__(config)

        if not SQLITEVEC:
            raise ImportError('sqlite-vec is not available - install "ann" extra to enable')

        # Database parameters
        self.connection, self.cursor, self.path = None, None, ""

        # Quantization setting
        self.quantize = self.setting("quantize")
        self.quantize = 8 if isinstance(self.quantize, bool) else int(self.quantize) if self.quantize else None

Import

from txtai.ann.dense import SQLite

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Index configuration dictionary containing backend settings, dimensions, and optional sqlite-specific keys such as table (str) and quantize (bool/int: 1 for binary, 8 for INT8, None for FLOAT32; a boolean value is converted to 8 by the constructor shown above)
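The way the quantize setting resolves to a storage mode can be sketched as follows. The first helper copies the constructor expression from the signature above. The second helper, including its name and the fallback to FLOAT for any other value, is an assumption made for illustration.

```python
def normalize_quantize(value):
    """Same expression as the constructor: booleans become 8, other truthy values are cast to int."""
    return 8 if isinstance(value, bool) else int(value) if value else None

def embedding_type(quantize):
    """Map the bit width to a storage mode (FLOAT fallback is an assumption)."""
    return {1: "BIT", 8: "INT8"}.get(quantize, "FLOAT")

for value in (None, True, 8, 1, "8"):
    print(repr(value), "->", embedding_type(normalize_quantize(value)))
```

As the expression is written, any boolean maps to 8, so False also selects INT8. Omitting the key or passing None is the way to get FLOAT32 under this reading.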

Outputs

Name | Type | Description
self.connection | sqlite3.Connection | Active SQLite database connection with sqlite-vec extension loaded
self.cursor | sqlite3.Cursor | Database cursor for executing queries
self.config | dict | Updated configuration with offset and build metadata including SQLite and sqlite-vec versions

Key Methods

load(self, path)

Stores the database path for lazy connection. The actual connection is deferred until the first database operation via the database() method.
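The lazy connection pattern described here can be sketched with plain sqlite3. The class name LazyDatabase is hypothetical and the sqlite-vec extension loading step is left out, so this only illustrates the deferral, not the real connect logic.

```python
import sqlite3

class LazyDatabase:
    """Remembers a path on load and opens the connection on first use."""

    def __init__(self):
        self.connection, self.cursor, self.path = None, None, ""

    def load(self, path):
        # Only record the path, no file is opened yet
        self.path = path

    def database(self):
        # Open the connection the first time a cursor is requested
        if not self.connection:
            self.connection = sqlite3.connect(self.path)
            self.cursor = self.connection.cursor()
        return self.cursor

db = LazyDatabase()
db.load(":memory:")
opened_before = db.connection is not None

db.database().execute("SELECT 1")
opened_after = db.connection is not None

print(opened_before, opened_after)  # False True
```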

index(self, embeddings)

Initializes the virtual table (recreating if necessary), inserts all embeddings with sequential index ids, and records build metadata including SQLite and sqlite-vec version numbers.

append(self, embeddings)

Inserts new embeddings starting from the current offset using executemany for batch insertion. Updates the offset in config.
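A minimal sketch of offset-based batch insertion, assuming a plain table in place of the vec0 virtual table and byte strings in place of real embeddings. The table layout and the module-level config dictionary are stand-ins chosen for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE vectors (indexid INTEGER PRIMARY KEY, embedding BLOB)")
config = {"offset": 0}

def append(rows):
    """Insert rows with ids continuing from the current offset, then advance the offset."""
    connection.executemany(
        "INSERT INTO vectors (indexid, embedding) VALUES (?, ?)",
        enumerate(rows, config["offset"]),
    )
    config["offset"] += len(rows)

append([b"a", b"b", b"c"])
append([b"d", b"e"])

ids = [row[0] for row in connection.execute("SELECT indexid FROM vectors ORDER BY indexid")]
print(ids, config["offset"])  # [0, 1, 2, 3, 4] 5
```

Because ids continue from the stored offset, a second batch never collides with ids assigned by an earlier one.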

delete(self, ids)

Deletes rows from the virtual table by index id using batched executemany.
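Batched deletion by id can be sketched the same way. The example again uses a plain table as a stand-in for the virtual table, and the one-parameter DELETE statement is an assumption about the statement shape.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE vectors (indexid INTEGER PRIMARY KEY, embedding BLOB)")
connection.executemany("INSERT INTO vectors VALUES (?, ?)", [(i, b"x") for i in range(5)])

def delete(ids):
    """Delete each id with one parameterized statement executed as a batch."""
    connection.executemany("DELETE FROM vectors WHERE indexid = ?", [(i,) for i in ids])

delete([1, 3])

remaining = [row[0] for row in connection.execute("SELECT indexid FROM vectors ORDER BY indexid")]
print(remaining)  # [0, 2, 4]
```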

search(self, queries, limit)

Runs one SQL query per input vector. Each query selects indexid and 1 - distance, matching against the embedding column with the MATCH operator and k = limit. Returns a list of (id, score) tuples for each query.
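The result shape and the 1 - distance scoring can be illustrated with a brute-force reference search in pure Python. This is not how sqlite-vec executes the query. It only reproduces the scoring rule and the per-query list of (id, score) tuples, and the function names are hypothetical.

```python
import heapq
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def search(vectors, queries, limit):
    """Return, for each query, the top `limit` (id, score) pairs with score = 1 - distance."""
    results = []
    for query in queries:
        scored = [(indexid, 1.0 - cosine_distance(query, v)) for indexid, v in enumerate(vectors)]
        results.append(heapq.nlargest(limit, scored, key=lambda pair: pair[1]))
    return results

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = search(vectors, [[1.0, 0.0]], limit=2)
print(results)
```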

count(self)

Returns the count of rows in the vectors virtual table.

save(self, path)

Handles three cases: (1) temporary database is copied to the target path using the SQLite backup API, (2) same path triggers a commit, (3) different path copies data to the new location while keeping the current connection.
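The three cases amount to a small decision on the current and target paths. The sketch below only names the branch that would be taken. Treating an empty current path as the marker for a temporary database is an assumption based on the constructor's default path, and the function name and labels are hypothetical.

```python
def save_action(current_path, target_path):
    """Return which of the three save cases applies."""
    if not current_path:
        # Case 1: temporary database, copy it to the target path
        return "copy-temporary"
    if current_path == target_path:
        # Case 2: already saved at this path, a commit is enough
        return "commit"
    # Case 3: copy to the new location and keep using the current connection
    return "copy-and-keep"

print(save_action("", "/tmp/vectors.db"))
print(save_action("/tmp/vectors.db", "/tmp/vectors.db"))
print(save_action("/tmp/vectors.db", "/tmp/backup.db"))
```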

close(self)

Closes the database connection and sets it to None.

initialize(self, recreate=False)

Creates the vec0 virtual table with the appropriate embedding type (FLOAT, INT8, or BIT) based on quantization settings. Optionally clears existing data when recreating.
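As a rough illustration, the statement that creates such a table might be assembled as shown below. The exact SQL is an assumption: the column names indexid and embedding are taken from the search query described on this page, and the distance_metric=cosine column option follows sqlite-vec's vec0 syntax. The helper name is hypothetical, and the string is only built here, not executed.

```python
def create_statement(table, dimensions, embedding="FLOAT"):
    """Build an assumed vec0 CREATE statement for the given embedding type."""
    return (
        f"CREATE VIRTUAL TABLE IF NOT EXISTS {table} USING vec0"
        f"(indexid INTEGER PRIMARY KEY, embedding {embedding}[{dimensions}] distance_metric=cosine)"
    )

statement = create_statement("vectors", 384, "INT8")
print(statement)
```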

connect(self, path)

Creates a new SQLite connection, loads the sqlite-vec extension, and returns the connection. For security, extension loading is enabled through enable_load_extension only for the duration of the load and disabled again afterwards.

Usage Examples

Basic Usage

import numpy as np
from txtai.ann.dense import SQLite

# Configuration for SQLite backend
config = {
    "backend": "sqlite",
    "dimensions": 384,
    "sqlite": {
        "table": "vectors",
        "quantize": 8  # INT8 quantization
    }
}

# Create and build the index
ann = SQLite(config)
embeddings = np.random.rand(500, 384).astype(np.float32)
ann.index(embeddings)

# Search for similar vectors
queries = np.random.rand(1, 384).astype(np.float32)
results = ann.search(queries, limit=10)
# results: [[(id, score), ...]]

# Save to file
ann.save("/tmp/vectors.db")

# Reload from file
ann2 = SQLite(config)
ann2.load("/tmp/vectors.db")

# Append more data
new_data = np.random.rand(100, 384).astype(np.float32)
ann2.append(new_data)

print(ann2.count())  # 600

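# Persist the appended rows (saving to the same path commits), then close
ann2.save("/tmp/vectors.db")
ann2.close()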
