Implementation:Neuml Txtai PgVector ANN

Knowledge Sources	Neuml_Txtai
Domains	Vector_Search, ANN
Last Updated	2026-02-10 01:00 GMT

Overview

Concrete ANN backend for PostgreSQL-based vector similarity search using the pgvector extension, provided by txtai.

Description

PGVector is an ANN implementation that builds approximate nearest neighbor indexes in a PostgreSQL database with the pgvector extension. It stores embeddings in a database table and creates an HNSW index over them for fast similarity search by inner product. Three storage types are supported: 32-bit float vectors (VECTOR), 16-bit half-precision vectors (HALFVEC), and binary vectors (BIT), which are scored with Hamming distance. SQLAlchemy handles database connectivity and session management.
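To make the three storage types concrete, the sketch below builds the CREATE TABLE and CREATE INDEX statements one would write by hand for each. The column types (vector, halfvec, bit), the operator classes (vector_ip_ops, halfvec_ip_ops, bit_hamming_ops) and the HNSW WITH (m, ef_construction) options are pgvector's; the helper name hnsw_ddl, the id column and the table layout are hypothetical. txtai builds its schema through SQLAlchemy, so the SQL it actually emits may differ.

```python
# Hypothetical helper: hand-written pgvector DDL for each storage type.
# Types and operator classes are pgvector's; the table layout is illustrative.

def hnsw_ddl(table, dimensions, precision=None, bits=False, m=16, efconstruction=200):
    """Return (create_table, create_index) SQL strings for a pgvector table."""
    if bits:
        # Binary vectors: BIT(n) column, HNSW index on Hamming distance
        column, opclass = f"bit({dimensions})", "bit_hamming_ops"
    elif precision == "half":
        # 16-bit floats: HALFVEC(n) column, HNSW index on inner product
        column, opclass = f"halfvec({dimensions})", "halfvec_ip_ops"
    else:
        # Default 32-bit floats: VECTOR(n) column, HNSW index on inner product
        column, opclass = f"vector({dimensions})", "vector_ip_ops"

    create_table = (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(id INTEGER PRIMARY KEY, embedding {column})"
    )
    # m and ef_construction defaults match the backend defaults listed below
    create_index = (
        f"CREATE INDEX ON {table} USING hnsw (embedding {opclass}) "
        f"WITH (m = {m}, ef_construction = {efconstruction})"
    )
    return create_table, create_index


if __name__ == "__main__":
    for statement in hnsw_ddl("vectors", 384, precision="half"):
        print(statement)
```

The m parameter trades memory for graph connectivity, and ef_construction trades build time for index quality; both are fixed at index creation.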

Usage

Use the PGVector backend when you need persistent, database-backed vector search with PostgreSQL. Select it by setting the ANN backend configuration to "pgvector". It requires the pgvector and sqlalchemy Python packages, which are installed with the txtai "ann" extra (pip install txtai[ann]), plus a PostgreSQL server where the pgvector extension is available. The database URL is read from the url setting and falls back to the ANN_URL environment variable.
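The URL precedence described above (the url setting first, then ANN_URL) can be sketched as follows. This assumes backend settings live under config["pgvector"], as in the usage examples; resolve_url is a hypothetical helper, and raising ValueError on a missing URL is an illustrative choice, not txtai's documented behavior.

```python
import os


def resolve_url(config):
    """Hypothetical helper: pgvector.url setting first, then the ANN_URL env var."""
    # Backend-specific settings are nested under config["pgvector"]
    settings = config.get("pgvector", {})
    url = settings.get("url", os.environ.get("ANN_URL"))

    # Assumption: fail early instead of attempting a connection with no URL
    if not url:
        raise ValueError("Set pgvector.url or the ANN_URL environment variable")
    return url
```

Keeping credentials in ANN_URL rather than in the config dictionary avoids writing passwords into saved index configuration.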

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/ann/dense/pgvector.py
Lines: 1-324

Signature

class PGVector(ANN):
    """Builds an ANN index backed by a Postgres database."""

    def __init__(self, config)
    def load(self, path)
    def index(self, embeddings)
    def append(self, embeddings)
    def delete(self, ids)
    def search(self, queries, limit)
    def count(self)
    def save(self, path)
    def close(self)
    def initialize(self, recreate=False)
    def createindex(self)
    def connect(self)
    def schema(self)
    def settings(self)
    def sqldialect(self, sql, parameters=None)
    def defaulttable(self)
    def url(self)
    def column(self)
    def operation(self)
    def prepare(self, data)
    def query(self, query)
    def score(self, score)
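The query and score methods must turn pgvector's ordering distances back into similarity scores. pgvector's <#> operator returns the negative inner product (so ascending index scans return the best matches first), and its <~> operator returns the Hamming distance as a count of differing bits. A minimal sketch of that conversion, assuming the binary case is normalized by the number of bits; to_similarity is hypothetical and the exact formula in txtai's score() may differ.

```python
def to_similarity(distance, dimensions=None, binary=False):
    """Hypothetical conversion from a pgvector distance to a similarity score."""
    if binary:
        # <~> counts differing bits; assumed normalization to a [0, 1] similarity
        return 1.0 - distance / dimensions

    # <#> returns the negative inner product, so flip the sign
    return -distance
```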

Import

from txtai.ann import ANNFactory

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	ANN configuration dictionary; backend-specific settings such as url, table, m and efconstruction are nested under config["pgvector"]
config["backend"]	str	Yes	Must be set to "pgvector" to select this backend
url	str	No	PostgreSQL connection URL (falls back to the ANN_URL environment variable)
table	str	No	Database table name (default: "vectors")
schema	str	No	Database schema name (default: none)
m	int	No	HNSW M parameter: maximum number of connections per graph node (default: 16)
efconstruction	int	No	HNSW ef_construction parameter: candidate list size during index build (default: 200)
quantize	int	No	Scalar quantization bit width; when set, vectors are stored as BIT and scored with Hamming distance
precision	str	No	Set to "half" to store 16-bit HALFVEC vectors instead of 32-bit VECTOR
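To illustrate what the precision and quantize settings trade away, this standard-library sketch rounds values through IEEE 754 half precision (the storage format of HALFVEC) using struct's "e" format, and applies a sign-based 1-bit quantizer that yields the 0/1 string a Postgres bit(n) column accepts as input. The sign threshold and all three function names are assumptions for illustration; txtai's own quantization scheme may differ.

```python
import struct


def to_half(values):
    """Round-trip floats through IEEE 754 half precision, as HALFVEC storage would."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def to_bits(values):
    """Hypothetical 1-bit quantizer: positive components become 1, others 0."""
    return "".join("1" if v > 0 else "0" for v in values)


def hamming(a, b):
    """Count differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))
```

Half precision halves storage with small rounding error, while 1-bit quantization cuts storage by 32x at the cost of keeping only each component's sign.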

Outputs

Name	Type	Description
search() returns	list	List of lists of (id, score) tuples, one list per query
count() returns	int	Number of rows in the configured vectors table
save() side-effect	None	Commits the current database session and connection
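Because HNSW search is approximate, a brute-force scan is a useful reference when checking recall. This pure-Python sketch follows the same output contract as search() (one list of (id, score) tuples per query, highest inner product first); exact_search and its dict-of-lists input are hypothetical and not part of txtai.

```python
def exact_search(vectors, queries, limit):
    """Brute-force inner-product search returning one list of (id, score) per query."""
    results = []
    for query in queries:
        # Score every stored vector against the query
        scored = [
            (uid, sum(q * v for q, v in zip(query, vector)))
            for uid, vector in vectors.items()
        ]
        # Highest inner product first, truncated to the limit
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append(scored[:limit])
    return results
```

Comparing the ids returned here with those from the HNSW index for the same queries gives a direct recall estimate for a given m and efconstruction.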

Usage Examples

from txtai import Embeddings

# Create embeddings with PGVector backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgvector",
    "pgvector": {
        "url": "postgresql://user:pass@localhost/dbname",
        "table": "embeddings",
        "m": 16,
        "efconstruction": 200
    }
})

# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])

# Search
results = embeddings.search("climate change effects", 2)
print(results)

# Using PGVector with half-precision storage
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgvector",
    "pgvector": {
        "url": "postgresql://user:pass@localhost/dbname",
        "precision": "half"
    }
})

Related Pages

Environment:Neuml_Txtai_Python_Core_Dependencies
