Implementation:Neuml Txtai PgVector ANN
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, ANN |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
A concrete ANN backend, provided by txtai, for vector similarity search in PostgreSQL using the pgvector extension.
Description
PGVector is an ANN implementation that builds approximate nearest neighbor indexes backed by a PostgreSQL database with the pgvector extension. It stores embeddings in a database table and creates an HNSW index on that table for efficient similarity search using inner product distance. The class supports three storage types: full 32-bit float vectors (VECTOR), half-precision 16-bit vectors (HALFVEC), and binary bit vectors (BIT), which are scored with Hamming distance. It uses SQLAlchemy for database connectivity and session management.
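The two scoring modes described above can be illustrated with a small, dependency-free sketch. It is not the txtai implementation: `to_similarity`, `inner_product`, and `hamming` are hypothetical helpers, and normalizing Hamming distance to the 0..1 range is an assumption made for illustration. The one library fact it relies on is that pgvector's `<#>` operator returns the negative inner product.

```python
def inner_product(a, b):
    """Plain inner product of two equal-length float vectors."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def to_similarity(distance, bits=None):
    """
    Convert a raw distance into a similarity score (hypothetical helper).

    distance: value produced by the distance operator
    bits: number of bits per vector for BIT storage, None for float storage
    """
    if bits:
        # Assumption: Hamming distance is normalized and clamped to 0..1
        return min(max(0.0, 1.0 - distance / bits), 1.0)

    # pgvector's <#> operator yields the negative inner product, so flip the sign
    return -distance


# Float vectors: emulate the <#> operator by negating the inner product
print(to_similarity(-inner_product([1.0, 0.0], [0.5, 0.5])))

# Bit vectors: two of four bits differ
print(to_similarity(hamming([1, 0, 1, 1], [1, 1, 1, 0]), bits=4))
```

Higher values mean closer matches in both modes, which is why the sign flip and the normalization are applied before results are returned to the caller.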
Usage
Use the PGVector backend when you need persistent, database-backed vector search with PostgreSQL. Select it by setting the ANN backend configuration to "pgvector". It requires the pgvector and sqlalchemy Python packages, which are installed via the txtai "ann" extra. The database URL is read from the url setting, falling back to the ANN_URL environment variable.
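The URL lookup order can be sketched as follows. `resolve_url` is a hypothetical helper written for this page, not a txtai function, and raising `ValueError` when neither source is set is a choice of the sketch rather than documented library behavior.

```python
import os


def resolve_url(config):
    """Return the database URL from config, else from ANN_URL (hypothetical helper)."""
    # The explicit "url" setting wins; ANN_URL is only a fallback
    url = config.get("url") or os.environ.get("ANN_URL")
    if not url:
        # Assumption of this sketch: fail early with a clear message
        raise ValueError("Set the 'url' setting or the ANN_URL environment variable")
    return url


# An explicit setting is returned unchanged
print(resolve_url({"url": "postgresql://user:pass@localhost/dbname"}))
```

Keeping the URL in ANN_URL is convenient when credentials should stay out of the saved configuration.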
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/ann/dense/pgvector.py
- Lines: 1-324
Signature
class PGVector(ANN):
    """Builds an ANN index backed by a Postgres database."""
    def __init__(self, config)
    def load(self, path)
    def index(self, embeddings)
    def append(self, embeddings)
    def delete(self, ids)
    def search(self, queries, limit)
    def count(self)
    def save(self, path)
    def close(self)
    def initialize(self, recreate=False)
    def createindex(self)
    def connect(self)
    def schema(self)
    def settings(self)
    def sqldialect(self, sql, parameters=None)
    def defaulttable(self)
    def url(self)
    def column(self)
    def operation(self)
    def prepare(self, data)
    def query(self, query)
    def score(self, score)
Import
from txtai.ann import ANNFactory
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | ANN configuration dictionary containing backend settings |
| config["backend"] | str | Yes | Must be set to "pgvector" to select this backend |
| url | str | No | PostgreSQL connection URL (falls back to ANN_URL environment variable) |
| table | str | No | Database table name (default: "vectors") |
| schema | str | No | Database schema name (optional) |
| m | int | No | HNSW M parameter controlling number of connections (default: 16) |
| efconstruction | int | No | HNSW ef_construction parameter (default: 200) |
| quantize | int | No | Scalar quantization bit width; used for BIT vector storage, which is scored with Hamming distance |
| precision | str | No | Set to "half" for 16-bit HALFVEC storage |
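To show how the settings in the table relate to a pgvector HNSW index, here is a minimal sketch that turns a config dictionary into a `CREATE INDEX` statement. `hnsw_index_sql` is a hypothetical helper, not part of txtai. The column name `embedding`, the index name, and the rule that `quantize` takes precedence over `precision` are assumptions. The `m` and `ef_construction` options and the `vector_ip_ops`, `halfvec_ip_ops`, and `bit_hamming_ops` operator classes are standard pgvector names.

```python
def hnsw_index_sql(config):
    """Build a pgvector HNSW CREATE INDEX statement from settings (hypothetical helper)."""
    table = config.get("table", "vectors")
    schema = config.get("schema")
    m = int(config.get("m", 16))
    efconstruction = int(config.get("efconstruction", 200))

    # Pick the operator class for the storage type
    # Assumption: quantize takes precedence over precision
    if config.get("quantize"):
        opclass = "bit_hamming_ops"
    elif config.get("precision") == "half":
        opclass = "halfvec_ip_ops"
    else:
        opclass = "vector_ip_ops"

    # Qualify the table with the schema when one is configured
    target = f"{schema}.{table}" if schema else table

    # Note: identifiers are not escaped here; only pass trusted values
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_index ON {target} "
        f"USING hnsw (embedding {opclass}) "
        f"WITH (m = {m}, ef_construction = {efconstruction})"
    )


print(hnsw_index_sql({}))
print(hnsw_index_sql({"table": "embeddings", "precision": "half", "m": 32}))
```

Larger `m` and `efconstruction` values generally trade slower index builds and more storage for better recall.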
Outputs
| Name | Type | Description |
|---|---|---|
| search() returns | list | List of lists of (id, score) tuples, one list per query |
| count() returns | int | Number of rows in the vectors table |
| save() side-effect | commit | Commits the current database session and connection |
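The nested shape returned by search() can be consumed as in the sketch below. `top_hits` is a hypothetical helper, and the ids and scores are made-up placeholders that only illustrate the documented structure of one list of (id, score) tuples per query.

```python
def top_hits(results):
    """Return the best (id, score) pair per query, or None when a query has no matches."""
    return [max(hits, key=lambda hit: hit[1]) if hits else None for hits in results]


# Placeholder data in the documented shape; values are illustrative only
results = [
    [(0, 0.91), (3, 0.42)],  # matches for the first query
    [(2, 0.77)],             # matches for the second query
    [],                      # a query with no matches
]

for number, hit in enumerate(top_hits(results)):
    print(number, hit)
```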
Usage Examples
from txtai import Embeddings
# Create embeddings with PGVector backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgvector",
    "pgvector": {
        "url": "postgresql://user:pass@localhost/dbname",
        "table": "embeddings",
        "m": 16,
        "efconstruction": 200
    }
})
# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])
# Search
results = embeddings.search("climate change effects", 2)
print(results)
# Using PGVector with half-precision storage
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgvector",
    "pgvector": {
        "url": "postgresql://user:pass@localhost/dbname",
        "precision": "half"
    }
})