Implementation:Neuml Txtai PgSparse ANN

From Leeroopedia


Knowledge Sources
Domains: Vector_Search, ANN
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete ANN backend for PostgreSQL-based sparse vector similarity search using pgvector's SPARSEVEC type, provided by txtai.

Description

PGSparse is an ANN implementation that extends PGVector to store sparse vectors in PostgreSQL. It uses the SPARSEVEC column type from the pgvector extension with the inner product operator class (sparsevec_ip_ops). Sparse input data is wrapped in SparseVector objects before insertion, and any vector with more than 1000 non-zero values is trimmed to its 1000 largest values to comply with pgvector's limit. Scalar quantization is explicitly disabled. The database URL is read from the url setting, falling back to the SCORING_URL and then the ANN_URL environment variable.
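The trimming step described above can be illustrated with a minimal, dependency-free sketch. This is not the txtai implementation: the sparse vector is modeled as a plain dict of 0-based index to value, and the helper names trim_topk and to_sparsevec_literal are hypothetical. The literal format {index:value,...}/dimensions with 1-based indices is pgvector's text representation of a sparsevec value.

```python
import heapq

MAX_NONZERO = 1000  # limit stated above for pgvector sparse vectors


def trim_topk(vector, limit=MAX_NONZERO):
    """Keep only the `limit` largest values of a {index: value} sparse vector.

    Hypothetical helper. Values are compared as-is (not by absolute value),
    which assumes non-negative weights such as BM25 or TF-IDF scores.
    """
    nonzero = {i: v for i, v in vector.items() if v != 0}
    if len(nonzero) <= limit:
        return nonzero
    top = heapq.nlargest(limit, nonzero.items(), key=lambda item: item[1])
    return dict(sorted(top))


def to_sparsevec_literal(vector, dimensions):
    """Format a 0-based {index: value} dict as a pgvector sparsevec text literal."""
    body = ",".join(f"{i + 1}:{v}" for i, v in sorted(vector.items()))
    return f"{{{body}}}/{dimensions}"


# Toy example that uses a limit of 2 instead of 1000
vector = {0: 0.1, 4: 0.9, 7: 0.5, 9: 0.0}
trimmed = trim_topk(vector, limit=2)
literal = to_sparsevec_literal(trimmed, dimensions=10)
print(trimmed)
print(literal)
```

Trimming by value keeps the terms that contribute most to an inner product score, so the result for a trimmed vector is an approximation of the score for the full vector.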

Usage

Use the PGSparse backend for sparse vector similarity search backed by PostgreSQL, such as storing BM25 or TF-IDF scoring vectors. Select this backend by setting the ANN backend configuration to "pgsparse". It requires the pgvector and sqlalchemy Python packages, which are installed via the txtai "ann" extra. It inherits all connection, schema, table, and index management from PGVector.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/ann/sparse/pgsparse.py
  • Lines: 1-57

Signature

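class PGSparse(PGVector):
    """Builds a Sparse ANN index backed by a Postgres database."""

    def __init__(self, config): ...   # disables scalar quantization
    def defaulttable(self): ...       # default table name: "svectors"
    def url(self): ...                # url setting, then SCORING_URL, then ANN_URL
    def column(self): ...             # SPARSEVEC column with sparsevec_ip_ops
    def operation(self): ...          # inner product comparison
    def prepare(self, data): ...      # trims to 1000 non-zero values, wraps as SparseVector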

Import

from txtai.ann import ANNFactory

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | ANN configuration dictionary containing backend settings
config["backend"] | str | Yes | Must be set to "pgsparse" to select this backend
config["dimensions"] | int | Yes | Dimensionality of the sparse embedding vectors
url | str | No | PostgreSQL connection URL (falls back to SCORING_URL, then ANN_URL env vars)
table | str | No | Database table name (default: "svectors")
schema | str | No | Database schema name (optional, inherited from PGVector)
m | int | No | HNSW M parameter (default: 16, inherited from PGVector)
efconstruction | int | No | HNSW ef_construction parameter (default: 200, inherited from PGVector)
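
The fallback order for the connection URL can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not the library's code: resolve_url is a hypothetical helper, settings stands for the backend-specific settings dictionary, and the URLs are placeholders.

```python
import os


def resolve_url(settings, environ=os.environ):
    """Return the connection URL: url setting, then SCORING_URL, then ANN_URL.

    Hypothetical helper; `environ` is injectable so the order can be checked
    without modifying the real process environment.
    """
    return settings.get("url") or environ.get("SCORING_URL") or environ.get("ANN_URL")


# Placeholder URLs for illustration only
env = {"SCORING_URL": "postgresql://scoring/db", "ANN_URL": "postgresql://ann/db"}

explicit = resolve_url({"url": "postgresql://explicit/db"}, env)  # setting wins
scoring = resolve_url({}, env)                                    # SCORING_URL next
ann = resolve_url({}, {"ANN_URL": "postgresql://ann/db"})         # ANN_URL last
missing = resolve_url({}, {})                                     # nothing configured
```

If none of the three sources is set, this sketch returns None; how the real backend reacts to a missing URL is not covered here.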

Outputs

Name | Type | Description
search() returns | list | List of lists of (id, score) tuples using inner product similarity
count() returns | int | Number of rows in the sparse vectors table
save() side-effect | commit | Commits the current database session and connection
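
To make the shape of the search() output concrete, the following in-memory sketch ranks rows by inner product for a single query. It does not touch PostgreSQL and is not how the backend is implemented; the function names and the toy data are assumptions. In pgvector itself the `<#>` operator returns the negated inner product, so a database-backed query has to flip the sign to obtain scores like these.

```python
def inner_product(a, b):
    """Inner product of two {index: value} sparse vectors."""
    # Iterate over the smaller vector and look up matches in the larger one
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


def search(query, rows, limit):
    """Return up to `limit` (id, score) tuples, highest inner product first."""
    scored = [(uid, inner_product(query, vector)) for uid, vector in rows.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


# Toy table of id -> sparse vector
rows = {
    0: {1: 1.0, 3: 2.0},
    1: {2: 4.0},
    2: {1: 0.5, 2: 0.5},
}
query = {1: 2.0, 2: 1.0}

results = search(query, rows, limit=2)
print(results)
```

A batch of queries would produce one such list per query, which matches the "list of lists" return type in the table above.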

Usage Examples

from txtai import Embeddings

# Create embeddings with PGSparse backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgsparse",
    "pgsparse": {
        "url": "postgresql://user:pass@localhost/dbname",
        "table": "sparse_embeddings"
    }
})

# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])

# Search
results = embeddings.search("climate change effects", 2)
print(results)

# PGSparse using an environment variable for the connection URL
import os

os.environ["SCORING_URL"] = "postgresql://user:pass@localhost/dbname"

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgsparse"
})
