Implementation:Neuml Txtai PGText Scoring

From Leeroopedia


Knowledge Sources
Domains: Full_Text_Search, PostgreSQL
Last Updated: 2026-02-09 00:00 GMT

Overview

PostgreSQL full-text search scoring backend using tsvector/tsquery with SQLAlchemy.

Description

The PGText class implements a scoring backend that leverages PostgreSQL's built-in full-text search (FTS) capabilities. It stores documents in a PostgreSQL table with an automatically computed tsvector column and a GIN index, enabling fast keyword-based retrieval using ts_rank scoring.

The class uses SQLAlchemy for all database interactions. The table schema consists of three columns:

  • indexid (Integer, primary key) -- the document's numeric identifier
  • text (Text) -- the raw document text
  • vector (TSVECTOR, computed) -- automatically generated from text using to_tsvector(language, text)

A GIN index is created on the vector column for fast full-text search. The search method uses plainto_tsquery to parse queries and ts_rank to score results. Wildcard characters are handled by converting bare * to PostgreSQL's prefix matching syntax :*.
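The wildcard rewrite described above can be tried in isolation. The following is a minimal sketch that reuses the regular expression quoted later on this page; the function name rewrite_wildcards and the sample queries are illustrative assumptions, not part of txtai.

```python
import re


def rewrite_wildcards(query: str) -> str:
    """Rewrite each '*' that is not already preceded by ':' as ':*'."""
    # The negative lookbehind leaves an existing ':*' untouched.
    return re.sub(r"(?<!\:)\*", ":*", query)


# Illustrative inputs (assumptions): bare wildcard, existing prefix syntax,
# no wildcard, and two wildcards in one query.
examples = ["postgres*", "postgres:*", "full text", "data* index*"]
rewritten = [rewrite_wildcards(q) for q in examples]
print(rewritten)
# ['postgres:*', 'postgres:*', 'full text', 'data:* index:*']
```

Because of the lookbehind, applying the rewrite to an already rewritten query leaves it unchanged.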

The class manages its own database connection lifecycle. The engine is created with SQLAlchemy's StaticPool, which maintains a single connection that is reused for all requests. The session and connection are committed on save() and rolled back on load(), providing basic transactional control. An optional schema setting places the table in a specific PostgreSQL schema.
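The commit-on-save and rollback-on-load behaviour can be illustrated without a PostgreSQL server. The sketch below is an analogy only: it swaps in Python's built-in sqlite3 module for SQLAlchemy and PostgreSQL, and the table contents are made up for the example.

```python
import sqlite3

# Stand-in database (assumption): in-memory SQLite instead of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scoring (indexid INTEGER PRIMARY KEY, text TEXT)")
conn.commit()

# Work that is committed survives, analogous to calling save().
conn.execute("INSERT INTO scoring VALUES (0, 'kept')")
conn.commit()

# Uncommitted work is discarded by a rollback, analogous to the rollback in load().
conn.execute("INSERT INTO scoring VALUES (1, 'discarded')")
conn.rollback()

remaining = [row[0] for row in conn.execute("SELECT indexid FROM scoring ORDER BY indexid")]
conn.close()
print(remaining)  # [0]
```

The practical consequence is that inserts and deletes made after the last save() are lost if load() is called before saving again.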

The PGText class requires the "scoring" extra to be installed, which provides sqlalchemy and the PostgreSQL dialect.

Usage

Use this scoring backend when you need keyword-based full-text search powered by PostgreSQL. It is typically created through the ScoringFactory as part of an embeddings configuration rather than instantiated directly. It integrates with the txtai scoring pipeline to provide sparse keyword scoring alongside or instead of dense vector search.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/scoring/pgtext.py
  • Lines: L1-185

Class Definition

class PGText(Scoring):
    """
    Postgres full text search (FTS) based scoring.
    """

Constructor Signature

def __init__(self, config=None):

The constructor calls super().__init__(config), checks for the availability of SQLAlchemy (raises ImportError if missing), and initializes connection attributes to None. The language setting is read from config, defaulting to "english".

Import

from txtai.scoring import PGText  # direct import; instances are usually created via ScoringFactory

I/O Contract

insert(documents, index=None, checkpoint=None)

Name | Type | Required | Description
documents | iterable of (uid, document, tags) | Yes | Documents to insert. Each document can be a string, a list of strings (joined with spaces), or a dict with "text" or "object" keys.
index | int or None | No | Starting index offset. When provided, documents are assigned sequential IDs starting from this value. When None, the original uid is used.
checkpoint | any | No | Not used by PGText; present for interface compatibility.

search(query, limit=3)

Name | Type | Required | Description
query | str | Yes | Search query string. Wildcards (*) are converted to PostgreSQL prefix wildcards (:*). Processed with plainto_tsquery.
limit | int | No | Maximum number of results. Defaults to 3.

Returns: A list of (indexid, score) tuples, where score is the ts_rank value. Only results with a score strictly greater than 1e-5 are returned, so the list may contain fewer than limit entries.
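The score cutoff applies after the database has already limited and ordered the rows. A minimal sketch of that filter follows; the function name filter_results and the sample scores are illustrative assumptions.

```python
MIN_SCORE = 1e-5  # threshold quoted on this page


def filter_results(rows, min_score=MIN_SCORE):
    """Keep (uid, score) pairs whose score is strictly greater than min_score."""
    return [(uid, score) for uid, score in rows if score > min_score]


# Made-up ranked rows: a real match, a score exactly at the threshold, and a zero score.
rows = [(0, 0.0607), (1, 1e-5), (2, 0.0)]
kept = filter_results(rows)
print(kept)  # [(0, 0.0607)]
```

The comparison is strict, so a score exactly equal to the threshold is dropped along with zero scores.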

Key Methods

insert(documents, index=None, checkpoint=None)

Initializes tables (with recreate=True, dropping existing tables), collects document rows, and performs a bulk insert. Documents can be strings, lists of strings (joined), or dicts with text/object keys.
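The row collection step can be sketched in plain Python. This is a sketch of the behaviour described on this page, not the txtai source: the function name collect_rows is hypothetical, and the policy of skipping documents without text and of advancing the index only for documents that have content is an assumption.

```python
def collect_rows(documents, index=None, text_key="text", object_key="object"):
    """Turn (uid, document, tags) tuples into rows ready for a bulk insert."""
    rows = []
    for uid, document, _tags in documents:
        # Dict documents: read the text key, falling back to the object key.
        if isinstance(document, dict):
            document = document.get(text_key, document.get(object_key))

        # Assumption: documents with no content are skipped entirely.
        if document is None:
            continue

        # Use the running index when one was passed, otherwise the original uid.
        rowid = index if index is not None else uid

        # Lists of strings are joined with spaces.
        if isinstance(document, list):
            document = " ".join(document)

        if isinstance(document, str):
            rows.append({"indexid": rowid, "text": document})

        if index is not None:
            index += 1

    return rows


docs = [
    (0, "plain string", None),
    (1, ["list", "of", "tokens"], None),
    (2, {"text": "from dict"}, None),
    (3, {"object": "fallback"}, None),
]
by_uid = collect_rows(docs)
by_offset = collect_rows(docs, index=10)
print([row["indexid"] for row in by_offset])  # [10, 11, 12, 13]
```

Passing index replaces the caller's uids with a contiguous range, which is what lets the row ids line up with positions in an embeddings index.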

delete(ids)

Deletes rows by indexid using a SQL WHERE indexid IN (...) clause.

weights(tokens)

Returns None -- token weighting is not supported by the PostgreSQL FTS backend.

search(query, limit=3)

Executes a PostgreSQL full-text search query using plainto_tsquery and ts_rank:

import re
from sqlalchemy import desc, text

# Excerpt: method of PGText
def search(self, query, limit=3):
    # Wildcard handling: a bare * (one not already preceded by :) becomes :* for prefix matching
    query = re.sub(r"(?<!\:)\*", ":*", query)

    # Rank rows against the parsed query, ordered by rank descending
    query = (
        self.database.query(
            self.table.c["indexid"],
            text("ts_rank(vector, plainto_tsquery(:language, :query)) rank"),
        )
        .order_by(desc(text("rank")))
        .limit(limit)
        .params({"language": self.language, "query": query})
    )

    # Keep only rows whose rank exceeds the 1e-5 threshold
    return [(uid, score) for uid, score in query if score > 1e-5]

batchsearch(queries, limit=3, threads=True)

Sequentially calls search() for each query. The threads parameter is accepted for interface compatibility but not used.
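The sequential behaviour amounts to a loop over the queries. The sketch below shows that shape with a stand-in search function; fake_search and its canned results are assumptions made for the example, and the free-function form is used only so the snippet runs without a database.

```python
def batchsearch(search, queries, limit=3, threads=True):
    """Run search once per query, in order. threads is accepted but ignored."""
    return [search(query, limit) for query in queries]


def fake_search(query, limit):
    # Canned (indexid, score) results standing in for a database lookup.
    canned = {"text": [(1, 0.06), (0, 0.03)], "language": [(2, 0.06)]}
    return canned.get(query, [])[:limit]


results = batchsearch(fake_search, ["text", "language", "missing"], limit=1)
print(results)  # [[(1, 0.06)], [(2, 0.06)], []]
```

The output has one result list per query, in the same order as the input, including an empty list for a query with no matches.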

count()

Returns the total number of rows in the scoring table using func.count().

load(path)

Rolls back the current session and connection, discarding any changes made since the last commit, then reinitializes the tables.

save(path)

Commits the current session and connection to persist changes.

close()

Closes the database session and disposes the SQLAlchemy engine.

initialize(recreate=False)

Creates the database engine, connection, session, and table schema. When recreate=True, drops and recreates the table and GIN index. Supports configurable schema, table name, and connection URL (from config or SCORING_URL environment variable).

from sqlalchemy import Column, Computed, Index, Integer, MetaData, Table, Text
from sqlalchemy.dialects.postgresql import TSVECTOR

# Table schema with computed tsvector column (table holds the configured table name)
self.table = Table(
    table,
    MetaData(),
    Column("indexid", Integer, primary_key=True, autoincrement=False),
    Column("text", Text),
    Column("vector", TSVECTOR, Computed(f"to_tsvector('{self.language}', text)", persisted=True)),
)

# GIN index for fast full-text search
index = Index(f"{table}-index", self.table.c["vector"], postgresql_using="gin")

sqldialect(sql, parameters=None)

Executes SQL only when the dialect is PostgreSQL; falls back to SELECT 1 for non-PostgreSQL engines. Used for schema creation and search path configuration.
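The dialect gate can be sketched as a pure function that decides which statement would be sent. The name dialect_args, the plain-string statements, and the schema name demo are assumptions for illustration; the real method executes the chosen statement through SQLAlchemy.

```python
def dialect_args(dialect_name, sql, parameters=None):
    """Return the arguments to execute: the real SQL on PostgreSQL, a no-op otherwise."""
    if dialect_name == "postgresql":
        return (sql, parameters)
    # Non-PostgreSQL engines get a harmless statement instead.
    return ("SELECT 1",)


create_schema = "CREATE SCHEMA IF NOT EXISTS demo"  # hypothetical schema name
print(dialect_args("postgresql", create_schema))  # ('CREATE SCHEMA IF NOT EXISTS demo', None)
print(dialect_args("sqlite", create_schema))      # ('SELECT 1',)
```

Substituting a no-op keeps the calling code free of dialect checks while ensuring PostgreSQL-only statements never reach another engine.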

Configuration

Key | Type | Default | Description
url | str | SCORING_URL env var | SQLAlchemy database connection URL.
language | str | "english" | PostgreSQL text search configuration language.
schema | str or None | None | PostgreSQL schema name. Created with IF NOT EXISTS.
table | str | "scoring" | Table name for the scoring data.
columns.text | str | "text" | Key to extract text content from dict documents.
columns.object | str | "object" | Fallback key when "text" is not found in dict documents.
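The defaults in the table above can be summarised as a small resolution function. This is a sketch under stated assumptions: resolve_settings is a hypothetical helper, not a txtai API, and the environ parameter exists only so the example does not depend on the real process environment.

```python
import os


def resolve_settings(config=None, environ=None):
    """Apply the documented defaults to a scoring configuration dict."""
    config = config or {}
    environ = os.environ if environ is None else environ
    columns = config.get("columns", {})

    return {
        # An explicit url wins; otherwise fall back to the SCORING_URL variable.
        "url": config.get("url", environ.get("SCORING_URL")),
        "language": config.get("language", "english"),
        "schema": config.get("schema"),
        "table": config.get("table", "scoring"),
        "text": columns.get("text", "text"),
        "object": columns.get("object", "object"),
    }


env = {"SCORING_URL": "postgresql://user:pass@localhost/mydb"}
defaults = resolve_settings({}, env)
custom = resolve_settings({"url": "postgresql://other/db", "table": "documents"}, env)
print(defaults["url"], defaults["table"])  # postgresql://user:pass@localhost/mydb scoring
```

Keeping the URL in the environment is convenient when the same embeddings configuration is shared across deployments with different databases.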

Inheritance Chain

PGText -> Scoring

The Scoring base class provides the interface contract (insert, delete, search, batchsearch, count, load, save, close) and column configuration parsing.

Usage Examples

Configuring PGText Scoring

from txtai.embeddings import Embeddings

# Configure embeddings with PostgreSQL full-text scoring
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "scoring": {
        "method": "pgtext",
        "url": "postgresql://user:pass@localhost/mydb",
        "language": "english"
    }
})

Direct PGText Usage

from txtai.scoring import PGText

config = {
    "url": "postgresql://user:pass@localhost/mydb",
    "language": "english",
    "table": "documents"
}

scoring = PGText(config)

# Insert documents
documents = [
    (0, "Machine learning algorithms for text classification", None),
    (1, "PostgreSQL full text search capabilities", None),
    (2, "Natural language processing pipelines", None)
]
scoring.insert(documents)
scoring.save(None)

# Search
results = scoring.search("text search", limit=2)
for uid, score in results:
    print(f"ID: {uid}, Score: {score:.4f}")

scoring.close()
