Implementation:Neuml Txtai PGText Scoring

Knowledge Sources	txtai, txtai Documentation
Domains	Full_Text_Search, PostgreSQL
Last Updated	2026-02-09 00:00 GMT

Overview

PostgreSQL full-text search scoring backend using tsvector/tsquery with SQLAlchemy.

Description

The PGText class implements a scoring backend that leverages PostgreSQL's built-in full-text search (FTS) capabilities. It stores documents in a PostgreSQL table with an automatically computed tsvector column and a GIN index, enabling fast keyword-based retrieval using ts_rank scoring.

The class uses SQLAlchemy for all database interactions. The table schema consists of three columns:

indexid (Integer, primary key) -- the document's numeric identifier
text (Text) -- the raw document text
vector (TSVECTOR, computed) -- automatically generated from text using to_tsvector(language, text)

A GIN index on the vector column keeps full-text lookups fast. The search method parses queries with plainto_tsquery and scores matches with ts_rank. Before the query is sent, any * that is not already preceded by a colon is rewritten to :*, PostgreSQL's prefix-matching syntax.

The class manages its own database connection lifecycle and uses SQLAlchemy's StaticPool, which maintains a single connection for all requests. The session and connection are committed on save() and rolled back on load(), which gives basic transactional control: uncommitted changes are discarded when the index is reloaded. An optional schema setting places the table in a specific PostgreSQL schema.

The PGText class requires the "scoring" extra to be installed, which provides sqlalchemy and the PostgreSQL dialect.

Usage

Use this scoring backend when you need keyword-based full-text search powered by PostgreSQL. It is typically created through the ScoringFactory as part of an embeddings configuration rather than instantiated directly. It integrates with the txtai scoring pipeline to provide sparse keyword scoring alongside or instead of dense vector search.

Code Reference

Source Location

Repository: txtai
File: src/python/txtai/scoring/pgtext.py
Lines: L1-185

Class Definition

class PGText(Scoring):
    """
    Postgres full text search (FTS) based scoring.
    """

Constructor Signature

def __init__(self, config=None):

The constructor calls super().__init__(config), checks for the availability of SQLAlchemy (raises ImportError if missing), and initializes connection attributes to None. The language setting is read from config, defaulting to "english".

Import

# Direct import; in most setups the class is created indirectly through ScoringFactory
from txtai.scoring import PGText

I/O Contract

insert(documents, index=None, checkpoint=None)

Name	Type	Required	Description
documents	iterable of (uid, document, tags)	Yes	Documents to insert. Each document can be a string, a list of strings (joined with spaces), or a dict with `"text"` or `"object"` keys.
index	int or None	No	Starting index offset. When provided, documents are assigned sequential IDs starting from this value. When `None`, the original `uid` is used.
checkpoint	any	No	Not used by PGText; present for interface compatibility.

search(query, limit=3)

Name	Type	Required	Description
query	str	Yes	Search query string. Bare wildcards (`*`) are converted to PostgreSQL prefix-match syntax (`:*`). Processed with `plainto_tsquery`.
limit	int	No	Maximum number of results. Defaults to `3`.

Returns: A list of (indexid, score) tuples ordered by descending score, where score is the ts_rank value. Only results with a score greater than 1e-5 are returned.

Key Methods

insert(documents, index=None, checkpoint=None)

Initializes tables (with recreate=True, dropping existing tables), collects document rows, and performs a bulk insert. Documents can be strings, lists of strings (joined), or dicts with text/object keys.
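
The row-collection step described above can be sketched without a database. The helper below is a hypothetical illustration, not the library's code: the name to_rows, its keyword arguments, and the choice to skip documents with no usable text without advancing the index are all assumptions made for this example.

```python
def to_rows(documents, index=None, text_key="text", object_key="object"):
    """Hypothetical helper: flatten (uid, document, tags) tuples into (indexid, text) rows."""
    rows = []
    for uid, document, _ in documents:
        # Dict documents: prefer the text key, fall back to the object key
        if isinstance(document, dict):
            document = document.get(text_key, document.get(object_key))

        # Assumption: documents with no usable text are skipped entirely
        if document is None:
            continue

        # Use the running index when one was passed, otherwise keep the original uid
        rowid = index if index is not None else uid

        # Lists of strings are joined with spaces; plain strings are stored as-is
        if isinstance(document, list):
            rows.append((rowid, " ".join(document)))
        elif isinstance(document, str):
            rows.append((rowid, document))

        if index is not None:
            index += 1

    return rows


documents = [
    (10, "plain text", None),
    (11, ["two", "tokens"], None),
    (12, {"text": "from dict"}, None),
    (13, {"object": "fallback"}, None),
]

print(to_rows(documents))             # original uids are kept
print(to_rows(documents, index=100))  # ids are reassigned starting at 100
```

Passing index is useful when rows are appended after existing content and the caller wants contiguous numeric ids instead of the original uids.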

delete(ids)

Deletes rows by indexid using a SQL WHERE indexid IN (...) clause.

weights(tokens)

Returns None -- token weighting is not supported by the PostgreSQL FTS backend.

search(query, limit=3)

Executes a PostgreSQL full-text search query using plainto_tsquery and ts_rank:

# Wildcard handling: bare * becomes :* for prefix matching
query = re.sub(r"(?<!\:)\*", ":*", query)

# Query with ts_rank scoring, ordered by rank descending
query = (
    self.database.query(self.table.c["indexid"],
        text("ts_rank(vector, plainto_tsquery(:language, :query)) rank"))
    .order_by(desc(text("rank")))
    .limit(limit)
    .params({"language": self.language, "query": query})
)

return [(uid, score) for uid, score in query if score > 1e-5]
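
The two pure-Python steps in this method, the wildcard rewrite and the score threshold, can be tried in isolation. The sketch below reuses the regular expression and the 1e-5 cutoff from the excerpt above; the function names to_prefix_syntax and keep_scored and the sample rows are hypothetical and exist only for this illustration.

```python
import re


def to_prefix_syntax(query):
    # Rewrite "*" to ":*" unless the "*" is already preceded by ":"
    return re.sub(r"(?<!\:)\*", ":*", query)


def keep_scored(rows, threshold=1e-5):
    # Keep only rows whose score is strictly greater than the threshold
    return [(uid, score) for uid, score in rows if score > threshold]


print(to_prefix_syntax("class*"))      # class:*
print(to_prefix_syntax("class:*"))     # class:* (already in prefix form, unchanged)
print(to_prefix_syntax("mach* learn*"))  # mach:* learn:*

# Made-up rows standing in for (indexid, rank) pairs returned by the database
rows = [(1, 0.06), (2, 1e-5), (3, 0.0)]
print(keep_scored(rows))               # [(1, 0.06)]
```

Because the comparison is strict, a score of exactly 1e-5 is dropped along with zero-rank rows.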

batchsearch(queries, limit=3, threads=True)

Sequentially calls search() for each query. The threads parameter is accepted for interface compatibility but not used.

count()

Returns the total number of rows in the scoring table using func.count().

load(path)

Rolls back the current session and connection to reset to the last checkpoint, then reinitializes tables.

save(path)

Commits the current session and connection to persist changes.
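
Taken together, save() and load() behave like a checkpoint: committed rows survive a reload, uncommitted rows do not. The sketch below illustrates that commit-then-rollback pattern using Python's built-in sqlite3 module as a stand-in for PostgreSQL and SQLAlchemy. The substitution, the table layout, and the sample rows are assumptions for illustration and are not how PGText itself connects.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (illustration only)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE scoring (indexid INTEGER PRIMARY KEY, text TEXT)")

# Analogous to insert() followed by save(): the row is committed
connection.execute("INSERT INTO scoring VALUES (0, 'committed row')")
connection.commit()

# Analogous to further changes that are never saved
connection.execute("INSERT INTO scoring VALUES (1, 'uncommitted row')")

# Analogous to load(): roll back to the last committed state
connection.rollback()

remaining = connection.execute("SELECT indexid, text FROM scoring").fetchall()
print(remaining)  # [(0, 'committed row')]

connection.close()
```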

close()

Closes the database session and disposes the SQLAlchemy engine.

initialize(recreate=False)

Creates the database engine, connection, session, and table schema. When recreate=True, drops and recreates the table and GIN index. Supports configurable schema, table name, and connection URL (from config or SCORING_URL environment variable).

# Table schema with computed tsvector column
self.table = Table(
    table, MetaData(),
    Column("indexid", Integer, primary_key=True, autoincrement=False),
    Column("text", Text),
    Column("vector", TSVECTOR,
           Computed(f"to_tsvector('{self.language}', text)", persisted=True))
)

# GIN index for fast full-text search
index = Index(f"{table}-index", self.table.c["vector"], postgresql_using="gin")

sqldialect(sql, parameters=None)

Executes SQL only when the dialect is PostgreSQL; falls back to SELECT 1 for non-PostgreSQL engines. Used for schema creation and search path configuration.
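
The dialect gate can be pictured as a small selection step. The function below is a hypothetical stand-in that only returns the statement that would be run. It does not execute anything, and its name and signature are assumptions for this sketch rather than the method's real interface.

```python
def statement_for(dialect_name, sql):
    """Hypothetical helper: pick the statement to run for a given SQLAlchemy dialect name."""
    # PostgreSQL-specific SQL passes through; other dialects get a harmless no-op query
    return sql if dialect_name == "postgresql" else "SELECT 1"


print(statement_for("postgresql", "CREATE SCHEMA IF NOT EXISTS txtai"))
print(statement_for("sqlite", "CREATE SCHEMA IF NOT EXISTS txtai"))
```

Falling back to a no-op keeps the calling code identical across engines, so schema setup does not need its own dialect branches.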

Configuration

Key	Type	Default	Description
url	str	`SCORING_URL` env var	SQLAlchemy database connection URL.
language	str	`"english"`	PostgreSQL text search configuration language.
schema	str or None	None	PostgreSQL schema name. Created with `IF NOT EXISTS`.
table	str	`"scoring"`	Table name for the scoring data.
columns.text	str	`"text"`	Key to extract text content from dict documents.
columns.object	str	`"object"`	Fallback key when `"text"` is not found in dict documents.
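
The defaults in the table above can be read as a simple lookup with fallbacks. The helper below is a hypothetical sketch of that resolution, including the SCORING_URL environment fallback. The name resolve_settings and the flat dictionary it returns are assumptions made for this example and do not mirror the class's internal attributes.

```python
import os


def resolve_settings(config):
    """Hypothetical helper: apply the documented defaults to a scoring config dict."""
    columns = config.get("columns", {})
    return {
        # An explicit url wins; otherwise fall back to the SCORING_URL environment variable
        "url": config.get("url", os.environ.get("SCORING_URL")),
        "language": config.get("language", "english"),
        "schema": config.get("schema"),
        "table": config.get("table", "scoring"),
        "text": columns.get("text", "text"),
        "object": columns.get("object", "object"),
    }


# Example values below are placeholders, not real connection details
os.environ["SCORING_URL"] = "postgresql://user:pass@localhost/envdb"

print(resolve_settings({})["url"])
print(resolve_settings({"url": "postgresql://user:pass@localhost/mydb", "table": "documents"}))
```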

Inheritance Chain

PGText -> Scoring

The Scoring base class provides the interface contract (insert, delete, search, batchsearch, count, load, save, close) and column configuration parsing.

Usage Examples

Configuring PGText Scoring

from txtai.embeddings import Embeddings

# Configure embeddings with PostgreSQL full-text scoring
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "scoring": {
        "method": "pgtext",
        "url": "postgresql://user:pass@localhost/mydb",
        "language": "english"
    }
})

Direct PGText Usage

from txtai.scoring import PGText

config = {
    "url": "postgresql://user:pass@localhost/mydb",
    "language": "english",
    "table": "documents"
}

scoring = PGText(config)

# Insert documents
documents = [
    (0, "Machine learning algorithms for text classification", None),
    (1, "PostgreSQL full text search capabilities", None),
    (2, "Natural language processing pipelines", None)
]
scoring.insert(documents)
scoring.save(None)

# Search
results = scoring.search("text search", limit=2)
for uid, score in results:
    print(f"ID: {uid}, Score: {score:.4f}")

scoring.close()

Related Pages

Implements Principle

Principle:Neuml_Txtai_Keyword_Scoring

Requires Environment

Environment:Neuml_Txtai_Python_Scoring_Environment
