Implementation: NeuML txtai Embeddings Search
| Knowledge Sources | |
|---|---|
| Domains | Semantic_Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool from the txtai library for retrieving documents by semantic meaning rather than keyword matching.
Description
The Embeddings.search method finds documents most similar to an input query. It delegates to batchsearch() with a single-element query list, which internally creates a Search object that coordinates the full retrieval pipeline: query encoding, ANN lookup, optional sparse scoring, hybrid score combination, SQL filtering (when a database is present), and graph subgraph extraction (when graph mode is enabled). The method supports dense-only, sparse-only, and hybrid search depending on the embeddings configuration. It returns results in different formats depending on whether content storage and graph mode are enabled.
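The dense-only path of this pipeline can be sketched in plain Python, with toy bag-of-words vectors standing in for the real query encoder and an exact scan standing in for the ANN index. All names below are illustrative, not txtai internals:

```python
import math

VOCAB = ["virus", "cases", "ice", "shelf", "collapse"]

def encode(text):
    # Toy "encoder": bag-of-words over a fixed vocabulary, L2-normalized.
    # A real pipeline would run a transformer model here.
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def ann_lookup(query_vec, index, limit):
    # Toy exact scan standing in for an ANN lookup: cosine similarity
    # (dot product of unit vectors), sorted descending, top `limit` kept.
    scored = [(uid, sum(q * d for q, d in zip(query_vec, vec)))
              for uid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]

index = {
    0: encode("US tops 5 million confirmed virus cases"),
    1: encode("ice shelf has suddenly collapsed"),
}
results = ann_lookup(encode("virus cases"), index, limit=2)
print(results)  # id 0 ranks first with the higher cosine score
```

The remaining stages the description mentions (sparse scoring, hybrid combination, SQL filtering, graph extraction) layer on top of this core retrieve-and-rank loop.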
Usage
Use this method to find documents semantically similar to a natural language query. Call it with a query string and an optional result limit. For hybrid search, pass weights to control the balance between dense and sparse scoring. Use index to search a specific subindex. Use parameters to bind values to SQL placeholders when using database-backed filtering queries.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/embeddings/base.py (lines 356-376)
Signature
def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):
Import
from txtai.embeddings import Embeddings
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | The search query text. This can be a natural language question, a phrase, or (when a database is present) a SQL-like query string such as "select id, text, score from txtai where similar('search terms') and category = 'AI'". |
| limit | int or None | No | Maximum number of results to return. Defaults to None, which resolves to 3 internally. |
| weights | float or None | No | Hybrid score weighting factor between 0.0 and 1.0. Controls the balance between dense vector similarity (weight toward 1.0) and sparse keyword scoring (weight toward 0.0). A value of 0.5 gives equal weight to both signals. Only applicable when both dense and sparse indexes are configured. Defaults to None (internally resolves to 0.5 for hybrid). |
| index | str or None | No | Name of a specific subindex to search. When provided, the search is executed against the named subindex instead of the primary index. Defaults to None (search the primary index). |
| parameters | dict or None | No | Dictionary of named parameters to bind to SQL placeholders in the query. Used with database-backed SQL filtering queries. Defaults to None. |
| graph | bool | No | When True and a graph index is configured, returns graph results (subgraph of matching nodes and their relationships) instead of flat result lists. Defaults to False. |
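The convex combination described for weights can be illustrated in plain Python. This is a sketch of the weighting rule stated above, assuming both scores are normalized to [0, 1]; it is not txtai's internal code:

```python
def hybrid_score(dense, sparse, weights=0.5):
    # Convex combination: weights toward 1.0 favors dense similarity,
    # weights toward 0.0 favors sparse keyword scoring.
    return weights * dense + (1.0 - weights) * sparse

# Dense similarity is strong, keyword overlap is weak:
print(hybrid_score(0.9, 0.2, weights=0.7))  # ~0.69, dense-dominated
print(hybrid_score(0.9, 0.2, weights=0.5))  # ~0.55, balanced
```

Shifting weights toward 1.0 lets semantically similar documents outrank exact keyword matches, and vice versa.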
Outputs
| Name | Type | Description |
|---|---|---|
| results | list of tuple, list of dict, or graph | The return type depends on the embeddings configuration. Index-only (no content storage): a list of (id, score) tuples sorted by descending score. Index + database (content enabled): a list of dictionaries, each containing keys such as "id", "text", and "score". Graph mode (graph=True): a graph object representing the subgraph of matching results and their relationships. Returns an empty list if no results are found. |
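Because the return shape varies with configuration, callers often normalize flat results before use. A small illustrative helper (the function name is ours, not part of txtai; graph objects are out of scope here):

```python
def to_pairs(results):
    # Coerce flat search results into (id, score) pairs regardless of shape:
    # - index-only mode yields (id, score) tuples
    # - content-enabled mode yields dicts with "id" and "score" keys
    pairs = []
    for result in results:
        if isinstance(result, dict):
            pairs.append((result["id"], result["score"]))
        else:
            uid, score = result
            pairs.append((uid, score))
    return pairs

print(to_pairs([(0, 0.45), (4, 0.13)]))                    # [(0, 0.45), (4, 0.13)]
print(to_pairs([{"id": "a", "text": "doc", "score": 0.9}]))  # [('a', 0.9)]
```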
Usage Examples
Basic Example: Dense Search
from txtai.embeddings import Embeddings
# Create and populate an index without content storage
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index([
(0, "US tops 5 million confirmed virus cases", None),
(1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
(2, "Beijing launches high-tech citywide expenses tracking", None),
(3, "The National Park Service warns against sacrificing slower friends", None),
(4, "Maine moose are getting ticks at an alarming rate", None)
])
# Search returns (id, score) tuples
results = embeddings.search("pandemic health crisis", limit=3)
for uid, score in results:
print(f"ID: {uid}, Score: {score:.4f}")
# Output example:
# ID: 0, Score: 0.4532
# ID: 4, Score: 0.1287
# ID: 1, Score: 0.0843
Example: Search with Content Storage
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "US tops 5 million confirmed virus cases", None),
(1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
(2, "Beijing launches high-tech citywide expenses tracking", None)
])
# Search returns dicts with content when content storage is enabled
results = embeddings.search("climate change", limit=2)
for result in results:
print(f"ID: {result['id']}, Score: {result['score']:.4f}, Text: {result['text']}")
Example: SQL Filtering
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
{"id": 0, "text": "Machine learning algorithms", "category": "AI"},
{"id": 1, "text": "Database indexing strategies", "category": "DB"},
{"id": 2, "text": "Neural network training", "category": "AI"},
{"id": 3, "text": "SQL query optimization", "category": "DB"}
])
# SQL-like query with filtering
results = embeddings.search(
"select id, text, score from txtai where similar('deep learning') and category = :cat limit 2",
parameters={"cat": "AI"}
)
for result in results:
print(result)
Example: Hybrid Search
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True,
"hybrid": True
})
embeddings.index([
(0, "Python programming language tutorial", None),
(1, "Introduction to the Python snake species", None),
(2, "Java programming best practices", None)
])
# Hybrid search combining dense semantic similarity and sparse keyword scoring
# weights=0.7 gives 70% weight to dense similarity, 30% to keyword match
results = embeddings.search("python programming", limit=3, weights=0.7)
for result in results:
print(f"ID: {result['id']}, Score: {result['score']:.4f}")