Implementation: NeuML txtai Embeddings Search
| Knowledge Sources | |
|---|---|
| Domains | Semantic_Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool from the txtai library for retrieving documents by semantic meaning rather than keyword matching.
Description
The Embeddings.search method finds documents most similar to an input query. It delegates to batchsearch() with a single-element query list, which internally creates a Search object that coordinates the full retrieval pipeline: query encoding, ANN lookup, optional sparse scoring, hybrid score combination, SQL filtering (when a database is present), and graph subgraph extraction (when graph mode is enabled). The method supports dense-only, sparse-only, and hybrid search depending on the embeddings configuration. It returns results in different formats depending on whether content storage and graph mode are enabled.
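The dense-only path of this pipeline can be sketched in plain Python, with toy bag-of-words vectors standing in for the real query encoder and an exact scan standing in for the ANN index. All names below are illustrative, not txtai internals:

```python
import math

VOCAB = ["virus", "cases", "ice", "shelf", "collapse"]

def encode(text):
    # Toy "encoder": bag-of-words over a fixed vocabulary, L2-normalized.
    # A real pipeline would run a transformer model here.
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def ann_lookup(query_vec, index, limit):
    # Toy exact scan standing in for an ANN lookup: cosine similarity
    # (dot product of unit vectors), sorted descending, top `limit` kept.
    scored = [(uid, sum(q * d for q, d in zip(query_vec, vec)))
              for uid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]

index = {
    0: encode("US tops 5 million confirmed virus cases"),
    1: encode("ice shelf has suddenly collapsed"),
}
results = ann_lookup(encode("virus cases"), index, limit=2)
print(results)  # id 0 ranks first with the higher cosine score
```

The remaining stages the description mentions (sparse scoring, hybrid combination, SQL filtering, graph extraction) layer on top of this core retrieve-and-rank loop.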
Usage
Use this method to find documents semantically similar to a natural language query. Call it with a query string and an optional result limit. For hybrid search, pass weights to control the balance between dense and sparse scoring. Use index to search a specific subindex. Use parameters to bind values to SQL placeholders when using database-backed filtering queries.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/embeddings/base.py (lines 356-376)
Signature
def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):
Import
from txtai.embeddings import Embeddings
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | str | Yes | The search query text. This can be a natural language question, a phrase, or (when a database is present) a SQL-like query string such as "select id, text, score from txtai where similar('search terms') and category = 'AI'". |
| limit | int or None | No | Maximum number of results to return. Defaults to None, which resolves to 3 internally. |
| weights | float or None | No | Hybrid score weighting factor between 0.0 and 1.0. Controls the balance between dense vector similarity (weight toward 1.0) and sparse keyword scoring (weight toward 0.0). A value of 0.5 gives equal weight to both signals. Only applicable when both dense and sparse indexes are configured. Defaults to None (internally resolves to 0.5 for hybrid). |
| index | str or None | No | Name of a specific subindex to search. When provided, the search is executed against the named subindex instead of the primary index. Defaults to None (search the primary index). |
| parameters | dict or None | No | Dictionary of named parameters to bind to SQL placeholders in the query. Used with database-backed SQL filtering queries. Defaults to None. |
| graph | bool | No | When True and a graph index is configured, returns graph results (subgraph of matching nodes and their relationships) instead of flat result lists. Defaults to False. |
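The convex combination described for weights can be illustrated in plain Python. This is a sketch of the weighting rule stated above, assuming both scores are normalized to [0, 1]; it is not txtai's internal code:

```python
def hybrid_score(dense, sparse, weights=0.5):
    # Convex combination: weights toward 1.0 favors dense similarity,
    # weights toward 0.0 favors sparse keyword scoring.
    return weights * dense + (1.0 - weights) * sparse

# Dense similarity is strong, keyword overlap is weak:
print(hybrid_score(0.9, 0.2, weights=0.7))  # ~0.69, dense-dominated
print(hybrid_score(0.9, 0.2, weights=0.5))  # ~0.55, balanced
```

Shifting weights toward 1.0 lets semantically similar documents outrank exact keyword matches, and vice versa.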
Outputs
| Name | Type | Description |
|---|---|---|
| results | list of tuple, list of dict, or graph | The return type depends on the embeddings configuration. Index-only (no content storage): a list of (id, score) tuples sorted by descending score. Index + database (content enabled): a list of dictionaries, each containing keys such as "id", "text", and "score". Graph mode (graph=True): a graph object representing the subgraph of matching results and their relationships. Returns an empty list if no results are found. |
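Because the return shape varies with configuration, callers often normalize flat results before use. A small illustrative helper (the function name is ours, not part of txtai; graph objects are out of scope here):

```python
def to_pairs(results):
    # Coerce flat search results into (id, score) pairs regardless of shape:
    # - index-only mode yields (id, score) tuples
    # - content-enabled mode yields dicts with "id" and "score" keys
    pairs = []
    for result in results:
        if isinstance(result, dict):
            pairs.append((result["id"], result["score"]))
        else:
            uid, score = result
            pairs.append((uid, score))
    return pairs

print(to_pairs([(0, 0.45), (4, 0.13)]))                    # [(0, 0.45), (4, 0.13)]
print(to_pairs([{"id": "a", "text": "doc", "score": 0.9}]))  # [('a', 0.9)]
```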
Usage Examples
Basic Example: Dense Search
from txtai.embeddings import Embeddings
# Create and populate an index without content storage
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index([
(0, "US tops 5 million confirmed virus cases", None),
(1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
(2, "Beijing launches high-tech citywide expenses tracking", None),
(3, "The National Park Service warns against sacrificing slower friends", None),
(4, "Maine moose are getting ticks at an alarming rate", None)
])
# Search returns (id, score) tuples
results = embeddings.search("pandemic health crisis", limit=3)
for uid, score in results:
print(f"ID: {uid}, Score: {score:.4f}")
# Output example:
# ID: 0, Score: 0.4532
# ID: 4, Score: 0.1287
# ID: 1, Score: 0.0843
Example: Search with Content Storage
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "US tops 5 million confirmed virus cases", None),
(1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
(2, "Beijing launches high-tech citywide expenses tracking", None)
])
# Search returns dicts with content when content storage is enabled
results = embeddings.search("climate change", limit=2)
for result in results:
print(f"ID: {result['id']}, Score: {result['score']:.4f}, Text: {result['text']}")
Example: SQL Filtering
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
{"id": 0, "text": "Machine learning algorithms", "category": "AI"},
{"id": 1, "text": "Database indexing strategies", "category": "DB"},
{"id": 2, "text": "Neural network training", "category": "AI"},
{"id": 3, "text": "SQL query optimization", "category": "DB"}
])
# SQL-like query with filtering
results = embeddings.search(
"select id, text, score from txtai where similar('deep learning') and category = :cat limit 2",
parameters={"cat": "AI"}
)
for result in results:
print(result)
Example: Hybrid Search
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True,
"hybrid": True
})
embeddings.index([
(0, "Python programming language tutorial", None),
(1, "Introduction to the Python snake species", None),
(2, "Java programming best practices", None)
])
# Hybrid search combining dense semantic similarity and sparse keyword scoring
# weights=0.7 gives 70% weight to dense similarity, 30% to keyword match
results = embeddings.search("python programming", limit=3, weights=0.7)
for result in results:
print(f"ID: {result['id']}, Score: {result['score']:.4f}")