

Implementation:Neuml Txtai Embeddings Search

From Leeroopedia


Knowledge Sources
Domains: Semantic_Search, NLP
Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the txtai library, for retrieving documents by semantic meaning rather than by keyword matching.

Description

The Embeddings.search method finds documents most similar to an input query. It delegates to batchsearch() with a single-element query list, which internally creates a Search object that coordinates the full retrieval pipeline: query encoding, ANN lookup, optional sparse scoring, hybrid score combination, SQL filtering (when a database is present), and graph subgraph extraction (when graph mode is enabled). The method supports dense-only, sparse-only, and hybrid search depending on the embeddings configuration. It returns results in different formats depending on whether content storage and graph mode are enabled.
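The pipeline described above can be sketched in miniature. The following is an illustrative toy, not txtai's actual implementation: `toy_search` and `cosine` are hypothetical helpers that only show the shape of dense scoring, optional hybrid blending, and sorted truncation.

```python
import math

def cosine(a, b):
    # Dense similarity between two vectors (stand-in for encoded query vs. document)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def toy_search(query_vec, doc_vecs, sparse_scores=None, weights=None, limit=3):
    # Score each document, optionally blending dense and sparse signals,
    # then sort by descending score and truncate to the limit
    results = []
    for uid, vec in doc_vecs.items():
        dense = cosine(query_vec, vec)
        if sparse_scores is not None and weights is not None:
            score = weights * dense + (1 - weights) * sparse_scores.get(uid, 0.0)
        else:
            score = dense
        results.append((uid, score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:limit]

docs = {0: [1.0, 0.0], 1: [0.6, 0.8], 2: [0.0, 1.0]}
print(toy_search([1.0, 0.0], docs, limit=2))  # highest-scoring ids first
```

The real method additionally routes through batchsearch() and applies SQL filtering and graph extraction when those components are configured.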

Usage

Use this method to find documents semantically similar to a natural language query. Call it with a query string and an optional result limit. For hybrid search, pass weights to control the balance between dense and sparse scoring. Pass index to search a specific subindex, and parameters to bind values to SQL placeholders when using database-backed filtering queries.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/embeddings/base.py
  • Lines: L356-376

Signature

def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):

Import

from txtai.embeddings import Embeddings

I/O Contract

Inputs

  • query (str, required): The search query text. This can be a natural language question, a phrase, or (when a database is present) a SQL-like query string such as "select id, text, score from txtai where similar('search terms') and category = 'AI'".
  • limit (int or None, optional): Maximum number of results to return. Defaults to None, which resolves to 3 internally.
  • weights (float or None, optional): Hybrid score weighting factor between 0.0 and 1.0. Values toward 1.0 favor dense vector similarity; values toward 0.0 favor sparse keyword scoring. A value of 0.5 gives equal weight to both signals. Only applicable when both dense and sparse indexes are configured. Defaults to None, which resolves to 0.5 for hybrid search.
  • index (str or None, optional): Name of a specific subindex to search. When provided, the search is executed against the named subindex instead of the primary index. Defaults to None (search the primary index).
  • parameters (dict or None, optional): Dictionary of named parameters to bind to SQL placeholders in the query. Used with database-backed SQL filtering queries. Defaults to None.
  • graph (bool, optional): When True and a graph index is configured, returns graph results (the subgraph of matching nodes and their relationships) instead of flat result lists. Defaults to False.
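The named-placeholder binding used by the parameters argument follows the same convention as Python's sqlite3 module (txtai's content store is SQLite-backed). A minimal sketch of that convention, with an illustrative table and column names that are not txtai's schema:

```python
import sqlite3

# In-memory database with a toy documents table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER, text TEXT, category TEXT)")
con.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(0, "Machine learning algorithms", "AI"),
     (1, "Database indexing strategies", "DB")],
)

# Bind :cat at execution time, mirroring parameters={"cat": "AI"}
rows = con.execute(
    "SELECT id, text FROM docs WHERE category = :cat", {"cat": "AI"}
).fetchall()
print(rows)
```

Binding values this way, rather than interpolating them into the query string, avoids quoting errors and SQL injection in filter values.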

Outputs

  • results (list of tuple, list of dict, or graph): The return type depends on the embeddings configuration:
      • Index-only (no content storage): a list of (id, score) tuples sorted by descending score.
      • Index + database (content enabled): a list of dictionaries, each containing keys such as "id", "text", and "score".
      • Graph mode (graph=True): a graph object representing the subgraph of matching results and their relationships.
    Returns an empty list if no results are found.
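The two flat result shapes can be illustrated side by side. The values below are hand-written for illustration, not produced by txtai, and `to_dicts` is a hypothetical helper showing how the tuple shape maps onto the dict shape:

```python
# Index-only configuration: (id, score) tuples
index_only = [(0, 0.4532), (4, 0.1287)]

# Content-enabled configuration: dicts with stored text
with_content = [{"id": "0", "text": "...", "score": 0.4532}]

def to_dicts(results):
    # Normalize index-only tuples into the dict shape used when content is enabled
    # (no "text" key is available without content storage)
    return [{"id": uid, "score": score} for uid, score in results]

print(to_dicts(index_only))
```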

Usage Examples

Basic Example: Dense Search

from txtai.embeddings import Embeddings

# Create and populate an index without content storage
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

embeddings.index([
    (0, "US tops 5 million confirmed virus cases", None),
    (1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
    (2, "Beijing launches high-tech citywide expenses tracking", None),
    (3, "The National Park Service warns against sacrificing slower friends", None),
    (4, "Maine moose are getting ticks at an alarming rate", None)
])

# Search returns (id, score) tuples
results = embeddings.search("pandemic health crisis", limit=3)
for uid, score in results:
    print(f"ID: {uid}, Score: {score:.4f}")
# Output example:
# ID: 0, Score: 0.4532
# ID: 4, Score: 0.1287
# ID: 1, Score: 0.0843

Example: Search with Content Storage

from txtai.embeddings import Embeddings

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    (0, "US tops 5 million confirmed virus cases", None),
    (1, "Canada's last fully intact ice shelf has suddenly collapsed", None),
    (2, "Beijing launches high-tech citywide expenses tracking", None)
])

# Search returns dicts with content when content storage is enabled
results = embeddings.search("climate change", limit=2)
for result in results:
    print(f"ID: {result['id']}, Score: {result['score']:.4f}, Text: {result['text']}")

Example: SQL Filtering

from txtai.embeddings import Embeddings

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    {"id": 0, "text": "Machine learning algorithms", "category": "AI"},
    {"id": 1, "text": "Database indexing strategies", "category": "DB"},
    {"id": 2, "text": "Neural network training", "category": "AI"},
    {"id": 3, "text": "SQL query optimization", "category": "DB"}
])

# SQL-like query with filtering
results = embeddings.search(
    "select id, text, score from txtai where similar('deep learning') and category = :cat limit 2",
    parameters={"cat": "AI"}
)
for result in results:
    print(result)

Example: Hybrid Search

from txtai.embeddings import Embeddings

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True
})

embeddings.index([
    (0, "Python programming language tutorial", None),
    (1, "Introduction to the Python snake species", None),
    (2, "Java programming best practices", None)
])

# Hybrid search combining dense semantic similarity and sparse keyword scoring
# weights=0.7 gives 70% weight to dense similarity, 30% to keyword match
results = embeddings.search("python programming", limit=3, weights=0.7)
for result in results:
    print(f"ID: {result['id']}, Score: {result['score']:.4f}")

