Implementation:Neuml Txtai Scan

From Leeroopedia


Knowledge Sources
Domains: Embeddings, Search, Query Execution, Hybrid Search
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for scanning indexes for query matches, including hybrid search with configurable weights.

Description

The Scan class executes parsed query clauses against embeddings indexes. It acts as the query execution engine that routes similar() function calls from parsed SQL-style queries to the appropriate index search functions.

Key features:

  • Multi-index routing: Query clauses are grouped by target index name and executed in batches. Each clause can optionally specify a target index via the similar() function parameters.
  • Candidate management: The number of candidates (results to retrieve from index queries) is configurable per clause. When a clause does not set it, the default depends on the query: a query with a single filter clause uses the query limit, while a query with a multi-token WHERE clause uses 10x the limit so that enough candidates survive the additional filtering.
  • Hybrid score weights: Each clause can specify a weight parameter for hybrid dense/sparse score combination. When several clauses are batched together, the maximum weight across those clauses is used for the batch.
  • Bind parameter resolution: Supports named bind parameters (prefixed with ":") in similar() clause arguments, resolved against a parameters dictionary.
  • Clause parsing: The companion Clause class parses similar() function parameters into structured objects with text, index, candidates, and weights attributes. The first parameter is the query text. The remaining parameters are distinguished by type: integers are candidates, floats are weights, and strings are index names.
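The type-based parameter rule described above can be illustrated with a minimal sketch. This is not the txtai Clause class; `classify_similar_params` is a hypothetical helper written only for illustration, and it assumes the arguments have already been converted to Python ints, floats, and strings.

```python
def classify_similar_params(params):
    """Split similar() arguments into text, index, candidates and weights.

    Hypothetical helper, not part of txtai. Assumes params[0] is the query
    text and the remaining arguments are already typed Python values.
    """
    clause = {"text": params[0], "index": None, "candidates": None, "weights": None}

    for param in params[1:]:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(param, int) and not isinstance(param, bool):
            clause["candidates"] = param  # integer: number of candidates
        elif isinstance(param, float):
            clause["weights"] = param  # float: hybrid score weight
        elif isinstance(param, str) and param:
            clause["index"] = param  # non-empty string: target index name

    return clause


print(classify_similar_params(["machine learning", 100, 0.5, "sparse"]))
```

Because each parameter is identified by its type, the optional arguments can appear in any order after the query text.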

The execution flow is:

  1. Extract the similar() clauses from the parsed queries, grouping them by target index.
  2. Determine candidate counts and weights for each index group.
  3. Execute batch searches via the provided search function.
  4. Collect results keyed by query clause UID and sort them by UID, which preserves query order.
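The four steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the txtai source: `scan_sketch` and the dictionary-based clause layout are hypothetical, taking the maximum candidate count within a group is an assumption, and the default candidate count is reduced to the plain limit.

```python
from collections import defaultdict


def scan_sketch(clauses, search, limit, default_weights=None, default_index=None):
    """Hypothetical outline of the scan flow.

    clauses: list of dicts with uid, qid, text, index, candidates, weights
    search:  callable(texts, candidates, weights, index) returning one
             result list per input text
    """
    # Step 1: group clauses by target index (None means "use the default index")
    groups = defaultdict(list)
    for clause in clauses:
        groups[clause["index"]].append(clause)

    results = {}
    for index, group in groups.items():
        # Step 2: pick candidates and weights for the whole group
        # (assumption: the largest explicit value wins, else fall back to defaults)
        candidates = [c["candidates"] for c in group if c["candidates"]]
        candidates = max(candidates) if candidates else limit
        weights = [c["weights"] for c in group if c["weights"] is not None]
        weights = max(weights) if weights else default_weights

        # Step 3: one batch search per index group
        batch = search([c["text"] for c in group], candidates, weights, index or default_index)

        # Step 4: key each result by clause UID, remembering the parent query id
        for clause, result in zip(group, batch):
            results[clause["uid"]] = (clause["qid"], result)

    # Sorting by UID restores the original clause order
    return [results[uid] for uid in sorted(results)]
```

Batching per index means the search function is called once per target index rather than once per clause, at the cost of sharing a single candidate count and weight across the group.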

Usage

Scan is the internal query execution engine for embeddings search. It is instantiated and used by the embeddings search pipeline to execute the index-scanning portion of parsed SQL-style queries, so it is rarely constructed directly. Understanding Scan is useful for debugging search behavior, customizing hybrid search weights, or working with multi-index configurations.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/embeddings/search/scan.py

Signature

class Scan:
    def __init__(self, search, limit, weights, index)
    def __call__(self, queries, parameters) -> list
    def parse(self, queries, parameters) -> dict
    def bind(self, similar, parameters) -> list
    def default(self, queries) -> int

class Clause:
    def __init__(self, uid, qid, params)
    def parse(self, params)

Import

from txtai.embeddings.search.scan import Scan, Clause

I/O Contract

Inputs

Name | Type | Required | Description
search | callable | Yes | Index search function accepting (queries, candidates, weights, index) and returning batch results.
limit | int | Yes | Default maximum results per query.
weights | float | No | Default hybrid score weights for dense/sparse combination.
index | str | No | Default index name when no index is specified in the query clause.
queries | list[dict] | Yes (__call__) | List of parsed query dictionaries. Each may contain similar (list of parameter lists) and where (SQL WHERE clause string) keys.
parameters | list[dict] | No (__call__) | List of bind parameter dictionaries, one per query. Used to resolve ":"-prefixed placeholders in similar() arguments.
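Bind parameter resolution for a single similar() call can be sketched as follows. `resolve_bind_params` is a hypothetical stand-in for the bind method, and leaving unmatched placeholders untouched is an assumption rather than documented behavior.

```python
def resolve_bind_params(similar, parameters):
    """Replace ":name" placeholders in similar() arguments with bound values.

    Hypothetical helper, not the txtai method. Placeholders with no matching
    key are passed through unchanged (an assumption made for this sketch).
    """
    resolved = []
    for arg in similar:
        # Only strings that start with ":" and name a known key are replaced
        if isinstance(arg, str) and arg.startswith(":") and arg[1:] in parameters:
            resolved.append(parameters[arg[1:]])
        else:
            resolved.append(arg)
    return resolved


print(resolve_bind_params([":query", 100], {"query": "machine learning"}))
```

Non-string arguments such as candidate counts and weights pass through as they are, so only the placeholder positions change.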

Outputs

Name | Type | Description
results | list[tuple[int, list]] | Returned by __call__. List of (query_id, results) tuples sorted by query clause UID. Each result is a list of (id, score) tuples from the index search.
parsed clauses | dict | Returned by parse(). Dictionary mapping index names to lists of Clause objects.
default candidates | int | Returned by default(). Default candidate count: limit for simple queries, limit * 10 for multi-token WHERE clauses.
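The default candidate rule can be written as a short function. This sketch assumes that "multi-token" means the where string contains more than one whitespace-separated token; `default_candidates` and the sample where strings are hypothetical.

```python
def default_candidates(queries, limit):
    """Return limit, or limit * 10 when any query has a multi-token WHERE clause.

    Hypothetical helper. "Multi-token" is interpreted here as more than one
    whitespace-separated token in the where string (an assumption).
    """
    multitoken = any(
        query.get("where") and len(query["where"].split()) > 1
        for query in queries
    )
    return limit * 10 if multitoken else limit


# A lone filter token keeps the plain limit
print(default_candidates([{"where": "placeholder"}], 3))

# Extra filtering widens the candidate pool to 10x the limit
print(default_candidates([{"where": "placeholder AND category = 'AI'"}], 3))
```

The wider pool matters because filters run after the index search: with only `limit` candidates, a selective filter could leave fewer than `limit` rows.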

Usage Examples

from txtai.embeddings import Embeddings

# Scan is used internally by embeddings search
# The following demonstrates the query patterns that invoke Scan

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    {"id": 0, "text": "machine learning models", "category": "AI"},
    {"id": 1, "text": "web application development", "category": "dev"},
    {"id": 2, "text": "natural language processing", "category": "AI"},
])

# Simple similar() query - Scan executes with default candidates
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('machine learning')"
)

# similar() with candidate count - Scan uses 100 candidates
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('machine learning', 100)"
)

# similar() with hybrid weight - Scan passes 0.5 weight to search
# (only meaningful for a hybrid index; the configuration above does not enable one)
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('machine learning', 0.5)"
)

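# similar() with target index - Scan routes to the subindex named 'sparse'
# (assumes such a subindex is configured; the configuration above does not define one)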
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('machine learning', 'sparse')"
)

# Bind parameters - Scan resolves :query placeholder
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar(:query)",
    parameters={"query": "machine learning"}
)

# Multi-token WHERE clause - Scan defaults to limit * 10 candidates so enough matches survive the category filter
results = embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('machine learning') AND category = 'AI'"
)
