Implementation:Neuml Txtai Benchmarks Example

From Leeroopedia


Knowledge Sources
Domains: Benchmarking, Information_Retrieval
Last Updated: 2026-02-09 17:00 GMT

Overview

A concrete tool for running benchmark evaluations that compare txtai's search and retrieval methods against external baselines.

Description

The benchmarks.py example implements a pluggable benchmarking framework that evaluates multiple retrieval strategies (dense embeddings, hybrid search, BM25, sparse scoring, reranking, RAG) using standard IR evaluation metrics via pytrec_eval. It defines a base Index class with subclasses for each method, loads BEIR-format datasets, runs queries, and computes NDCG/MAP/Recall scores. External baselines include Elasticsearch, rank_bm25, bm25s, and SQLite FTS.

Usage

Use this example when evaluating txtai retrieval quality against baselines on standard IR benchmark datasets (e.g., BEIR collections). It serves as a reference for how to set up comparative benchmarks and measure search effectiveness.

Code Reference

Source Location

Signature

class Index:
    def __init__(self, path, config, output, refresh):
        """
        Creates an Index benchmark runner.

        Args:
            path: path to BEIR dataset
            config: YAML configuration dict
            output: output directory for results
            refresh: if True, rebuild index from scratch
        """

    def __call__(self, limit, filterscores=True):
        """Runs search evaluation and returns results dict."""

    def search(self, queries, limit):
        """Executes search queries against the index."""

    def index(self):
        """Builds the embeddings index."""

class Embed(Index):
    """Dense embeddings search benchmark."""

class Hybrid(Index):
    """Hybrid dense + sparse search benchmark."""

class RetrievalAugmentedGeneration(Index):
    """RAG benchmark with LLM reranking."""

class Score(Index):
    """Keyword scoring (BM25/TF-IDF) benchmark."""

class Similar(Index):
    """Similarity pipeline benchmark."""

class Rerank(Index):
    """Two-stage retrieval with reranking benchmark."""

class RankBM25(Index):
    """rank_bm25 library baseline benchmark."""

class BM25S(Index):
    """bm25s library baseline benchmark."""

class SQLiteFTS(Index):
    """SQLite full-text search baseline benchmark."""

class Elastic(Index):
    """Elasticsearch baseline benchmark."""

Import

# Run directly as a script
python examples/benchmarks.py -p /path/to/beir/dataset -c config.yml -o output/

I/O Contract

Inputs

Name    | Type | Required | Description
path    | str  | Yes      | Path to BEIR-format dataset directory containing corpus.jsonl and queries.jsonl
config  | str  | Yes      | Path to YAML configuration file specifying methods and embeddings settings
output  | str  | No       | Output directory for benchmark results (CSV files)
refresh | bool | No       | If True, rebuild indexes from scratch instead of loading existing ones
limit   | int  | No       | Number of results to retrieve per query (default from config)
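
To make the expected dataset layout concrete, the following is a minimal sketch of a loader for a BEIR-style directory. It is not the loader used by benchmarks.py. The function names load_jsonl and load_beir are hypothetical, and the sketch assumes the common BEIR conventions: "_id" and "text" fields in corpus.jsonl and queries.jsonl, plus relevance judgments in qrels/test.tsv with a header row followed by query id, document id and score.

```python
import csv
import json
from pathlib import Path


def load_jsonl(path):
    """Reads a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def load_beir(directory, split="test"):
    """Loads corpus, queries and relevance judgments (hypothetical helper)."""
    directory = Path(directory)

    # corpus.jsonl: one document per line, keyed here by its "_id" field
    corpus = {row["_id"]: row for row in load_jsonl(directory / "corpus.jsonl")}

    # queries.jsonl: one query per line with "_id" and "text" fields
    queries = {row["_id"]: row["text"] for row in load_jsonl(directory / "queries.jsonl")}

    # qrels/<split>.tsv: assumed layout is a header row, then
    # query id, document id and integer relevance score per line
    qrels = {}
    with open(directory / "qrels" / f"{split}.tsv", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) != 3:
                continue  # ignore blank or malformed lines
            qid, docid, score = row
            qrels.setdefault(qid, {})[docid] = int(score)

    return corpus, queries, qrels
```

The returned qrels dict has the same {query_id: {doc_id: score}} shape as the results listed under Outputs, so both can be passed to an evaluator directly.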

Outputs

Name      | Type  | Description
results   | dict  | Dictionary mapping method names to {query_id: {doc_id: score}} dicts
metrics   | dict  | NDCG@10, MAP, Recall scores computed via pytrec_eval
CSV files | Files | Per-method result files written to output directory
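
The benchmark computes its metrics with pytrec_eval. As a rough illustration of what those numbers mean, the sketch below computes NDCG@k and Recall@k in plain Python from a {query_id: {doc_id: score}} results dict and a qrels dict of the same shape. It is a simplified stand-in rather than pytrec_eval's implementation: the helper names are hypothetical, tie-breaking between equal scores is not replicated, and it assumes linear gain with a log2 rank discount.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked, relevant, k=10):
    """NDCG@k for one query. ranked: {doc_id: score}, relevant: {doc_id: relevance}."""
    # Order retrieved documents by descending score and keep the top k
    top = sorted(ranked, key=ranked.get, reverse=True)[:k]
    gains = [relevant.get(doc, 0) for doc in top]

    # The ideal ranking lists the most relevant judged documents first
    best = dcg(sorted(relevant.values(), reverse=True)[:k])
    return dcg(gains) / best if best > 0 else 0.0


def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant documents that appear in the top k results."""
    top = set(sorted(ranked, key=ranked.get, reverse=True)[:k])
    positives = {doc for doc, rel in relevant.items() if rel > 0}
    return len(top & positives) / len(positives) if positives else 0.0


def evaluate_run(results, qrels, k=10):
    """Averages both metrics over queries present in results and qrels."""
    qids = [qid for qid in qrels if qid in results]
    if not qids:
        return {f"ndcg@{k}": 0.0, f"recall@{k}": 0.0}
    ndcg = sum(ndcg_at_k(results[q], qrels[q], k) for q in qids) / len(qids)
    recall = sum(recall_at_k(results[q], qrels[q], k) for q in qids) / len(qids)
    return {f"ndcg@{k}": ndcg, f"recall@{k}": recall}
```

For published numbers, prefer the pytrec_eval output, since it follows the trec_eval conventions that other BEIR results are reported with.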

Usage Examples

Running Benchmarks

# Command-line usage
# python examples/benchmarks.py -p /data/beir/nfcorpus -c config.yml -o results/

# Example config.yml:
# path: /data/beir/nfcorpus
# embed:
#   path: sentence-transformers/nli-mpnet-base-v2
#   content: true
# methods:
#   - embed
#   - hybrid
#   - score
# limit: 10

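# Programmatic usage (run from the repository root so examples.benchmarks is importable)
import yaml

from examples.benchmarks import create

# Load the YAML configuration into a dict, as the Index constructor expects
with open("config.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Create an index instance for the "embed" method
index = create("embed", "/data/beir/nfcorpus", config, "results/", refresh=True)

# Run the evaluation, retrieving 10 results per query
results = index(limit=10)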

Adding a Custom Method

# Subclass Index to add a custom retrieval method
from txtai import Embeddings

from examples.benchmarks import Index

class CustomSearch(Index):
    def index(self):
        """Build custom index."""
        self.embeddings = Embeddings(self.config)
        self.embeddings.index(self.rows())

    def search(self, queries, limit):
        """Run custom search logic."""
        results = {}
        # Assumes queries yields (query_id, text) pairs; use queries.items() if it is a dict
        for qid, query in queries:
            results[qid] = self.embeddings.search(query, limit)
        return results
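
A new subclass also has to be reachable by name before a call such as create("custom", ...) can return it. The sketch below shows one way such name-based dispatch could look. It is an assumption for illustration, not the dispatch logic in benchmarks.py: the METHODS registry, the "custom" key and the stand-in classes are all hypothetical.

```python
class Index:
    """Stand-in for the benchmark base class; it only stores its arguments."""

    def __init__(self, path, config, output, refresh):
        self.path = path
        self.config = config
        self.output = output
        self.refresh = refresh


class Embed(Index):
    """Stand-in for the dense embeddings benchmark."""


class CustomSearch(Index):
    """Stand-in for a user-defined retrieval method."""


# Hypothetical registry mapping method names to benchmark classes
METHODS = {"embed": Embed, "custom": CustomSearch}


def create(method, path, config, output, refresh=False):
    """Instantiates the benchmark class registered under the given method name."""
    try:
        cls = METHODS[method]
    except KeyError:
        raise ValueError(f"Unknown method: {method}") from None
    return cls(path, config, output, refresh)


index = create("custom", "/data/beir/nfcorpus", {}, "results/")
print(type(index).__name__)
```

A dictionary lookup keeps the factory small and makes registering another method a one-line change.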
