Implementation:Neuml Txtai Benchmarks

Knowledge Sources	Neuml_Txtai
Domains	Benchmarking, Information Retrieval, Evaluation
Last Updated	2026-02-10 01:00 GMT

Overview

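A command-line script for running BEIR benchmark evaluations across multiple retrieval methods, covering both methods built into txtai and external baselines such as rank-bm25, bm25s, SQLite FTS5, and Elasticsearch.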

Description

The Benchmarks script (examples/benchmarks.py) is an evaluation runner that tests retrieval and search methods against the BEIR (Benchmarking IR) dataset collection. It defines a base Index class and ten specialized index implementations:

Embed: Dense vector embeddings using txtai Embeddings with FAISS backend
Hybrid: Combined embeddings + BM25 scoring using txtai
RetrievalAugmentedGeneration: RAG pipeline combining embeddings retrieval with LLM re-ranking
Score: BM25 scoring using txtai's ScoringFactory
Similar: Similarity pipeline using cross-encoder or bi-encoder models
Rerank: Two-stage retrieval with embeddings followed by similarity re-ranking
RankBM25: BM25 using the rank-bm25 library
BM25S: BM25 using the bm25s library with Lucene-style scoring
SQLiteFTS: BM25 via SQLite's FTS5 full-text search extension
Elastic: BM25 using Elasticsearch

Each index loads a BEIR corpus (corpus.jsonl), builds an index, runs queries (queries.jsonl), and evaluates against relevance judgments using pytrec_eval. Metrics include NDCG@k, MAP@k, Recall@k, and Precision@k. Results are output as JSON lines with timing, memory, and disk usage statistics.
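
As a rough illustration of the evaluation step, the sketch below parses BEIR-style relevance judgments and computes Recall@k, P@k, and NDCG@k in plain Python. The script itself delegates this work to pytrec_eval. The function names (read_qrels, top_k, recall_at_k, precision_at_k, ndcg_at_k) are hypothetical, the qrels layout (a tab-separated file with a header row of query-id, corpus-id, score) is assumed from the public BEIR format, and the NDCG formula uses linear gain with a log2 discount, which is one common formulation and may differ in detail from what pytrec_eval reports.

```python
import csv
import io
import math


def read_qrels(handle):
    """Parse BEIR-style qrels (TSV with a header row) into {query_id: {doc_id: relevance}}."""
    qrels = {}
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue
        query_id, doc_id, score = row[0], row[1], int(row[2])
        qrels.setdefault(query_id, {})[doc_id] = score
    return qrels


def top_k(scores, k):
    """Return the k highest-scoring document ids from a {doc_id: score} mapping."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]


def recall_at_k(relevant, scores, k):
    """Fraction of relevant documents that appear in the top k results."""
    positives = {doc for doc, rel in relevant.items() if rel > 0}
    if not positives:
        return 0.0
    return len(positives.intersection(top_k(scores, k))) / len(positives)


def precision_at_k(relevant, scores, k):
    """Fraction of the top k slots filled by relevant documents."""
    hits = sum(1 for doc in top_k(scores, k) if relevant.get(doc, 0) > 0)
    return hits / k


def ndcg_at_k(relevant, scores, k):
    """NDCG with linear gain and log2 discount (assumed formulation)."""
    gains = [relevant.get(doc, 0) for doc in top_k(scores, k)]
    ideal = sorted(relevant.values(), reverse=True)[:k]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Tiny in-memory stand-ins for a qrels file and one method's search results
QRELS = read_qrels(io.StringIO("query-id\tcorpus-id\tscore\nq1\td1\t1\nq1\td2\t1\n"))
RESULTS = {"q1": {"d1": 0.9, "d3": 0.5, "d2": 0.1}}

for qid, scores in RESULTS.items():
    print(qid, recall_at_k(QRELS[qid], scores, 2), precision_at_k(QRELS[qid], scores, 2), ndcg_at_k(QRELS[qid], scores, 3))
```

The nested-dictionary shape used for QRELS and RESULTS mirrors the input that pytrec_eval's RelevanceEvaluator expects, which is why it is used here.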

Usage

Use the Benchmarks script to evaluate and compare different retrieval methods on standardized BEIR datasets. It is invoked from the command line with options for selecting specific methods, data sources, configuration files, and output directories. It supports incremental runs and caching of built indexes for faster re-evaluation.
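
The index caching mentioned above can be pictured with a minimal sketch, assuming an index is persisted under an output directory and rebuilt only when that path is missing or a refresh is requested. The helper load_or_build and the build and load callables are hypothetical stand-ins for illustration, not the script's own code.

```python
import os
import tempfile


def load_or_build(output, refresh, build, load):
    """Reuse an index saved at `output` unless it is missing or `refresh` is set."""
    if refresh or not os.path.exists(output):
        return build(output), "built"
    return load(output), "loaded"


def build(output):
    # Hypothetical build step: write a marker file in place of a real index
    os.makedirs(output, exist_ok=True)
    with open(os.path.join(output, "index.txt"), "w", encoding="utf-8") as f:
        f.write("index")
    return output


def load(output):
    # Hypothetical load step: a real index would deserialize its files here
    return output


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "embed")
    FIRST = load_or_build(path, False, build, load)[1]   # nothing on disk yet
    SECOND = load_or_build(path, False, build, load)[1]  # cached copy is reused
    THIRD = load_or_build(path, True, build, load)[1]    # refresh forces a rebuild

print(FIRST, SECOND, THIRD)
```

Under this assumption, the -r / --refresh flag corresponds to passing refresh=True, which trades the time saved by the cache for a guaranteed fresh index.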

Code Reference

Source Location

Repository: Neuml_Txtai
File: examples/benchmarks.py

Signature

class Index:
    def __init__(self, path, config, output, refresh)
    def __call__(self, limit, filterscores=True)
    def search(self, queries, limit)
    def index(self)
    def rows(self)
    def load(self)
    def batch(self, data, size)
    def readconfig(self, key, default)

class Embed(Index): ...
class Hybrid(Index): ...
class RetrievalAugmentedGeneration(Embed): ...
class Score(Index): ...
class Similar(Index): ...
class Rerank(Embed): ...
class RankBM25(Index): ...
class BM25S(Index): ...
class SQLiteFTS(Index): ...
class Elastic(Index): ...

def relevance(path)
def create(method, path, config, output, refresh)
def compute(results)
def evaluate(methods, path, args)
def benchmarks(args)

Import

# Typically run as a standalone script
python examples/benchmarks.py [options]

I/O Contract

Inputs

Name	Type	Required	Description
-d / --directory	str	No	Root directory path containing BEIR datasets; defaults to "beir"
-m / --methods	str	No	Comma-separated list of methods to evaluate (embed, hybrid, rag, scoring, rank, bm25s, sqlite, es, similar, rerank)
-s / --sources	str	No	Comma-separated list of BEIR dataset names to evaluate against
-c / --config	str	No	Path to YAML configuration file for custom index settings
-o / --output	str	No	Index output directory path
-r / --refresh	flag	No	If set, rebuilds indexes even if they already exist
-t / --topk	int	No	Top-k results for evaluation metrics; defaults to 10
-n / --name	str	No	Name to assign to the benchmark run; defaults to method name
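
For reference, the options in the table could be declared with argparse roughly as follows. This is a reconstruction from the table, not the script's own parser; the help strings and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Argument parser mirroring the Inputs table (a reconstruction, not the script's parser)."""
    parser = argparse.ArgumentParser(description="Benchmarks")
    parser.add_argument("-d", "--directory", default="beir", help="root directory containing BEIR datasets")
    parser.add_argument("-m", "--methods", help="comma-separated list of methods to evaluate")
    parser.add_argument("-s", "--sources", help="comma-separated list of BEIR dataset names")
    parser.add_argument("-c", "--config", help="path to a YAML configuration file")
    parser.add_argument("-o", "--output", help="index output directory")
    parser.add_argument("-r", "--refresh", action="store_true", help="rebuild indexes even if they exist")
    parser.add_argument("-t", "--topk", type=int, default=10, help="top-k cutoff for evaluation metrics")
    parser.add_argument("-n", "--name", help="name to assign to the benchmark run")
    return parser


# Parse a sample command line and split the comma-separated method list
ARGS = build_parser().parse_args(["-m", "embed,hybrid", "-s", "nfcorpus,scifact", "-t", "20"])
METHODS = ARGS.methods.split(",")
print(ARGS.directory, METHODS, ARGS.topk, ARGS.refresh)
```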

Outputs

Name	Type	Description
benchmarks.json	JSON Lines file	One JSON object per method-source combination containing: source, method, name, index time, memory usage, disk usage, search time, NDCG@k, MAP@k, Recall@k, P@k
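
Because the output is JSON Lines, it can be post-processed with a few lines of Python. The sketch below picks the best method per dataset for a chosen metric. The exact key names in benchmarks.json are not listed on this page, so "source", "method", and "ndcg_cut_10" are assumptions, and the sample records are invented purely to exercise the code.

```python
import json
import os
import tempfile


def read_results(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_by_source(rows, metric):
    """Return {source: (method, value)} for the highest value of `metric` per source."""
    best = {}
    for row in rows:
        value = row.get(metric)
        if value is None:
            continue  # this record does not report the requested metric
        source = row["source"]
        if source not in best or value > best[source][1]:
            best[source] = (row["method"], value)
    return best


# Hypothetical records: key names and numbers are made up for illustration only
SAMPLE = [
    {"source": "nfcorpus", "method": "embed", "ndcg_cut_10": 0.25},
    {"source": "nfcorpus", "method": "hybrid", "ndcg_cut_10": 0.5},
    {"source": "scifact", "method": "embed", "ndcg_cut_10": 0.75},
]

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "benchmarks.json")
    with open(path, "w", encoding="utf-8") as f:
        for row in SAMPLE:
            f.write(json.dumps(row) + "\n")
    BEST = best_by_source(read_results(path), "ndcg_cut_10")

print(BEST)
```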

Usage Examples

# Run all benchmarks on all default BEIR datasets
# python examples/benchmarks.py

# Run specific methods on specific datasets
# python examples/benchmarks.py -m "embed,hybrid" -s "nfcorpus,scifact"

# Use custom configuration and output directory
# python examples/benchmarks.py -c config.yml -o /tmp/indexes -t 20

# Refresh (rebuild) indexes
# python examples/benchmarks.py -r -m "embed" -s "nfcorpus"

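# Programmatic usage (assumes the repository root is the working directory
# so that examples/benchmarks.py is importable as examples.benchmarks)
from examples.benchmarks import create, evaluate, relevance

# Create a single index; refresh=True rebuilds it even if one already exists at the output path
index = create("embed", "beir/nfcorpus", "config.yml", "output/embed", refresh=True)

# Run the dataset's queries and keep the top 10 results per query
results = index(limit=10)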

Related Pages

Environment:Neuml_Txtai_Python_Core_Dependencies
