Implementation:Neuml Txtai Annoy ANN

From Leeroopedia


Knowledge Sources
Domains Vector_Search, ANN
Last Updated 2026-02-10 01:00 GMT

Overview

Concrete ANN backend for approximate nearest neighbor search using the Annoy (Approximate Nearest Neighbors Oh Yeah) library, provided by txtai.

Description

The Annoy class is a txtai ANN backend built on Spotify's Annoy library. It creates a forest of random projection trees and searches it with the dot product metric, which is equivalent to cosine similarity on normalized vectors. The index is built by adding items one at a time and then constructing the tree forest. Once built, an Annoy index is read-only: it does not natively support append or delete operations.
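The equivalence between dot product and cosine similarity on normalized vectors can be checked with a few lines of plain Python. This is a minimal sketch that does not use Annoy or txtai; the helper names normalize, dot, and cosine are illustrative and do not come from either library.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]

# After normalization, the dot product matches the cosine similarity
assert math.isclose(dot(normalize(a), normalize(b)), cosine(a, b))
```

This is why a dot product index can serve cosine similarity queries, provided the vectors are normalized before they are indexed and queried.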

Usage

Use the Annoy backend when you need a simple, memory-mapped, read-only ANN index with fast search. Select it by setting the ANN backend configuration to "annoy". It requires the annoy Python package, which is installed via the txtai "ann" extra. The key tuning parameters are ntrees (the number of trees in the forest) and searchk (the number of nodes Annoy inspects during a search; higher values improve accuracy at the cost of speed).
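The build-then-search flow and the two tuning parameters can be sketched directly against the annoy package. This is an illustration under assumptions, not the txtai source: the helper name nearest and the sample vectors are hypothetical, and the import is guarded because annoy is an optional dependency.

```python
# Conditional import: annoy is an optional dependency
try:
    from annoy import AnnoyIndex
    ANNOY = True
except ImportError:
    ANNOY = False


def nearest(vectors, query, limit=2, ntrees=10, searchk=-1):
    """Hypothetical helper: build a dot-product Annoy index and query it once."""
    if not ANNOY:
        raise ImportError('Annoy is not available - install the txtai "ann" extra')

    index = AnnoyIndex(len(query), "dot")
    for i, vector in enumerate(vectors):
        index.add_item(i, vector)  # list position doubles as the item id
    index.build(ntrees)  # no more items can be added after build

    # search_k=-1 lets Annoy fall back to its default of ntrees * limit
    return index.get_nns_by_vector(query, limit, search_k=searchk, include_distances=True)


if ANNOY:
    ids, scores = nearest([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], [1.0, 0.0])
    print(list(zip(ids, scores)))
```

Raising ntrees generally yields a larger index with better recall, while raising searchk trades query time for accuracy without rebuilding the index.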

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/ann/dense/annoy.py
  • Lines: 1-73

Signature

class Annoy(ANN):
    """Builds an ANN index using the Annoy library."""

    def __init__(self, config)
    def load(self, path)
    def index(self, embeddings)
    def search(self, queries, limit)
    def count(self)
    def save(self, path)

Import

from txtai.ann import ANNFactory

I/O Contract

Inputs

Name Type Required Description
config dict Yes ANN configuration dictionary containing backend settings
config["backend"] str Yes Must be set to "annoy" to select this backend
config["dimensions"] int Yes Dimensionality of the embedding vectors
config["annoy"]["ntrees"] int No Number of trees in the forest (default: 10)
config["annoy"]["searchk"] int No Number of nodes to inspect during search, trading speed for accuracy (default: -1, which lets Annoy use ntrees * limit)

Outputs

Name Type Description
search() returns list List of lists of (id, score) tuples, one list per query
count() returns int Number of items in the Annoy index
save() side-effect file Persists the Annoy index to a binary file at the specified path
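The shape of the search() output can be illustrated with an exact brute-force stand-in. This sketch is not the Annoy implementation and the function name brute_force_search is hypothetical; it only mirrors the documented return shape, one list of (id, score) tuples per query, with ids taken from each vector's position.

```python
def brute_force_search(vectors, queries, limit):
    """Exact dot-product search that mirrors the documented search() output shape."""
    results = []
    for query in queries:
        # Score every vector against the query; the position is used as the id
        scored = [
            (i, sum(q * v for q, v in zip(query, vector)))
            for i, vector in enumerate(vectors)
        ]
        # Highest dot product first, truncated to the requested limit
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append(scored[:limit])
    return results


vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
results = brute_force_search(vectors, [[1.0, 0.0], [0.0, 1.0]], limit=2)

# One inner list per query, each entry an (id, score) tuple
for hits in results:
    for uid, score in hits:
        print(uid, score)
```

An approximate backend returns the same structure, but may miss some of the true nearest neighbors depending on ntrees and searchk.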

Usage Examples

from txtai import Embeddings

# Create embeddings with Annoy backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "annoy",
    "annoy": {
        "ntrees": 10,
        "searchk": 1000
    }
})

# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])

# Search
results = embeddings.search("climate change effects", 2)
print(results)
