Implementation:Neuml Txtai HNSW ANN

From Leeroopedia


Knowledge Sources
Domains Vector_Search, ANN
Last Updated 2026-02-10 01:00 GMT

Overview

Concrete ANN backend for approximate nearest neighbor search using the hnswlib library with Hierarchical Navigable Small World graphs, provided by txtai.

Description

HNSW is an ANN implementation that builds approximate nearest neighbor indexes with the hnswlib library. It creates a Hierarchical Navigable Small World (HNSW) graph for fast similarity search using inner product distance; for normalized vectors, the resulting score is equivalent to cosine similarity. The index supports dynamic append and delete operations: deletes are handled via hnswlib's mark_deleted method, and appends resize the index before adding new items. Distances returned by hnswlib are converted to similarity scores as 1 - distance.
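To see why 1 - distance yields a cosine similarity, the following minimal sketch reproduces the arithmetic in plain Python. It assumes the inner product distance takes the form hnswlib documents for its "ip" space, 1 - dot(a, b). The helper names (normalize, ip_distance, cosine) and the sample vectors are illustrative and not part of txtai or hnswlib.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def ip_distance(a, b):
    """Inner product distance, assumed to be 1 - dot(a, b) as in hnswlib's "ip" space."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query, document = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]

# Score as described above: 1 - distance, computed on normalized vectors
score = 1.0 - ip_distance(normalize(query), normalize(document))

# Reference value: cosine similarity of the original, unnormalized vectors
expected = cosine(query, document)
```

Up to floating point rounding, score and expected agree, which is why normalizing vectors before indexing lets an inner product index return cosine similarities.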

Usage

Use the HNSW backend when you need a high-performance, in-memory ANN index that supports dynamic updates. Select it by setting the "backend" configuration value to "hnsw". It requires the hnswlib Python package, which is installed with the txtai "ann" extra. Key tuning parameters are efconstruction, m, efsearch, and randomseed.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/ann/dense/hnsw.py
  • Lines: 1-105

Signature

class HNSW(ANN):
    """Builds an ANN index using the hnswlib library."""

    def __init__(self, config)
    def load(self, path)
    def index(self, embeddings)
    def append(self, embeddings)
    def delete(self, ids)
    def search(self, queries, limit)
    def count(self)
    def save(self, path)
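
The bookkeeping implied by these methods can be illustrated without hnswlib. The sketch below is a hypothetical brute-force stand-in (ToyIndex is not a txtai class). It assumes ids are assigned by insertion position, that deletes only mark items, and that count() subtracts the number of marked items, in line with the behavior described on this page.

```python
class ToyIndex:
    """Hypothetical exact-search stand-in that mimics the HNSW backend's bookkeeping."""

    def __init__(self):
        self.vectors = {}     # id -> vector
        self.deleted = set()  # ids marked as deleted
        self.offset = 0       # next id to assign

    def index(self, embeddings):
        # Rebuild from scratch: ids are positions in the input (assumption)
        self.vectors, self.deleted, self.offset = {}, set(), 0
        self.append(embeddings)

    def append(self, embeddings):
        # New ids continue from the current offset
        for vector in embeddings:
            self.vectors[self.offset] = vector
            self.offset += 1

    def delete(self, ids):
        # Mark as deleted instead of removing; unknown ids are ignored
        for uid in ids:
            if uid in self.vectors:
                self.deleted.add(uid)

    def search(self, queries, limit):
        # Score every live vector by inner product, best match first
        results = []
        for query in queries:
            scores = [
                (uid, sum(q * v for q, v in zip(query, vector)))
                for uid, vector in self.vectors.items()
                if uid not in self.deleted
            ]
            results.append(sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit])
        return results

    def count(self):
        # Stored elements minus those marked as deleted
        return len(self.vectors) - len(self.deleted)


ann = ToyIndex()
ann.index([[1.0, 0.0], [0.0, 1.0]])  # ids 0 and 1
ann.append([[0.6, 0.8]])             # id 2
ann.delete([0])                      # id 0 is hidden from search

hits = ann.search([[1.0, 0.0]], 2)
```

After the delete, the toy index still stores three vectors but reports a count of two, and the deleted id no longer appears in search results.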

Import

from txtai.ann import ANNFactory
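
For direct use of the backend without the Embeddings wrapper, a hedged sketch follows. It assumes ANNFactory.create(config) returns the HNSW instance for this configuration and that inputs are float32 NumPy arrays that are already L2-normalized. The build_and_search helper and the CONFIG values are illustrative, not part of txtai.

```python
# Illustrative configuration: the nested "hnsw" keys follow this page's parameter list
CONFIG = {
    "backend": "hnsw",
    "dimensions": 4,
    "hnsw": {"efconstruction": 200, "m": 16, "efsearch": 100},
}


def build_and_search(vectors, queries, limit=1):
    """Index `vectors` and return one [(id, score), ...] list per row in `queries`.

    Both arguments are assumed to be normalized float32 NumPy arrays
    of shape (n, CONFIG["dimensions"]).
    """
    # Imported lazily: requires txtai installed with the "ann" extra (hnswlib)
    from txtai.ann import ANNFactory

    # Copy the config since the backend may store index metadata in it
    ann = ANNFactory.create(dict(CONFIG))
    ann.index(vectors)
    return ann.search(queries, limit)
```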

I/O Contract

Inputs

Name Type Required Description
config dict Yes ANN configuration dictionary containing backend settings
config["backend"] str Yes Must be set to "hnsw" to select this backend
config["dimensions"] int Yes Dimensionality of the embedding vectors
config["hnsw"]["efconstruction"] int No Build-time ef parameter controlling index quality vs build speed (default: 200)
config["hnsw"]["m"] int No Number of bi-directional links per element (default: 16)
config["hnsw"]["efsearch"] int No Search-time ef parameter controlling accuracy vs speed (no default; applied only when set)
config["hnsw"]["randomseed"] int No Random seed for reproducibility (default: 100)
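
How these optional settings resolve can be sketched as a plain dictionary lookup. The sketch assumes the tuning parameters live under an "hnsw" key of the configuration, as in the usage example on this page; resolve_settings is a hypothetical helper, not a txtai function.

```python
# Defaults taken from the table above; efsearch has no default
DEFAULTS = {"efconstruction": 200, "m": 16, "randomseed": 100}


def resolve_settings(config):
    """Merge user-supplied HNSW settings with the documented defaults."""
    user = config.get("hnsw") or {}
    settings = {name: user.get(name, default) for name, default in DEFAULTS.items()}
    # None means the search-time ef is left untouched
    settings["efsearch"] = user.get("efsearch")
    return settings


settings = resolve_settings({"backend": "hnsw", "dimensions": 384, "hnsw": {"m": 32}})
```

Here only m is overridden, so efconstruction and randomseed fall back to their defaults and efsearch stays unset.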

Outputs

Name Type Description
search() returns list List of lists of (id, score) tuples where score = 1 - distance
count() returns int Number of indexed elements minus the number marked as deleted
save() side-effect file Persists hnswlib index to a binary file at the specified path
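
The shape of the search() result can be illustrated by mapping raw nearest neighbor output to scored tuples. The sketch assumes the underlying query returns parallel per-query rows of ids and distances; to_results and the sample values are hypothetical.

```python
def to_results(ids, distances):
    """Convert parallel rows of ids and distances into [(id, score), ...] lists."""
    results = []
    for row_ids, row_distances in zip(ids, distances):
        # Convert each distance to a similarity score
        scores = [1 - distance for distance in row_distances]
        results.append(list(zip(row_ids, scores)))
    return results


# Two queries with two neighbors each (made-up values)
ids = [[2, 0], [1, 3]]
distances = [[0.25, 0.5], [0.0, 0.75]]

# One inner list per query, each holding (id, score) tuples
results = to_results(ids, distances)
```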

Usage Examples

from txtai import Embeddings

# Create embeddings with HNSW backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "hnsw",
    "hnsw": {
        "efconstruction": 200,
        "m": 16,
        "efsearch": 100
    }
})

# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])

# Search: returns the top 2 matches as a list of (id, score) tuples
results = embeddings.search("climate change effects", 2)
print(results)

# Append new data to an existing HNSW index
embeddings.upsert([
    ("4", "Scientists discover high pollution levels in Arctic", None)
])
