Implementation:Neuml Txtai HNSW ANN
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, ANN |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete ANN backend for approximate nearest neighbor search using the hnswlib library with Hierarchical Navigable Small World graphs, provided by txtai.
Description
HNSW is txtai's ANN implementation backed by the hnswlib library. It builds a Hierarchical Navigable Small World (HNSW) graph for fast similarity search over inner product distance, which is equivalent to cosine similarity when the vectors are normalized. The index supports dynamic append and delete operations: appends resize the index before adding new items, and deletes are handled via hnswlib's mark_deleted method. Distances returned by hnswlib are converted to similarity scores as 1 - distance.
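The distance-to-score conversion can be checked by hand. The following is a minimal sketch in plain Python, not txtai or hnswlib code. It assumes hnswlib's documented definition of inner product distance (1 minus the inner product), and the helper names `normalize`, `dot`, and `cosine` are illustrative only.

```python
import math

def normalize(vector):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [3.0, 4.0]
document = [4.0, 3.0]

# Inner product distance on normalized vectors: 1 - inner product
distance = 1.0 - dot(normalize(query), normalize(document))

# 1 - distance recovers the inner product, which equals cosine
# similarity because both vectors have unit length
score = 1.0 - distance

assert math.isclose(score, cosine(query, document))
```

This is why the score is only a true cosine similarity when the vectors passed to the index are normalized beforehand.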
Usage
Use the HNSW backend when you need a high-performance, in-memory ANN index that supports dynamic updates. Select this backend by setting the ANN backend configuration to "hnsw". It requires the hnswlib Python package, which is installed via the txtai "ann" extra. The key tuning parameters are efconstruction, m, efsearch, and randomseed, set under the "hnsw" configuration key.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/ann/dense/hnsw.py
- Lines: 1-105
Signature
class HNSW(ANN):
    """Builds an ANN index using the hnswlib library."""
    def __init__(self, config)
    def load(self, path)
    def index(self, embeddings)
    def append(self, embeddings)
    def delete(self, ids)
    def search(self, queries, limit)
    def count(self)
    def save(self, path)
Import
from txtai.ann import ANNFactory
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | ANN configuration dictionary containing backend settings |
| config["backend"] | str | Yes | Must be set to "hnsw" to select this backend |
| config["dimensions"] | int | Yes | Dimensionality of the embedding vectors |
| efconstruction | int | No | Controls index build quality (default: 200) |
| m | int | No | Number of bi-directional links per element (default: 16) |
| efsearch | int | No | Search-time ef parameter controlling accuracy vs speed (no default; only applied when set) |
| randomseed | int | No | Random seed for reproducibility (default: 100) |
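As a rough illustration of how the optional settings in the table combine with a user configuration, the sketch below resolves each parameter against the listed defaults. The `setting` helper and the `DEFAULTS` dictionary are hypothetical names written for this example and are not txtai's API. The dimensions value is illustrative, and nesting the parameters under an "hnsw" key is assumed from the usage example on this page.

```python
# Defaults as listed in the Inputs table; efsearch has no listed default
DEFAULTS = {"efconstruction": 200, "m": 16, "randomseed": 100}

def setting(config, name):
    """Hypothetical helper: read an HNSW setting, falling back to the listed default."""
    return config.get("hnsw", {}).get(name, DEFAULTS.get(name))

# Example configuration: only m and efsearch are overridden
config = {
    "backend": "hnsw",
    "dimensions": 384,  # illustrative value
    "hnsw": {"m": 32, "efsearch": 100},
}

names = ("efconstruction", "m", "efsearch", "randomseed")
resolved = {name: setting(config, name) for name in names}
print(resolved)
```

Parameters that are not overridden fall back to the table's defaults, and a parameter with no default and no override resolves to None.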
Outputs
| Name | Type | Description |
|---|---|---|
| search() returns | list | One list per query, each containing (id, score) tuples where score = 1 - distance |
| count() returns | int | Number of elements in the index minus the number of deleted elements |
| save() side-effect | file | Persists hnswlib index to a binary file at the specified path |
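To make the contract concrete, here is a small stand-in that follows the same method shapes but uses exhaustive (brute-force) search instead of an HNSW graph. It is a sketch for illustration only and is neither txtai's nor hnswlib's implementation. The class name `BruteForceANN` and its internal fields are assumptions, and ids are assigned by insertion position as a simplification.

```python
class BruteForceANN:
    """Exhaustive-search stand-in that mirrors the documented method contract."""

    def __init__(self):
        self.vectors = {}     # id -> vector, ids assigned by insertion position
        self.deleted = set()  # ids marked as deleted
        self.offset = 0       # next id to assign
        self.deletes = 0      # count of deleted ids

    def index(self, embeddings):
        # Rebuild from scratch, then add all rows
        self.vectors, self.deleted = {}, set()
        self.offset, self.deletes = 0, 0
        self.append(embeddings)

    def append(self, embeddings):
        # New rows continue the id sequence from the current offset
        for vector in embeddings:
            self.vectors[self.offset] = vector
            self.offset += 1

    def delete(self, ids):
        # Mark ids as deleted rather than removing them; skip unknown ids
        for uid in ids:
            if uid in self.vectors and uid not in self.deleted:
                self.deleted.add(uid)
                self.deletes += 1

    def search(self, queries, limit):
        results = []
        for query in queries:
            scored = []
            for uid, vector in self.vectors.items():
                if uid in self.deleted:
                    continue
                # Inner product distance, converted to a score as 1 - distance
                distance = 1.0 - sum(q * v for q, v in zip(query, vector))
                scored.append((uid, 1.0 - distance))
            scored.sort(key=lambda pair: pair[1], reverse=True)
            results.append(scored[:limit])
        return results

    def count(self):
        # Stored elements minus deleted elements
        return len(self.vectors) - self.deletes


ann = BruteForceANN()
ann.index([[1.0, 0.0], [0.0, 1.0]])
ann.append([[0.6, 0.8]])
ann.delete([1])

results = ann.search([[1.0, 0.0]], 2)
print(results, ann.count())
```

The search result is a list with one entry per query, and the deleted id no longer appears in it or in the count.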
Usage Examples
from txtai import Embeddings

# Create embeddings with HNSW backend
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "hnsw",
    "hnsw": {
        "efconstruction": 200,
        "m": 16,
        "efsearch": 100
    }
})

# Index data
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last intact ice shelf has broken up",
    "Beijing urges strong action on climate change",
    "New York battles severe winter storm"
])

# Search
results = embeddings.search("climate change effects", 2)
print(results)

# Append new data to an existing HNSW index
embeddings.upsert([
    ("4", "Scientists discover high pollution levels in Arctic", None)
])