Implementation:Neuml Txtai NumPy ANN
| Knowledge Sources | |
|---|---|
| Domains | Vector_Search, Exact_Search |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
NumPy is an exact nearest neighbor search index backed by NumPy arrays, supporting cosine similarity via dot product and optional scalar quantization with hamming distance scoring.
Description
The NumPy class inherits from ANN and implements brute-force exact nearest neighbor search using NumPy's array operations. It stores all embeddings in a single NumPy array and computes cosine similarity via dot product on normalized vectors. When scalar quantization is enabled (via the quantize config parameter), it performs hamming distance scoring by XOR-ing integer vectors and counting differing bits. The class supports safetensors and NumPy binary formats for persistence, and serves as the base class for the GPU-accelerated Torch backend.
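The dot-product search described above can be sketched with plain NumPy. This is a minimal illustration rather than the class's actual code: the helper names normalize and brute_force_search are hypothetical, and the sketch assumes float vectors with no zero-length rows.

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit length so that dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

def brute_force_search(data, queries, limit):
    """Return, for each query, a list of (id, score) tuples sorted by descending score."""
    # Score every query against every stored row: shape (q, n)
    scores = np.dot(queries, data.T)
    # Sort ids by descending score and keep the first `limit` per query
    ids = np.argsort(-scores, axis=1)[:, :limit]
    return [list(zip(row.tolist(), scores[i, row].tolist())) for i, row in enumerate(ids)]

# Three stored vectors and one query pointing along the first axis
data = normalize(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32))
queries = normalize(np.array([[2.0, 0.0]], dtype=np.float32))

results = brute_force_search(data, queries, limit=2)
print(results)
```

Because every stored row is scored against every query, the cost grows linearly with the number of rows, which is why this approach suits small-to-medium datasets.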
Usage
Use NumPy when you need exact nearest neighbor search without approximate index overhead, when working with small-to-medium datasets where brute-force search is acceptable, or when GPU acceleration is not available. It is also the fallback ANN backend when no specialized libraries (FAISS, Annoy, etc.) are installed.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/ann/dense/numpy.py
- Lines: 1-202
Signature
class NumPy(ANN):
    def __init__(self, config):
        """
        Creates a new NumPy ANN index.

        Args:
            config: index configuration dict
        """
Import
from txtai.ann.dense import NumPy
Key Methods
| Method | Description |
|---|---|
| load(path) | Loads embeddings from a file (safetensors or NumPy binary format). Includes backward compatibility for pickled data. |
| index(embeddings) | Creates the index from a NumPy embeddings array and records metadata including the id offset. |
| append(embeddings) | Appends new embeddings to the existing index array via concatenation. |
| delete(ids) | Soft-deletes entries by zeroing out the rows at the specified ids. |
| search(queries, limit) | Searches for the top-k nearest neighbors. Uses dot product for cosine similarity or hamming scoring for quantized vectors. |
| count() | Returns the count of non-zero (non-deleted) rows in the index. |
| save(path) | Saves the index to disk in safetensors or NumPy binary format. |
| hammingscore(queries) | Computes hamming distance scores for quantized integer vectors. Score = 1.0 - (hamming distance / total bits), bounded between 0 and 1. |
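The hamming score formula in the table can be illustrated with a short NumPy sketch. It assumes 8-bit unsigned integer components, so total bits is taken as dimensions * 8, and it counts differing bits with np.unpackbits. The function name hamming_score is hypothetical, and the library may count bits differently.

```python
import numpy as np

def hamming_score(queries, data):
    """
    Score uint8 query vectors against uint8 stored vectors.

    queries: array of shape (q, d), data: array of shape (n, d)
    Returns an array of shape (q, n) with scores between 0 and 1.
    """
    # XOR each query with each stored row; set bits mark positions that differ
    delta = np.bitwise_xor(queries[:, None, :], data[None, :, :])
    # Expand every byte into its 8 bits and count the set bits per (query, row) pair
    distance = np.unpackbits(delta, axis=2).sum(axis=2)
    # Assumption: 8 bits per component
    total_bits = data.shape[1] * 8
    return np.clip(1.0 - distance / total_bits, 0.0, 1.0)

data = np.array([[0b00000000, 0b11111111],
                 [0b11111111, 0b11111111]], dtype=np.uint8)
queries = np.array([[0b00000000, 0b11111111]], dtype=np.uint8)

scores = hamming_score(queries, data)
print(scores)
```

In this example the query matches the first row exactly (score 1.0) and differs from the second row in 8 of 16 bits (score 0.5).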
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Index configuration. Key options include quantize (integer for scalar quantization bit width, enabling hamming scoring), dimensions (embedding dimensions), and backend-specific settings like safetensors. |
| embeddings | numpy.ndarray | Yes (for index/append) | 2D NumPy array of shape (n, dimensions) containing normalized embedding vectors. |
| queries | numpy.ndarray | Yes (for search) | 2D NumPy array of shape (q, dimensions) containing normalized query vectors. |
| limit | int | Yes (for search) | Maximum number of nearest neighbors to return per query. |
| ids | list of int | Yes (for delete) | List of row indices to soft-delete by zeroing. |
Outputs
| Name | Type | Description |
|---|---|---|
| search results | list of list of tuple | For each query, a list of (id, score) tuples sorted by descending similarity score. |
| count | int | Number of non-deleted (non-zero) rows in the index. |
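The relationship between soft deletion and the reported count can be sketched as follows. This is an illustration under assumptions, not the library's code: soft_delete and count_rows are hypothetical helpers operating on a bare NumPy array.

```python
import numpy as np

def soft_delete(array, ids):
    """Zero out the given rows in place, ignoring ids outside the array."""
    valid = [i for i in ids if 0 <= i < array.shape[0]]
    array[valid] = 0

def count_rows(array):
    """Count rows that contain at least one non-zero value."""
    return int(np.count_nonzero(np.any(array != 0, axis=1)))

index = np.ones((5, 3), dtype=np.float32)

# Id 99 is out of range and is skipped
soft_delete(index, [1, 3, 99])
remaining = count_rows(index)

print(index.shape, remaining)
```

Zeroing rows instead of removing them keeps the array shape unchanged, so the positions of the remaining rows, and therefore their ids, stay the same. A zeroed row also produces a dot-product score of 0 against any query.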
Usage Examples
Basic Usage
import numpy as np
from txtai.ann.dense.numpy import NumPy
# Create configuration
config = {"dimensions": 128, "offset": 0}
# Build index
ann = NumPy(config)
# Generate random normalized embeddings
embeddings = np.random.rand(1000, 128).astype(np.float32)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
# Index embeddings
ann.index(embeddings)
# Search with a query vector
query = np.random.rand(1, 128).astype(np.float32)
query = query / np.linalg.norm(query, axis=1, keepdims=True)
results = ann.search(query, limit=5)
for uid, score in results[0]:
    print(f"ID: {uid}, Score: {score:.4f}")
# Check count
print(f"Total indexed: {ann.count()}")
Delete and Append
import numpy as np
from txtai.ann.dense.numpy import NumPy
config = {"dimensions": 64, "offset": 0}
ann = NumPy(config)
# Initial index
data = np.random.rand(100, 64).astype(np.float32)
data = data / np.linalg.norm(data, axis=1, keepdims=True)
ann.index(data)
print(f"Count after index: {ann.count()}")
# Delete some entries (soft delete by zeroing)
ann.delete([0, 1, 2])
print(f"Count after delete: {ann.count()}")
# Append new embeddings
new_data = np.random.rand(10, 64).astype(np.float32)
new_data = new_data / np.linalg.norm(new_data, axis=1, keepdims=True)
ann.append(new_data)
print(f"Count after append: {ann.count()}")
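Persistence is not covered by the examples above. As a rough illustration of the NumPy binary format that save and load are described as using, the round trip below relies only on np.save and np.load. The file name is hypothetical, and the safetensors path is not shown.

```python
import os
import tempfile

import numpy as np

embeddings = np.random.rand(4, 8).astype(np.float32)

with tempfile.TemporaryDirectory() as directory:
    # Hypothetical file name, chosen for this sketch only
    path = os.path.join(directory, "embeddings")

    # Writing through an open handle keeps the exact path;
    # np.save appends ".npy" when given a bare path without that suffix
    with open(path, "wb") as handle:
        np.save(handle, embeddings, allow_pickle=False)

    # allow_pickle=False refuses to load pickled object arrays
    restored = np.load(path, allow_pickle=False)

same = np.array_equal(embeddings, restored)
print(same)
```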