Implementation:Neuml Txtai NumPy ANN

Knowledge Sources	Neuml_Txtai
Domains	Vector_Search, Exact_Search
Last Updated	2026-02-09 17:00 GMT

Overview

NumPy is an exact nearest neighbor search index backed by NumPy arrays, supporting cosine similarity via dot product and optional scalar quantization with hamming distance scoring.
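The exact search described above can be sketched against plain NumPy rather than the txtai class. `normalize` and `exact_search` are hypothetical helper names, not txtai APIs. The sketch assumes that both the stored rows and the queries are L2-normalized, because that is what makes the dot product equal cosine similarity.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    # Scale each row to unit L2 norm so that dot product equals cosine similarity
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


def exact_search(backend: np.ndarray, queries: np.ndarray, limit: int):
    # (q, d) x (d, n) -> (q, n) matrix of cosine similarities against every stored row
    scores = np.dot(queries, backend.T)

    # Negate to sort descending, then keep the first `limit` column positions (row ids)
    ids = np.argsort(-scores, axis=1)[:, :limit]

    # Map each query's results to a list of (id, score) tuples
    return [list(zip(row.tolist(), scores[x, row].tolist())) for x, row in enumerate(ids)]


rng = np.random.default_rng(0)
backend = normalize(rng.random((1000, 128), dtype=np.float32))
queries = normalize(rng.random((2, 128), dtype=np.float32))

for uid, score in exact_search(backend, queries, limit=5)[0]:
    print(uid, round(score, 4))
```

Every query is compared with every stored row, so the cost grows linearly with index size. That is the price of returning exact rather than approximate neighbors.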

Description

The NumPy class inherits from ANN and implements brute-force exact nearest neighbor search using NumPy's array operations. It stores all embeddings in a single NumPy array and computes cosine similarity via dot product on normalized vectors. When scalar quantization is enabled (via the quantize config parameter), it performs hamming distance scoring by XOR-ing integer vectors and counting differing bits. The class supports safetensors and NumPy binary formats for persistence, and serves as the base class for the GPU-accelerated Torch backend.
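The NumPy binary side of persistence can be sketched as follows. `save_array` and `load_array` are hypothetical names. The safetensors format and the pickle backward-compatibility path are not shown. The sketch writes through an open file handle because `np.save` appends a `.npy` suffix when it is given a bare filename but leaves file objects alone.

```python
import os
import tempfile

import numpy as np


def save_array(array: np.ndarray, path: str) -> None:
    # Passing an open file handle keeps the exact path; given a bare filename,
    # np.save would append a ".npy" suffix
    with open(path, "wb") as handle:
        np.save(handle, array, allow_pickle=False)


def load_array(path: str) -> np.ndarray:
    # allow_pickle=False rejects object arrays, so only plain numeric data loads
    return np.load(path, allow_pickle=False)


embeddings = np.random.default_rng(1).random((10, 8), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings")
    save_array(embeddings, path)
    files = os.listdir(tmp)
    restored = load_array(path)

print(files)  # ['embeddings']
```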

Usage

Use NumPy when you need exact nearest neighbor search without the build overhead and recall trade-offs of an approximate index, when the dataset is small to medium sized so that scanning every stored vector per query is acceptable, or when GPU acceleration is not available. Because it depends only on NumPy, it is also a practical choice when specialized ANN libraries (FAISS, Annoy, etc.) are not installed.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/ann/dense/numpy.py
Lines: 1-202

Signature

class NumPy(ANN):
    def __init__(self, config):
        """
        Creates a new NumPy ANN index.

        Args:
            config: index configuration dict
        """

Import

from txtai.ann.dense import NumPy

Key Methods

Method	Description
`load(path)`	Loads embeddings from a file (safetensors or NumPy binary format). Includes backward compatibility for pickled data.
`index(embeddings)`	Creates the index from a NumPy embeddings array, sets the id offset to the number of indexed rows, and records index build metadata.
`append(embeddings)`	Appends new embeddings to the existing index array via concatenation and advances the id offset by the number of appended rows.
`delete(ids)`	Soft-deletes entries by zeroing out the rows at the specified ids.
`search(queries, limit)`	Searches for the top-k nearest neighbors. Uses dot product for cosine similarity or hamming scoring for quantized vectors.
`count()`	Returns the count of non-zero (non-deleted) rows in the index.
`save(path)`	Saves the index to disk in safetensors or NumPy binary format.
`hammingscore(queries)`	Computes hamming distance scores for quantized integer vectors. Score = 1.0 - (hamming distance / total bits), bounded between 0 and 1.
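The `hammingscore` formula in the table can be sketched with a per-byte popcount lookup table. `hamming_scores` and `POPCOUNT` are hypothetical names. The sketch assumes that quantized vectors are stored as packed `uint8` arrays, so each column holds 8 bits and the total bit count is columns × 8.

```python
import numpy as np

# Lookup table: number of set bits for every possible uint8 value (0..255)
POPCOUNT = np.array([bin(x).count("1") for x in range(256)], dtype=np.int64)


def hamming_scores(backend: np.ndarray, queries: np.ndarray) -> np.ndarray:
    # Broadcast (q, 1, d) against (n, d) -> (q, n, d) array of XOR-ed bytes
    delta = np.bitwise_xor(queries[:, None, :], backend)

    # Count differing bits per byte, then sum over the byte axis -> (q, n)
    distance = POPCOUNT[delta.astype(np.int64)].sum(axis=2)

    # Score = 1 - (hamming distance / total bits), clipped to [0, 1]
    total_bits = backend.shape[1] * 8
    return np.clip(1.0 - distance / total_bits, 0.0, 1.0)


backend = np.array([[0b00000000, 0b00000000],
                    [0b11111111, 0b00000000],
                    [0b11111111, 0b11111111]], dtype=np.uint8)
queries = np.array([[0b00000000, 0b00000000]], dtype=np.uint8)

print(hamming_scores(backend, queries).tolist())  # [[1.0, 0.5, 0.0]]
```

The lookup table replaces a bit-by-bit loop with a single indexed gather. The three rows differ from the query in 0, 8 and 16 of 16 bits.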

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	Index configuration. Key options include `quantize` (integer for scalar quantization bit width, enabling hamming scoring), `dimensions` (embedding dimensions), and backend-specific settings like `safetensors`.
embeddings	numpy.ndarray	Yes (for index/append)	2D NumPy array of shape `(n, dimensions)` containing L2-normalized embedding vectors (integer vectors when `quantize` is set).
queries	numpy.ndarray	Yes (for search)	2D NumPy array of shape `(q, dimensions)` containing L2-normalized query vectors (integer vectors when `quantize` is set).
limit	int	Yes (for search)	Maximum number of nearest neighbors to return per query.
ids	list of int	Yes (for delete)	List of row indices to soft-delete by zeroing.
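One simple way to produce integer vectors for the quantized mode is 1-bit sign quantization followed by `np.packbits`. This is an illustrative assumption and not necessarily the scheme txtai applies for a given `quantize` value. `binarize` is a hypothetical helper. If a scorer counts total bits as packed columns × 8, the relevant width is the packed byte count, not the original float dimension.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    # 1-bit sign quantization: components > 0 become 1, all others 0
    bits = (embeddings > 0).astype(np.uint8)

    # Pack 8 bits per byte along each row: (n, d) -> (n, ceil(d / 8)) uint8
    return np.packbits(bits, axis=1)


rng = np.random.default_rng(0)
floats = rng.standard_normal((4, 128)).astype(np.float32)
packed = binarize(floats)

print(packed.shape, packed.dtype)  # (4, 16) uint8
```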

Outputs

Name	Type	Description
search results	list of list of tuple	For each query, a list of `(id, score)` tuples sorted by descending similarity score.
count	int	Number of non-deleted (non-zero) rows in the index.
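The soft-delete semantics in the tables above (zeroed rows, count of non-zero rows) can be reproduced on a plain array. `soft_delete` and `count` are hypothetical stand-ins. Silently skipping ids past the end of the array is an assumption of this sketch.

```python
import numpy as np


def soft_delete(backend: np.ndarray, ids) -> None:
    # Assumption of this sketch: ids past the end of the array are silently skipped
    ids = [x for x in ids if x < backend.shape[0]]

    # Zero the remaining rows in place; the array keeps its shape
    backend[ids] = 0


def count(backend: np.ndarray) -> int:
    # A row counts as live unless every component is zero
    return int((~np.all(backend == 0, axis=1)).sum())


rng = np.random.default_rng(0)
# Shift away from zero so no stored row is all-zero before deletion
backend = rng.random((100, 64), dtype=np.float32) + 0.01

soft_delete(backend, [0, 1, 2, 500])
print(count(backend), backend.shape)  # 97 (100, 64)

# Deleted rows stay in the array; a dot-product search simply scores them 0.0
query = np.ones(64, dtype=np.float32) / 8.0
print((backend @ query)[:3].tolist())  # [0.0, 0.0, 0.0]
```

Because deletion is detected by an all-zero row, a genuinely all-zero embedding would also be counted as deleted under this scheme.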

Usage Examples

Basic Usage

import numpy as np
from txtai.ann.dense.numpy import NumPy

# Create configuration
config = {"dimensions": 128, "offset": 0}

# Build index
ann = NumPy(config)

# Generate random normalized embeddings
embeddings = np.random.rand(1000, 128).astype(np.float32)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Index embeddings
ann.index(embeddings)

# Search with a query vector
query = np.random.rand(1, 128).astype(np.float32)
query = query / np.linalg.norm(query, axis=1, keepdims=True)

results = ann.search(query, limit=5)
for uid, score in results[0]:
    print(f"ID: {uid}, Score: {score:.4f}")

# Check count
print(f"Total indexed: {ann.count()}")

Delete and Append

import numpy as np
from txtai.ann.dense.numpy import NumPy

config = {"dimensions": 64, "offset": 0}
ann = NumPy(config)

# Initial index
data = np.random.rand(100, 64).astype(np.float32)
data = data / np.linalg.norm(data, axis=1, keepdims=True)
ann.index(data)
print(f"Count after index: {ann.count()}")

# Delete some entries (soft delete by zeroing)
ann.delete([0, 1, 2])
print(f"Count after delete: {ann.count()}")

# Append new embeddings
new_data = np.random.rand(10, 64).astype(np.float32)
new_data = new_data / np.linalg.norm(new_data, axis=1, keepdims=True)
ann.append(new_data)
print(f"Count after append: {ann.count()}")
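Search ids are row positions in the backing array. As a result, appending never renumbers existing rows. In the Delete and Append example, the appended vectors occupy rows 100 to 109, and the zeroed rows 0 to 2 keep their slots. The sketch below models this bookkeeping with a hypothetical `TinyIndex` class. Its `offset` attribute mirrors the id offset mentioned in the method table, but the sketch makes no claim that txtai exposes it the same way.

```python
import numpy as np


class TinyIndex:
    """Hypothetical array-backed index: row position is the id, offset is the next id."""

    def __init__(self, embeddings: np.ndarray):
        self.backend = embeddings.copy()
        self.offset = embeddings.shape[0]

    def append(self, embeddings: np.ndarray) -> None:
        # Stack new rows below the old ones, so they get ids offset .. offset + n - 1
        self.backend = np.concatenate((self.backend, embeddings), axis=0)
        self.offset += embeddings.shape[0]

    def delete(self, ids) -> None:
        # Soft delete: zero rows in place, never shifting the rows that follow
        self.backend[ids] = 0

    def top1(self, query: np.ndarray) -> int:
        # Exact search for the single best id by dot product
        return int(np.argmax(self.backend @ query))


def unit_rows(rng, n, d):
    # Random positive vectors scaled to unit length
    rows = rng.random((n, d), dtype=np.float32)
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)


rng = np.random.default_rng(0)
index = TinyIndex(unit_rows(rng, 100, 64))
index.delete([0, 1, 2])

new_data = unit_rows(rng, 10, 64)
index.append(new_data)

print(index.offset, index.top1(new_data[3]))  # 110 103
```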

Related Pages

Principle:Neuml_Txtai_ANN_Backend_Architecture
