Implementation:Neuml Txtai NumPy ANN

From Leeroopedia


Knowledge Sources
Domains Vector_Search, Exact_Search
Last Updated 2026-02-09 17:00 GMT

Overview

NumPy is an exact nearest neighbor search index backed by NumPy arrays. It scores cosine similarity with a dot product over normalized vectors and optionally supports scalar quantization with Hamming distance scoring.

Description

The NumPy class inherits from ANN and implements brute-force exact nearest neighbor search using NumPy's array operations. It stores all embeddings in a single NumPy array and computes cosine similarity via dot product on normalized vectors. When scalar quantization is enabled (via the quantize config parameter), it performs hamming distance scoring by XOR-ing integer vectors and counting differing bits. The class supports safetensors and NumPy binary formats for persistence, and serves as the base class for the GPU-accelerated Torch backend.
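The equivalence between a dot product on normalized vectors and cosine similarity can be checked with plain NumPy. This is a minimal sketch, not the class's own code; the helper names `normalize` and `cosine` are hypothetical and exist only for this illustration.

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit L2 length."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def cosine(a, b):
    """Reference cosine similarity between two 1D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
data = rng.random((5, 4)).astype(np.float32)
query = rng.random((1, 4)).astype(np.float32)

# One matrix product scores the query against every stored row
scores = np.dot(normalize(query), normalize(data).T)

# Each score matches the cosine similarity of the raw vectors
expected = np.array([[cosine(query[0], row) for row in data]])
assert np.allclose(scores, expected, atol=1e-5)
```

Because the normalization is done once at indexing time, each search reduces to a single matrix product over the stored array.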

Usage

Use NumPy when you need exact nearest neighbor search without approximate index overhead, when working with small-to-medium datasets where brute-force search is acceptable, or when GPU acceleration is not available. It is also the fallback ANN backend when no specialized libraries (FAISS, Annoy, etc.) are installed.

Code Reference

Source Location

Signature

class NumPy(ANN):
    def __init__(self, config):
        """
        Creates a new NumPy ANN index.

        Args:
            config: index configuration dict
        """

Import

from txtai.ann.dense import NumPy

Key Methods

Method | Description
--- | ---
load(path) | Loads embeddings from a file (safetensors or NumPy binary format). Includes backward compatibility for pickled data.
index(embeddings) | Creates the index from a NumPy embeddings array and records metadata including the id offset.
append(embeddings) | Appends new embeddings to the existing index array via concatenation.
delete(ids) | Soft-deletes entries by zeroing out the rows at the specified ids.
search(queries, limit) | Searches for the top-k nearest neighbors. Uses dot product for cosine similarity or hamming scoring for quantized vectors.
count() | Returns the count of non-zero (non-deleted) rows in the index.
save(path) | Saves the index to disk in safetensors or NumPy binary format.
hammingscore(queries) | Computes hamming distance scores for quantized integer vectors. Score = 1.0 - (hamming distance / total bits), bounded between 0 and 1.
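The hamming scoring formula above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the function name `hamming_score` is hypothetical, vectors are assumed to be bit-packed `uint8` arrays, and every byte is assumed to contribute 8 comparable bits.

```python
import numpy as np

def hamming_score(queries, data):
    """
    Scores uint8 bit-packed queries against uint8 bit-packed rows.
    Returns an array of shape (queries, rows) with values in [0, 1].
    """
    # Lookup table: number of set bits for every possible uint8 value
    bitcounts = np.array([bin(x).count("1") for x in range(256)])

    # XOR leaves a 1 wherever a query bit differs from a stored bit
    delta = np.bitwise_xor(queries[:, None], data)

    # Hamming distance per (query, row) pair
    distance = bitcounts[delta].sum(axis=2)

    # Assumption: every byte contributes 8 comparable bits
    total = data.shape[1] * 8
    return np.clip(1.0 - distance / total, 0.0, 1.0)

data = np.array([[0b11110000, 0b00001111],
                 [0b00001111, 0b11110000],
                 [0b11110000, 0b00000000]], dtype=np.uint8)
queries = np.array([[0b11110000, 0b00001111]], dtype=np.uint8)

scores = hamming_score(queries, data)
print(scores)  # [[1.   0.   0.75]]
```

An identical row scores 1.0, a row with every bit flipped scores 0.0, and the third row differs in 4 of 16 bits, giving 0.75.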

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
config | dict | Yes | Index configuration. Key options include quantize (integer for scalar quantization bit width, enabling hamming scoring), dimensions (embedding dimensions), and backend-specific settings like safetensors.
embeddings | numpy.ndarray | Yes (for index/append) | 2D NumPy array of shape (n, dimensions) containing normalized embedding vectors.
queries | numpy.ndarray | Yes (for search) | 2D NumPy array of shape (q, dimensions) containing normalized query vectors.
limit | int | Yes (for search) | Maximum number of nearest neighbors to return per query.
ids | list of int | Yes (for delete) | List of row indices to soft-delete by zeroing.
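Both embeddings and queries are expected to be normalized before they reach the index. One way to prepare them is sketched below; the `normalize` helper is hypothetical and not part of the class, and guarding all-zero rows is a design choice made for this example.

```python
import numpy as np

def normalize(vectors):
    """Returns a float32 array with each non-zero row scaled to unit L2 length."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Leave all-zero rows untouched instead of dividing by zero
    norms[norms == 0] = 1.0
    return vectors / norms

embeddings = normalize([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(embeddings.shape, embeddings.dtype)  # (3, 2) float32
```

Without the guard, an all-zero row would produce NaN values that propagate into every similarity score computed against it.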

Outputs

Name | Type | Description
--- | --- | ---
search results | list of list of tuple | For each query, a list of (id, score) tuples sorted by descending similarity score.
count | int | Number of non-deleted (non-zero) rows in the index.
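The shape of the search output can be illustrated by turning a score matrix into per-query lists of (id, score) tuples. This is a sketch of one way to do it with a full sort; the function name `topk` and the sample scores are made up for the example.

```python
import numpy as np

def topk(scores, limit):
    """Maps a (queries, rows) score matrix to [(id, score)] lists, best match first."""
    # Negating the scores makes argsort order rows from highest to lowest score
    ids = np.argsort(-scores)[:, :limit]
    return [list(zip(ids[x].tolist(), row[ids[x]].tolist())) for x, row in enumerate(scores)]

scores = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]])
results = topk(scores, 2)
print(results)  # [[(1, 0.9), (2, 0.5)], [(0, 0.7), (2, 0.3)]]
```

Here the ids are row positions in the score matrix, so `results[0]` holds the matches for the first query and `results[1]` those for the second.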

Usage Examples

Basic Usage

import numpy as np
from txtai.ann.dense.numpy import NumPy

# Create configuration
config = {"dimensions": 128, "offset": 0}

# Build index
ann = NumPy(config)

# Generate random normalized embeddings
embeddings = np.random.rand(1000, 128).astype(np.float32)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Index embeddings
ann.index(embeddings)

# Search with a query vector
query = np.random.rand(1, 128).astype(np.float32)
query = query / np.linalg.norm(query, axis=1, keepdims=True)

results = ann.search(query, limit=5)
for uid, score in results[0]:
    print(f"ID: {uid}, Score: {score:.4f}")

# Check count
print(f"Total indexed: {ann.count()}")

Delete and Append

import numpy as np
from txtai.ann.dense.numpy import NumPy

config = {"dimensions": 64, "offset": 0}
ann = NumPy(config)

# Initial index
data = np.random.rand(100, 64).astype(np.float32)
data = data / np.linalg.norm(data, axis=1, keepdims=True)
ann.index(data)
print(f"Count after index: {ann.count()}")

# Delete some entries (soft delete by zeroing)
ann.delete([0, 1, 2])
print(f"Count after delete: {ann.count()}")

# Append new embeddings
new_data = np.random.rand(10, 64).astype(np.float32)
new_data = new_data / np.linalg.norm(new_data, axis=1, keepdims=True)
ann.append(new_data)
print(f"Count after append: {ann.count()}")
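The counting behavior in the example above follows from how soft deletion and concatenation act on a plain array. The sketch below reproduces the same sequence with NumPy only; the `count` helper and the `backend` variable are stand-ins for illustration, not the class's internals.

```python
import numpy as np

def count(array):
    """Counts rows that contain at least one non-zero value."""
    return int(np.count_nonzero(np.any(array != 0, axis=1)))

rng = np.random.default_rng(0)
backend = rng.random((100, 64)).astype(np.float32)
before = count(backend)

# Soft delete: zero the rows; the array keeps its shape, so later ids do not shift
backend[[0, 1, 2]] = 0
after_delete = count(backend)

# Append: concatenate the new rows at the end of the array
extra = rng.random((10, 64)).astype(np.float32)
backend = np.concatenate((backend, extra), axis=0)
after_append = count(backend)

print(before, after_delete, after_append, backend.shape)  # 100 97 107 (110, 64)
```

The zeroed rows still occupy space in the array, which is why the array has 110 rows while only 107 are counted. A zeroed row also yields a dot product of 0 against any query.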
