Implementation:Neuml Txtai MUVERA Pooling
| Knowledge Sources | |
|---|---|
| Domains | Multi-Vector Retrieval, Dimensionality Reduction, Fixed-Dimensional Encoding |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool, provided by txtai, for reducing multi-vector embeddings to single fixed-dimensional vectors using the MUVERA algorithm.
Description
The Muvera class implements the MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) algorithm, which converts late interaction multi-vector outputs (such as those from ColBERT) into single fixed-dimensional vectors suitable for standard approximate nearest neighbor (ANN) indexes. The algorithm works in three steps: (1) partition the token vectors with simhash, using random projections, (2) optionally reduce the dimensionality of each vector with an AMS sketch projection, and (3) aggregate the vectors within each partition, repeating the process for every repetition and concatenating the results into the final fixed-size encoding. The output dimension is repetitions * 2^hashes * projection (default: 20 * 2^5 * 16 = 20 * 32 * 16 = 10,240); when projection is None, the full embedding dimension takes the place of projection. For "data" category inputs, the per-partition sums are averaged; for "query" category inputs, the raw sums are kept. The implementation is based on the paper "MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings" (arXiv:2405.19504) and is a Python port derived from Google's C++ implementation.
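The three steps described above can be sketched for a single repetition and a single document. This is a simplified illustration, not the txtai code: the function name encode_once and its internals are assumptions made for this example, and details of the real implementation (such as how empty partitions are handled or how values are scaled) are not covered.

```python
import numpy as np

def encode_once(tokens, hashes=5, projection=16, seed=42, category="data"):
    """Illustrative single-repetition encoding of one (num_tokens, dim) array (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dimension = tokens.shape[1]

    # (1) Simhash partitioning: the sign pattern of `hashes` Gaussian projections
    # selects one of 2^hashes partitions for every token vector
    planes = rng.standard_normal((dimension, hashes))
    bits = (tokens @ planes > 0).astype(np.int64)
    partition = bits @ (1 << np.arange(hashes))

    # (2) AMS-style sketch: each input dimension maps to exactly one output
    # dimension with a random sign, so every row has a single non-zero entry
    sketch = np.zeros((dimension, projection), dtype=tokens.dtype)
    columns = rng.integers(0, projection, dimension)
    sketch[np.arange(dimension), columns] = rng.choice([-1.0, 1.0], dimension)
    reduced = tokens @ sketch

    # (3) Aggregate per partition: sum the vectors, then average for "data" inputs
    partitions = 2 ** hashes
    sums = np.zeros((partitions, projection), dtype=reduced.dtype)
    np.add.at(sums, partition, reduced)
    if category == "data":
        counts = np.bincount(partition, minlength=partitions)
        sums = sums / np.maximum(counts, 1)[:, None]

    # One repetition contributes 2^hashes * projection values
    return sums.reshape(-1)

tokens = np.random.default_rng(0).standard_normal((32, 128)).astype(np.float32)
print(encode_once(tokens).shape)  # (512,)
```

A full encoding would repeat this with a different seed per repetition and concatenate the results, which is where the repetitions factor in the output dimension comes from.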
Usage
Use Muvera when you need to convert multi-vector (late interaction) embeddings to single fixed vectors for efficient retrieval with standard ANN indexes. It is typically used as a component within LatePooling, configured via the muvera key in modelargs. Direct usage is appropriate when applying the MUVERA encoding to pre-computed multi-vector embeddings outside the pooling pipeline.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/models/pooling/muvera.py
Signature
class Muvera:
    def __init__(self, repetitions=20, hashes=5, projection=16, seed=42)
    def __call__(self, data, category)
    def random(self, dimension, projection, seed)
    def reducer(self, dimension, projection, seed)
Import
from txtai.models.pooling.muvera import Muvera
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repetitions | int | No | Number of encoding repetitions. Each repetition produces 2^hashes * projection dimensions. Default is 20. |
| hashes | int | No | Number of simhash bits, creating 2^hashes partitions. Default is 5 (32 partitions). |
| projection | int or None | No | Dimensionality reduction target per partition. Set to None for identity projection (no reduction). Default is 16. |
| seed | int | No | Random seed for reproducible projections. Default is 42. |
| data | numpy.ndarray | Yes (for __call__) | 3D array of shape (num_documents, max_tokens, embedding_dim) containing multi-vector embeddings. |
| category | str | Yes (for __call__) | Either "query" (raw sums per partition) or "data" (averaged sums per partition). |
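The effect of these parameters on the size of the encoded vector can be checked with a few lines of arithmetic. The helper below is hypothetical (it is not part of txtai) and only restates the formula from the description, including the projection=None case where the embedding dimension is used instead.

```python
def output_dimension(repetitions=20, hashes=5, projection=16, embedding_dim=None):
    """Hypothetical helper: size of the fixed vector for a given configuration."""
    # With projection=None no reduction is applied, so each partition
    # keeps the full embedding dimension
    width = projection if projection is not None else embedding_dim
    if width is None:
        raise ValueError("embedding_dim is required when projection is None")

    return repetitions * (2 ** hashes) * width

print(output_dimension())  # 10240
print(output_dimension(repetitions=10, hashes=4, projection=None, embedding_dim=128))  # 20480
```

Each extra simhash bit doubles the vector size, so hashes is the parameter to watch when sizing an ANN index.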
Outputs
| Name | Type | Description |
|---|---|---|
| __call__() | numpy.ndarray | 2D array of shape (num_documents, output_dim) where output_dim = repetitions * 2^hashes * projection. Default output dimensions: 20 * 32 * 16 = 10,240. |
| random() | numpy.ndarray | Random Gaussian matrix of shape (dimension, projection) for simhash computation. |
| reducer() | numpy.ndarray | Sparse AMS sketch matrix of shape (dimension, projection) for dimensionality reduction, with exactly one non-zero value per row. |
Usage Examples
import numpy as np
from txtai.models.pooling.muvera import Muvera
# Create a MUVERA encoder with default parameters
encoder = Muvera(repetitions=20, hashes=5, projection=16, seed=42)
# Output dimensions: 20 * 2^5 * 16 = 10,240
# Simulate multi-vector embeddings (e.g., from ColBERT)
# 2 documents, 32 tokens each, 128-dimensional embeddings
data = np.random.randn(2, 32, 128).astype(np.float32)
# Encode documents (uses averaging per partition)
doc_vectors = encoder(data, category="data")
# doc_vectors.shape: (2, 10240)
# Encode queries (uses raw sums per partition)
query_vectors = encoder(data, category="query")
# query_vectors.shape: (2, 10240)
# Use identity projection (no dimensionality reduction)
encoder_full = Muvera(repetitions=10, hashes=4, projection=None, seed=42)
# Output dimensions: 10 * 2^4 * 128 = 20,480 (uses full embedding dimension)
full_vectors = encoder_full(data, category="data")
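Once encoded, query and document vectors are compared with a plain inner product, which the MUVERA paper presents as an approximation of the multi-vector (Chamfer / MaxSim) similarity. The sketch below shows both sides of that comparison. The function names maxsim and score_fixed are hypothetical, and the small random arrays are stand-ins for real embeddings and encoder output, so no approximation quality is implied.

```python
import numpy as np

def maxsim(query_tokens, doc_tokens):
    """Exact late interaction score: each query token is matched to its best document token."""
    similarity = query_tokens @ doc_tokens.T
    return float(similarity.max(axis=1).sum())

def score_fixed(query_vectors, doc_vectors):
    """Inner products between fixed vectors, shape (num_queries, num_documents)."""
    return query_vectors @ doc_vectors.T

rng = np.random.default_rng(0)

# Multi-vector side: one query (8 tokens) against one document (32 tokens)
query_tokens = rng.standard_normal((8, 128)).astype(np.float32)
doc_tokens = rng.standard_normal((32, 128)).astype(np.float32)
print(maxsim(query_tokens, doc_tokens))

# Fixed-vector side: placeholder arrays standing in for encoder(..., category="query")
# and encoder(..., category="data") output
query_vectors = rng.standard_normal((1, 64)).astype(np.float32)
doc_vectors = rng.standard_normal((3, 64)).astype(np.float32)
print(score_fixed(query_vectors, doc_vectors).shape)  # (1, 3)
```

Because the fixed-vector score is a single matrix product, it can be served by a standard ANN index, with exact MaxSim kept for optional reranking of the top candidates.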