Implementation:Neuml Txtai MUVERA Pooling

Knowledge Sources	Neuml_Txtai
Domains	Multi-Vector Retrieval, Dimensionality Reduction, Fixed-Dimensional Encoding
Last Updated	2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for reducing multi-vector embeddings to single fixed-dimensional vectors using the MUVERA algorithm.

Description

The Muvera class implements the MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) algorithm, which converts late interaction multi-vector outputs (such as those from ColBERT) into single fixed-dimensional vectors suitable for standard approximate nearest neighbor (ANN) indexes. The algorithm works by: (1) partitioning token vectors with simhash, computed from random projections, (2) optionally reducing dimensionality via AMS sketch projections, and (3) aggregating vectors within each partition across multiple repetitions to produce the final fixed-size encoding. The output dimension is repetitions * 2^hashes * projection (default: 20 * 2^5 * 16 = 20 * 32 * 16 = 10,240); when projection is None, the input embedding dimension takes the place of projection. For "data" category inputs, partition aggregates are averaged; for "query" category inputs, raw sums are used. The implementation is based on the paper "MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings" (arXiv:2405.19504) and is a Python port derived from Google's C++ implementation.
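The three steps described above can be sketched in plain NumPy. This is a minimal illustration for a single document, not txtai's code: the function name `fde_sketch`, the per-repetition seeding, the bit-to-index mapping, and the +/-1 sketch values are all assumptions, and details such as scaling factors or the handling of empty partitions are left out.

```python
import numpy as np

def fde_sketch(tokens, category, repetitions=20, hashes=5, projection=16, seed=42):
    """Illustrative fixed dimensional encoding of one (num_tokens, dim) array.

    Hypothetical helper for explanation only; not the txtai implementation.
    """
    tokens = np.asarray(tokens, dtype=np.float32)
    dim = tokens.shape[1]
    partitions = 2 ** hashes
    blocks = []
    for rep in range(repetitions):
        # Assumption: each repetition draws its own matrices from seed + rep
        rng = np.random.default_rng(seed + rep)

        # (1) Simhash: the sign pattern of `hashes` random projections picks a partition
        gaussian = rng.standard_normal((dim, hashes)).astype(np.float32)
        bits = (tokens @ gaussian > 0).astype(np.int64)
        index = bits @ (1 << np.arange(hashes))

        # (2) AMS-style sketch: one random +/-1 entry per input dimension
        sketch = np.zeros((dim, projection), dtype=np.float32)
        sketch[np.arange(dim), rng.integers(0, projection, dim)] = rng.choice([-1.0, 1.0], dim)
        reduced = tokens @ sketch

        # (3) Sum the reduced vectors that land in each partition
        sums = np.zeros((partitions, projection), dtype=np.float32)
        np.add.at(sums, index, reduced)
        if category == "data":
            # Documents average each partition; queries keep the raw sums
            counts = np.bincount(index, minlength=partitions)
            sums /= np.maximum(counts, 1)[:, None]
        blocks.append(sums.ravel())

    # Concatenate all repetitions: repetitions * 2^hashes * projection values
    return np.concatenate(blocks)

tokens = np.random.default_rng(0).standard_normal((32, 128)).astype(np.float32)
doc_vector = fde_sketch(tokens, "data")
query_vector = fde_sketch(tokens, "query")
```

With the defaults, both vectors have 20 * 32 * 16 = 10,240 entries, matching the output dimension formula in the description.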

Usage

Use Muvera when you need to convert multi-vector (late interaction) embeddings to single fixed vectors for efficient retrieval with standard ANN indexes. It is typically used as a component within LatePooling, configured via the muvera key in modelargs. Direct usage is appropriate when applying the MUVERA encoding to pre-computed multi-vector embeddings outside the pooling pipeline.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/models/pooling/muvera.py

Signature

class Muvera:
    def __init__(self, repetitions=20, hashes=5, projection=16, seed=42)
    def __call__(self, data, category)
    def random(self, dimension, projection, seed)
    def reducer(self, dimension, projection, seed)

Import

from txtai.models.pooling.muvera import Muvera

I/O Contract

Inputs

Name	Type	Required	Description
repetitions	int	No	Number of encoding repetitions. Each repetition produces `2^hashes * projection` dimensions. Default is 20.
hashes	int	No	Number of simhash bits, creating `2^hashes` partitions. Default is 5 (32 partitions).
projection	int or None	No	Dimensionality reduction target per partition. Set to None for identity projection (no reduction). Default is 16.
seed	int	No	Random seed for reproducible projections. Default is 42.
data	numpy.ndarray	Yes (for __call__)	3D array of shape (num_documents, max_tokens, embedding_dim) containing multi-vector embeddings.
category	str	Yes (for __call__)	Either "query" (raw sums per partition) or "data" (per-partition averages).
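Because repetitions, hashes, and projection together fix the size of the encoded vector, it can help to compute that size before building an index. The helper below is a hypothetical convenience function (it is not part of txtai) that applies the formula from the description, substituting the embedding dimension when projection is None.

```python
def output_dimension(embedding_dim, repetitions=20, hashes=5, projection=16):
    """Size of the fixed-dimensional encoding for the given parameters."""
    # With projection=None no reduction is applied, so each partition keeps
    # the full embedding dimension
    width = embedding_dim if projection is None else projection
    return repetitions * (2 ** hashes) * width

default_dim = output_dimension(128)  # 20 * 32 * 16 = 10240
identity_dim = output_dimension(128, repetitions=10, hashes=4, projection=None)  # 10 * 16 * 128 = 20480
```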

Outputs

Name	Type	Description
__call__()	numpy.ndarray	2D array of shape (num_documents, output_dim) where `output_dim = repetitions * 2^hashes * projection`. Default output dimensions: 20 * 32 * 16 = 10,240. With `projection=None`, the embedding dimension replaces `projection` in this formula.
random()	numpy.ndarray	Random Gaussian matrix of shape (dimension, projection) for simhash computation.
reducer()	numpy.ndarray	Sparse AMS sketch matrix of shape (dimension, projection) for dimensionality reduction, with exactly one non-zero value per row.
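To make the two matrix shapes in the table concrete, the sketch below builds stand-ins for them with NumPy. The function names are hypothetical and the +/-1 values for the sparse entries are an assumption; the table only states that each row has exactly one non-zero value. The final lines show why such a matrix reduces dimensionality: multiplying by it adds each signed input coordinate into a single output bucket.

```python
import numpy as np

def gaussian_matrix(dimension, projection, seed):
    """Dense Gaussian matrix; the signs of x @ matrix can serve as simhash bits."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dimension, projection)).astype(np.float32)

def ams_sketch_matrix(dimension, projection, seed):
    """Sparse matrix with a single +/-1 entry in each row (values are an assumption)."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((dimension, projection), dtype=np.float32)
    columns = rng.integers(0, projection, size=dimension)
    signs = rng.choice(np.array([-1.0, 1.0], dtype=np.float32), size=dimension)
    matrix[np.arange(dimension), columns] = signs
    return matrix

sketch = ams_sketch_matrix(128, 16, seed=42)
nonzero_per_row = np.count_nonzero(sketch, axis=1)

# Equivalent view: route each signed coordinate of x to its one bucket
x = np.arange(128, dtype=np.float32)
rows, cols = np.nonzero(sketch)
bucketed = np.zeros(16, dtype=np.float32)
np.add.at(bucketed, cols, x[rows] * sketch[rows, cols])
```

Here `x @ sketch` and `bucketed` hold the same 16 values, so the sparse product can be read as a signed bucket sum.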

Usage Examples

import numpy as np
from txtai.models.pooling.muvera import Muvera

# Create a MUVERA encoder with default parameters
encoder = Muvera(repetitions=20, hashes=5, projection=16, seed=42)
# Output dimensions: 20 * 2^5 * 16 = 10,240

# Simulate multi-vector embeddings (e.g., from ColBERT)
# 2 documents, 32 tokens each, 128-dimensional embeddings
data = np.random.randn(2, 32, 128).astype(np.float32)

# Encode documents (uses averaging per partition)
doc_vectors = encoder(data, category="data")
# doc_vectors.shape: (2, 10240)

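# Encode queries (uses raw sums per partition)
# The same array is reused here only for illustration; real queries would be
# separate multi-vector embeddings with the same embedding dimension
query_vectors = encoder(data, category="query")
# query_vectors.shape: (2, 10240)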

# Use identity projection (no dimensionality reduction)
encoder_full = Muvera(repetitions=10, hashes=4, projection=None, seed=42)
# Output dimensions: 10 * 2^4 * 128 = 20,480 (uses full embedding dimension)
full_vectors = encoder_full(data, category="data")
