Implementation:Neuml Txtai MUVERA Pooling

From Leeroopedia


Knowledge Sources
Domains: Multi-Vector Retrieval, Dimensionality Reduction, Fixed-Dimensional Encoding
Last Updated: 2026-02-10 01:00 GMT

Overview

A concrete tool, provided by txtai, for reducing multi-vector embeddings to single fixed-dimensional vectors using the MUVERA algorithm.

Description

The Muvera class implements the MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) algorithm, which converts late interaction multi-vector outputs (such as those from ColBERT) into single fixed-dimensional vectors suitable for standard approximate nearest neighbor (ANN) indexes.

The algorithm works in three steps: (1) it partitions token vectors with simhash, using random projections; (2) it optionally reduces dimensionality via AMS sketch projections; and (3) it aggregates the vectors within each partition, across multiple repetitions, to produce the final fixed-size encoding. For "data" category inputs, partition aggregates are averaged; for "query" category inputs, raw sums are used.

The output dimension is repetitions * 2^hashes * projection (default: 20 * 32 * 16 = 10,240). When projection is None, no reduction is applied and the input embedding dimension takes the place of projection in this formula.

The implementation is based on the paper "MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings" (arXiv:2405.19504) and is a Python port derived from Google's C++ implementation.
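The three steps described above can be illustrated with a minimal NumPy sketch for a single document. This is not txtai's code: the function name fde_sketch, the plain binary partition index, the 1/sqrt(projection) scaling, and the handling of empty partitions (left as zeros) are all assumptions made for illustration, and the actual implementation may differ in each of these details.

```python
import numpy as np

def fde_sketch(tokens, category, repetitions=20, hashes=5, projection=16, seed=42):
    """Hypothetical, simplified fixed-dimensional encoding of one (tokens, dim) array."""
    tokens = np.asarray(tokens, dtype=np.float64)
    rng = np.random.default_rng(seed)
    dim = tokens.shape[1]
    partitions = 2 ** hashes
    blocks = []
    for _ in range(repetitions):
        # (1) SimHash: the sign pattern of `hashes` Gaussian projections picks a partition
        gaussian = rng.standard_normal((dim, hashes))
        bits = (tokens @ gaussian > 0).astype(np.int64)
        ids = bits @ (1 << np.arange(hashes))  # assumption: plain binary index

        # (2) AMS-style sketch: exactly one random +/-1 entry per row
        sketch = np.zeros((dim, projection))
        columns = rng.integers(0, projection, dim)
        sketch[np.arange(dim), columns] = rng.choice([-1.0, 1.0], dim)
        reduced = tokens @ sketch / np.sqrt(projection)  # assumption: this scaling

        # (3) Aggregate per partition: sums for queries, averages for data
        sums = np.zeros((partitions, projection))
        np.add.at(sums, ids, reduced)
        if category == "data":
            counts = np.bincount(ids, minlength=partitions)[:, None]
            sums = sums / np.maximum(counts, 1)  # empty partitions stay zero
        blocks.append(sums.ravel())
    return np.concatenate(blocks)

tokens = np.random.default_rng(0).standard_normal((32, 128))
print(fde_sketch(tokens, "data").shape)  # (10240,)
```

Because each repetition contributes 2^hashes * projection values, the concatenated result has the repetitions * 2^hashes * projection length stated above.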

Usage

Use Muvera when you need to convert multi-vector (late interaction) embeddings to single fixed vectors for efficient retrieval with standard ANN indexes. It is typically used as a component within LatePooling, configured via the muvera key in modelargs. Direct usage is appropriate when applying the MUVERA encoding to pre-computed multi-vector embeddings outside the pooling pipeline.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/models/pooling/muvera.py

Signature

class Muvera:
    def __init__(self, repetitions=20, hashes=5, projection=16, seed=42)
    def __call__(self, data, category)
    def random(self, dimension, projection, seed)
    def reducer(self, dimension, projection, seed)

Import

from txtai.models.pooling.muvera import Muvera

I/O Contract

Inputs

Name | Type | Required | Description
repetitions | int | No | Number of encoding repetitions. Each repetition produces 2^hashes * projection dimensions. Default is 20.
hashes | int | No | Number of simhash bits, creating 2^hashes partitions. Default is 5 (32 partitions).
projection | int or None | No | Dimensionality reduction target per partition. Set to None for identity projection (no reduction). Default is 16.
seed | int | No | Random seed for reproducible projections. Default is 42.
data | numpy.ndarray | Yes (for __call__) | 3D array of shape (num_documents, max_tokens, embedding_dim) containing multi-vector embeddings.
category | str | Yes (for __call__) | Either "query" (raw sums per partition) or "data" (averaged sums per partition).

Outputs

Name | Type | Description
__call__() | numpy.ndarray | 2D array of shape (num_documents, output_dim) where output_dim = repetitions * 2^hashes * projection. Default output dimensions: 20 * 32 * 16 = 10,240.
random() | numpy.ndarray | Random Gaussian matrix of shape (dimension, projection) for simhash computation.
reducer() | numpy.ndarray | Sparse AMS sketch matrix of shape (dimension, projection) for dimensionality reduction, with exactly one non-zero value per row.
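
The output size arithmetic can be checked before building an index. The helper below is a hypothetical convenience function, not part of txtai; it assumes that with projection=None each partition keeps the full embedding dimension, as described for the identity projection.

```python
def muvera_output_dim(repetitions=20, hashes=5, projection=16, embedding_dim=None):
    """Hypothetical helper: length of the encoded vector for the given settings."""
    # With projection=None (identity), each partition keeps embedding_dim values
    width = embedding_dim if projection is None else projection
    if width is None:
        raise ValueError("embedding_dim is required when projection is None")
    return repetitions * (2 ** hashes) * width

print(muvera_output_dim())                  # 10240
print(muvera_output_dim(10, 4, None, 128))  # 20480
```

Doubling hashes from 5 to 10 multiplies the output size by 32, so the number of partitions is usually the first setting to revisit when index memory is a concern.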

Usage Examples

import numpy as np
from txtai.models.pooling.muvera import Muvera

# Create a MUVERA encoder with default parameters
encoder = Muvera(repetitions=20, hashes=5, projection=16, seed=42)
# Output dimensions: 20 * 2^5 * 16 = 10,240

# Simulate multi-vector embeddings (e.g., from ColBERT)
# 2 documents, 32 tokens each, 128-dimensional embeddings
data = np.random.randn(2, 32, 128).astype(np.float32)

# Encode documents (uses averaging per partition)
doc_vectors = encoder(data, category="data")
# doc_vectors.shape: (2, 10240)

# Encode queries (uses raw sums per partition)
query_vectors = encoder(data, category="query")
# query_vectors.shape: (2, 10240)

# Use identity projection (no dimensionality reduction)
encoder_full = Muvera(repetitions=10, hashes=4, projection=None, seed=42)
# Output dimensions: 10 * 2^4 * 128 = 20,480 (uses full embedding dimension)
full_vectors = encoder_full(data, category="data")
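
Once queries and documents are encoded, the MUVERA paper scores them with a plain inner product between the query encoding and each document encoding, which is why a standard ANN index can be used. The brute-force sketch below shows that scoring step in isolation. The function name rank_by_inner_product is hypothetical, and the small hand-written arrays are placeholders standing in for encoder outputs, not real encodings.

```python
import numpy as np

def rank_by_inner_product(query_vector, doc_vectors, k=3):
    """Return indices and scores of the k largest inner products (brute force)."""
    scores = doc_vectors @ query_vector
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Placeholder vectors: 3 "documents" and 1 "query" in 3 dimensions
doc_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.5, 0.5, 0.0]])
query_vector = np.array([1.0, 0.2, 0.0])

order, scores = rank_by_inner_product(query_vector, doc_vectors, k=2)
print(order)   # [0 2]
print(scores)  # [1.  0.6]
```

In practice an ANN index would replace the exhaustive matrix product; the ranking criterion stays the same.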
