Implementation:Neuml Txtai Reducer

Knowledge Sources	Neuml_Txtai
Domains	Dimensionality_Reduction, Embeddings
Last Updated	2026-02-09 17:00 GMT

Overview

The Reducer class applies LSA-based reduction to embedding vectors: it removes the top principal components, which can improve downstream similarity search quality.

Description

The Reducer class fits a Truncated Singular Value Decomposition (TruncatedSVD) model from scikit-learn on an embedding matrix, then subtracts each vector's projection onto the dominant components. The approach comes from research on improving word embedding representations: the top components tend to encode shared, frequency-related variance rather than semantic content. Removing them leaves vectors that are more discriminative, which can improve similarity search results.
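The technique described above can be sketched with scikit-learn and NumPy alone. This is a minimal illustration, not the txtai source: the helper name `remove_top_components` and the random sample data are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def remove_top_components(embeddings, components):
    """Return (reduced embeddings, fitted components). Hypothetical helper."""
    # Fit the SVD model on the embedding matrix
    model = TruncatedSVD(n_components=components, random_state=0)
    model.fit(embeddings)

    # Rows of pc are the dominant directions, shape (components, n_dimensions)
    pc = model.components_

    # Project each vector onto those directions, then subtract the projection
    projection = embeddings.dot(pc.T).dot(pc)
    return embeddings - projection, pc


# Random sample data standing in for real embeddings
rng = np.random.default_rng(0)
vectors = rng.random((200, 32))

reduced, pc = remove_top_components(vectors, components=3)
print(vectors.shape, reduced.shape)
```

The output keeps the input shape. What changes is that the reduced vectors no longer have any component along the removed directions.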

Usage

Use the Reducer when you want to improve similarity search results by removing the dominant components from your embedding vectors. The vectors keep their original number of dimensions; only the variance along the removed components is subtracted. The Reducer is typically enabled through the embeddings configuration and applied automatically during indexing and search. It helps most when the top principal components of a dense embedding model carry noise or non-semantic variance.
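To see why this helps when a top component carries non-semantic variance, the sketch below builds synthetic vectors that all share one strong common direction, then compares the average pairwise cosine similarity before and after removing the top component. The data and the helper `mean_pairwise_cosine` are assumptions made for illustration; real embeddings will behave less cleanly.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def mean_pairwise_cosine(matrix):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = unit.dot(unit.T)
    n = len(matrix)
    # Exclude the diagonal (self-similarity is always 1)
    return (sims.sum() - n) / (n * (n - 1))


rng = np.random.default_rng(0)

# Synthetic data: one strong shared direction plus small per-vector variation
common = rng.normal(size=64)
vectors = 5.0 * common + rng.normal(size=(300, 64))

# Fit a single component and subtract each vector's projection onto it
model = TruncatedSVD(n_components=1, random_state=0).fit(vectors)
pc = model.components_
cleaned = vectors - vectors.dot(pc.T).dot(pc)

before = mean_pairwise_cosine(vectors)
after = mean_pairwise_cosine(cleaned)
print(f"mean pairwise cosine before: {before:.3f}, after: {after:.3f}")
```

On this synthetic data every pair of vectors looks nearly identical before the removal, because the shared direction dominates. Afterwards the average similarity drops toward zero, so the remaining per-vector variation decides the ranking.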

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/embeddings/index/reducer.py
Lines: 1-104

Signature

class Reducer:
    def __init__(self, embeddings=None, components=None):
        """
        Creates a new Reducer instance.

        Args:
            embeddings: embeddings matrix used to fit the SVD model
            components: number of principal components to remove
        """

    def __call__(self, embeddings):
        """
        Applies dimensionality reduction to the given embeddings.

        Args:
            embeddings: input embedding vectors (numpy array)

        Returns:
            reduced embedding vectors with top components removed
        """

    def build(self, embeddings, components):
        """Fits the TruncatedSVD model on the provided embeddings."""

    def load(self, path):
        """Loads a previously saved Reducer model from disk."""

    def save(self, path):
        """Saves the current Reducer model to disk."""

Import

from txtai.embeddings.index import Reducer

I/O Contract

Inputs

Name	Type	Required	Description
embeddings	numpy.ndarray	Yes (for __init__ fitting)	Embedding matrix used to fit the SVD model during construction
components	int	No	Number of top principal components to remove (defaults to None in the signature)
embeddings	numpy.ndarray	Yes (for __call__)	Embedding vectors to reduce, shape (n_samples, n_dimensions)

Outputs

Name	Type	Description
reduced	numpy.ndarray	Embedding vectors with top principal components subtracted, same shape as input

Usage Examples

Basic Usage

import numpy as np
from txtai.embeddings.index import Reducer

# Create sample embeddings (e.g., from a transformer model)
embeddings = np.random.rand(1000, 768).astype(np.float32)

# Build a reducer that removes the top 3 principal components
reducer = Reducer(embeddings, components=3)

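# Apply reduction to a working copy. Depending on the txtai version, the call
# may modify the array in place and return None, so handle both cases.
reduced = embeddings.copy()
result = reducer(reduced)
reduced = reduced if result is None else result
print(f"Original shape: {embeddings.shape}, Reduced shape: {reduced.shape}")
# Shape remains the same, but top components are removed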

With Embeddings Configuration

from txtai.embeddings import Embeddings

# Configure embeddings with dimensionality reduction
embeddings = Embeddings(
    path="sentence-transformers/all-MiniLM-L6-v2",
    pca=3  # Remove top 3 principal components
)

# Index data - reduction is applied automatically
embeddings.index(["Deep learning fundamentals", "Natural language processing", "Computer vision"])

# Search - reduction is applied to query vectors automatically
results = embeddings.search("neural networks", 2)
print(results)

Save and Load

from txtai.embeddings.index import Reducer
import numpy as np

# Build and save a reducer
embeddings = np.random.rand(500, 384).astype(np.float32)
reducer = Reducer(embeddings, components=2)
reducer.save("/tmp/reducer_model")

# Load the reducer later
loaded_reducer = Reducer()
loaded_reducer.load("/tmp/reducer_model")

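# Apply to new data. The call may modify the array in place and return None,
# so handle both cases.
new_embeddings = np.random.rand(10, 384).astype(np.float32)
result = loaded_reducer(new_embeddings)
reduced = new_embeddings if result is None else result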

Related Pages

Principle:Neuml_Txtai_Dimensionality_Reduction
