Implementation:Neuml Txtai Reducer
| Knowledge Sources | |
|---|---|
| Domains | Dimensionality_Reduction, Embeddings |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
The Reducer class performs LSA-based dimensionality reduction on embedding vectors by removing top principal components, improving downstream similarity search quality.
Description
The Reducer class uses Truncated Singular Value Decomposition (TruncatedSVD) from scikit-learn to identify the dominant components of a set of embedding vectors and subtract each vector's projection onto them. This approach, rooted in research on improving word embedding representations, removes common variance directions that tend to encode frequency-related information rather than semantic content. The resulting vectors keep their original dimensionality but are typically more discriminative, which can improve similarity search results.
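The idea can be sketched with plain NumPy. This is an illustrative stand-in, not the txtai implementation: it uses `numpy.linalg.svd` in place of scikit-learn's TruncatedSVD, and the function names `fit_components` and `remove_components` are hypothetical.

```python
import numpy as np


def fit_components(embeddings, components):
    """Return the top right singular vectors, shape (components, n_dimensions)."""
    # Like TruncatedSVD, this does not center the data before decomposing it
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:components]


def remove_components(embeddings, pc):
    """Subtract each vector's projection onto the rows of pc."""
    # (n, d) @ (d, k) gives per-vector coefficients, then @ (k, d) maps back to (n, d)
    return embeddings - embeddings @ pc.T @ pc


rng = np.random.default_rng(0)
data = rng.random((200, 32))

pc = fit_components(data, components=3)
reduced = remove_components(data, pc)

# The shape is unchanged, and almost nothing is left along the removed directions
print(reduced.shape)
print(np.abs(reduced @ pc.T).max())
```

Because the selected singular vectors are orthonormal, subtracting the projection leaves each vector orthogonal to all of them, up to floating-point error.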
Usage
Use the Reducer when you want to improve similarity search results by stripping the dominant shared directions from your embedding vectors. It is typically enabled through the embeddings configuration and then applied automatically during indexing and search. It is most beneficial with dense embedding models whose top principal components carry noise or non-semantic variance.
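One rough way to judge whether removal is likely to help is to check how much of the embeddings' total energy sits in the top few singular directions. The sketch below is a diagnostic under stated assumptions, not part of txtai: `top_component_share` is a hypothetical helper, and the synthetic data simply adds a shared offset to every vector to mimic a dominant common direction.

```python
import numpy as np


def top_component_share(embeddings, components):
    """Fraction of the total squared singular values held by the top components."""
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    energy = singular_values ** 2
    return float(energy[:components].sum() / energy.sum())


rng = np.random.default_rng(0)

# Zero-mean noise: no single direction dominates
isotropic = rng.standard_normal((500, 64))

# Same noise plus a large shared offset: one direction dominates
shifted = isotropic + 5.0 * np.ones(64)

print(f"isotropic share: {top_component_share(isotropic, 1):.2f}")
print(f"shifted share:   {top_component_share(shifted, 1):.2f}")
```

A share close to 1 for a small number of components suggests that a common direction dominates the vectors, which is the situation the Reducer targets.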
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/embeddings/index/reducer.py
- Lines: 1-104
Signature
class Reducer:
    def __init__(self, embeddings=None, components=None):
        """
        Creates a new Reducer instance.

        Args:
            embeddings: embeddings matrix used to fit the SVD model
            components: number of principal components to remove
        """

    def __call__(self, embeddings):
        """
        Removes the top principal components from the given embeddings.
        The operation is applied directly to the input array.

        Args:
            embeddings: input embedding vectors (numpy array)
        """

    def build(self, embeddings, components):
        """Fits the TruncatedSVD model on the provided embeddings."""

    def load(self, path):
        """Loads a previously saved Reducer model from disk."""

    def save(self, path):
        """Saves the current Reducer model to disk."""
Import
from txtai.embeddings.index import Reducer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| embeddings (__init__) | numpy.ndarray | No | Embedding matrix used to fit the SVD model during construction; required only when the model is fitted at construction rather than loaded from disk |
| components (__init__) | int | No | Number of top principal components to remove; when omitted, no model is fitted at construction |
| embeddings (__call__) | numpy.ndarray | Yes | Embedding vectors to process, shape (n_samples, n_dimensions), with the same n_dimensions as the fitted model |
Outputs
| Name | Type | Description |
|---|---|---|
| embeddings | numpy.ndarray | The input array itself, updated in place with the top principal components subtracted; the shape is unchanged |
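The shape contract can be illustrated with a small NumPy stand-in. This is a sketch rather than the Reducer itself: `remove_components` is a hypothetical helper that returns a new array, and the components come from `numpy.linalg.svd` instead of a fitted TruncatedSVD model.

```python
import numpy as np


def remove_components(embeddings, pc):
    """Hypothetical stand-in: subtract projections onto the rows of pc."""
    return embeddings - embeddings @ pc.T @ pc


rng = np.random.default_rng(0)

# Fit on 100 vectors of 16 dimensions and keep the top 2 directions
fitted = rng.random((100, 16))
pc = np.linalg.svd(fitted, full_matrices=False)[2][:2]

# A new batch may have any number of rows but must keep the fitted width
batch = rng.random((5, 16))
reduced = remove_components(batch, pc)
print(batch.shape, reduced.shape)

# A batch with a different width cannot be projected onto the fitted components
try:
    remove_components(rng.random((5, 8)), pc)
except ValueError as error:
    print("dimension mismatch:", error)
```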
Usage Examples
Basic Usage
import numpy as np
from txtai.embeddings.index import Reducer

# Create sample embeddings (random stand-ins for transformer model output)
embeddings = np.random.rand(1000, 768).astype(np.float32)

# Fit a reducer that removes the top 3 principal components
reducer = Reducer(embeddings, components=3)

# The reducer updates its argument in place, so work on a copy to keep the original
reduced = embeddings.copy()
reducer(reduced)

print(f"Original shape: {embeddings.shape}, Reduced shape: {reduced.shape}")
# The shape is unchanged; only the top components are subtracted
With Embeddings Configuration
from txtai.embeddings import Embeddings

# Enable principal component removal through the pca setting.
# With only three documents, removing three components would leave
# nothing to search, so this example removes just one.
embeddings = Embeddings(
    path="sentence-transformers/all-MiniLM-L6-v2",
    pca=1  # Remove the top principal component
)

# Index data; the reducer is fitted and applied automatically
embeddings.index(["Deep learning fundamentals", "Natural language processing", "Computer vision"])

# Search; the same reduction is applied to query vectors automatically
results = embeddings.search("neural networks", 2)
print(results)
Save and Load
import numpy as np
from txtai.embeddings.index import Reducer

# Build and save a reducer
embeddings = np.random.rand(500, 384).astype(np.float32)
reducer = Reducer(embeddings, components=2)
reducer.save("/tmp/reducer_model")

# Load the reducer later
loaded_reducer = Reducer()
loaded_reducer.load("/tmp/reducer_model")

# Apply to new data; the array is updated in place
new_embeddings = np.random.rand(10, 384).astype(np.float32)
loaded_reducer(new_embeddings)
print(new_embeddings.shape)