Principle:Neuml Txtai Model Integration

Overview

After a model has been fine-tuned or exported, it must be integrated back into the application's search and retrieval pipeline. In txtai, this means replacing the embedding model used by an Embeddings index and reindexing the existing document collection with the new model. The reindex operation regenerates all embedding vectors using the updated model while preserving the stored document content and metadata.

Model Integration

Model integration is the process of connecting a newly trained or exported model to an existing embeddings index. The fundamental challenge is that embedding vectors are model-specific -- vectors generated by one model are not comparable to vectors generated by a different model. Therefore, when the model changes, all existing vectors must be regenerated.

Why Reindexing Is Necessary

Embedding models map text to points in a high-dimensional vector space. The geometry of this space is determined entirely by the model's learned weights. Two different models will map the same text to different regions of their respective vector spaces. As a result:

Similarity scores become meaningless if the query is encoded by a new model but the index contains vectors from the old model.
Search quality degrades because the geometric relationships between documents no longer reflect the new model's understanding of semantic similarity.
Dimensionality may differ between models, in which case old and new vectors cannot be compared at all.

Reindexing regenerates every vector in the index using the new model, ensuring that all vectors inhabit the same vector space and that similarity computations are valid.
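
The points above can be illustrated with a toy sketch in plain Python. It is not txtai code: the two "models" are hypothetical random projections of letter counts, standing in for two sets of learned weights. Identical text scores 1.0 within one model's space, while the score across two models is arbitrary, and a dimension mismatch cannot be scored at all.

```python
import math
import random


def make_toy_model(seed, dims):
    """Return a toy 'embedding model': a fixed random projection of letter counts."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(26)] for _ in range(dims)]

    def encode(text):
        counts = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        return [sum(w * c for w, c in zip(row, counts)) for row in weights]

    return encode


def cosine(a, b):
    """Cosine similarity; only defined for vectors of equal dimensionality."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


old_model = make_toy_model(seed=1, dims=8)    # stands in for the original model
new_model = make_toy_model(seed=2, dims=8)    # stands in for a fine-tuned model
wide_model = make_toy_model(seed=3, dims=16)  # a model with a different output size

text = "feel good story"

# Same text, same model: the vectors coincide
same_space = cosine(old_model(text), old_model(text))

# Same text, query encoded by the new model, index vector from the old model:
# the score carries no meaning because the two spaces are unrelated
mixed_space = cosine(new_model(text), old_model(text))
```

Calling cosine(old_model(text), wide_model(text)) raises ValueError, which mirrors the dimensionality case.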

Embedding Model Replacement

The Embeddings class accepts a path parameter in its configuration that specifies which model to use for generating vectors. Changing this path to point to a fine-tuned model is the primary mechanism for model replacement.

Configuration-Based Replacement

When creating a new Embeddings instance, the path parameter can be set to:

A Hugging Face model hub identifier (e.g., "sentence-transformers/all-MiniLM-L6-v2").
A local directory path containing a saved model (e.g., the output of HFTrainer).
A custom model path pointing to a fine-tuned or ONNX-exported model.

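The Embeddings instance loads the appropriate vector model based on this configuration. The model is loaded when the configuration is applied, for example when the instance is created or when reindex is called with a new configuration; changing the path alone does not re-encode vectors that are already in the index.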
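
A minimal sketch of configuration-based replacement follows. The helper names and the local directory "models/finetuned" are hypothetical; only the path and content configuration keys come from the text above. txtai is imported lazily inside create_embeddings so the configuration part can be read and run without it.

```python
def embeddings_config(model_path, content=True):
    """Build a configuration dictionary that selects the vector model."""
    return {"path": model_path, "content": content}


# Hugging Face model hub identifier
hub_config = embeddings_config("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical local directory holding a fine-tuned model
local_config = embeddings_config("models/finetuned")


def create_embeddings(config):
    """Create an Embeddings instance from a configuration dictionary (requires txtai)."""
    from txtai.embeddings import Embeddings

    return Embeddings(config)
```

Swapping models then amounts to passing local_config instead of hub_config to create_embeddings and indexing the documents again.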

Default Model Behavior

If no model path is specified and no sparse-only configuration is active, txtai defaults to loading sentence-transformers/all-MiniLM-L6-v2. This ensures that embeddings work out of the box for common use cases while still allowing easy replacement with a custom model.

Index Reindexing

The reindex method on the Embeddings class is the mechanism for regenerating vectors after a model change. This method requires content storage to be enabled (content=True in the configuration), because the original document text must be available to generate new vectors.

Reindex Workflow

Receive new configuration -- The caller provides an updated configuration dictionary that may include a new model path, new index parameters, or both.
Preserve content settings -- The content and objects parameters from the current configuration are automatically carried forward to ensure the document database is preserved.
Reconfigure the embeddings instance -- The internal model, scoring, and query components are reloaded based on the new configuration.
Reset function references -- If the index uses custom SQL functions, they are reset to reflect the new configuration.
Regenerate vectors -- All documents are read from the database and re-encoded with the new model. The index is rebuilt from scratch using the new vectors.
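
The workflow above can be sketched with a toy class. This is a simplified illustration under stated assumptions, not txtai's implementation: ToyIndex, its encoders mapping, and the model names are hypothetical, the SQL function reset is omitted, and raising an error when content storage is off is a choice made for this sketch.

```python
class ToyIndex:
    """Simplified stand-in for an embeddings index with content storage."""

    def __init__(self, config, encoders):
        self.config = dict(config)
        self.encoders = encoders  # maps a model path to an encode function
        self.database = {}        # id -> original text, preserved across reindex
        self.vectors = {}         # id -> vector, rebuilt on every index call

    def index(self, documents):
        """Encode all documents with the currently configured model."""
        encode = self.encoders[self.config["path"]]
        self.vectors = {}
        for uid, text in documents:
            self.database[uid] = text
            self.vectors[uid] = encode(text)

    def reindex(self, config):
        """Apply a new configuration and regenerate every vector."""
        if not self.config.get("content"):
            raise ValueError("reindex requires content storage")

        # Receive new configuration, carrying content settings forward
        new_config = dict(config)
        new_config["content"] = self.config["content"]
        if "objects" in self.config:
            new_config["objects"] = self.config["objects"]

        # Reconfigure: later encode calls use the new model
        self.config = new_config

        # Regenerate vectors from the stored text
        self.index(list(self.database.items()))


# Hypothetical encoders with different output dimensionality
encoders = {
    "old-model": lambda text: [float(len(text))],
    "new-model": lambda text: [float(len(text)), float(text.count(" "))],
}

index = ToyIndex({"path": "old-model", "content": True}, encoders)
index.index([(0, "feel good story"), (1, "climate change")])
index.reindex({"path": "new-model"})
```

After the reindex call, the stored text is unchanged while every vector has been produced by the new encoder.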

Optional Transform Function

The reindex method accepts an optional function parameter that can transform the document content before re-encoding. This is useful for:

Reformatting text -- Changing how document fields are combined into the indexed text.
Adding metadata -- Enriching documents with additional information before reindexing.
Filtering -- Excluding certain documents from the reindexed collection.

When provided, this function is applied to the stream of documents read from the database, and its output is passed to the indexing pipeline.
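
A transform function of this kind might look like the sketch below. It assumes the document stream yields (id, data, tags) tuples in which data is either a string or a dictionary; that shape, the "title" and "text" field names, and the function name are assumptions for illustration and should be checked against the txtai version in use.

```python
def add_title_prefix(documents):
    """Hypothetical transform: fold a 'title' field into the text before re-encoding."""
    for uid, data, tags in documents:
        if isinstance(data, dict):
            title = data.get("title", "")
            text = data.get("text", "")
            # Reformat how the fields are combined into the indexed text
            data = {**data, "text": f"{title}. {text}" if title else text}
        # Plain strings pass through unchanged
        yield uid, data, tags


# Stand-in for the stream read from the document database
sample = [
    (0, {"title": "Intro", "text": "Hello"}, None),
    (1, "plain text", None),
]
transformed = list(add_title_prefix(sample))
```

With a real index, the function would be passed as the function argument, for example embeddings.reindex({"path": "models/finetuned"}, function=add_title_prefix), where "models/finetuned" is a hypothetical path.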

Integration Patterns

Pattern 1: Train and Replace

The most common pattern involves training a new model and immediately using it:

Train a model using HFTrainer, which returns a (model, tokenizer) tuple.
Save the model to a local directory.
Create a new Embeddings instance with path set to the saved model directory.
Index documents with the new embeddings instance.
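
The four steps above can be sketched as one helper. The function name and its parameters are hypothetical; the trainer and the embeddings class are passed in as arguments so the sketch does not depend on a particular txtai version. save_pretrained is the standard Hugging Face method for writing a model or tokenizer to a directory.

```python
def train_and_index(trainer, embeddings_factory, base_model, train_data, documents, output_dir):
    """
    Fine-tune a model, save it, and build a fresh index with it.

    trainer            -- callable returning (model, tokenizer), e.g. an HFTrainer instance
    embeddings_factory -- callable taking a config dictionary, e.g. the Embeddings class
    """
    # Train: the trainer returns a (model, tokenizer) tuple
    model, tokenizer = trainer(base_model, train_data)

    # Save both pieces to the same local directory
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

    # Create a new embeddings instance with path set to the saved directory
    embeddings = embeddings_factory({"path": output_dir, "content": True})

    # Index the documents with the new embeddings instance
    embeddings.index(documents)
    return embeddings
```

With txtai installed, a call might look like train_and_index(HFTrainer(), Embeddings, base_model, train_data, documents, "models/finetuned"), with HFTrainer imported from txtai.pipeline and Embeddings from txtai.embeddings. The shape of train_data depends on the training task, and the output directory name is a placeholder.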

Pattern 2: Train, Export, and Replace

For production deployments where inference speed matters:

Train a model using HFTrainer.
Export the model to ONNX using HFOnnx.
Create a new Embeddings instance pointing to the ONNX model.
Index documents.
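
A hedged sketch of the export step follows. It assumes that an HFOnnx instance can be called as exporter(path, task, output, quantize=...) with the "pooling" task producing sentence vectors, and that the embeddings configuration accepts a tokenizer key alongside the ONNX path. Both assumptions should be verified against the installed txtai version, and ONNX export needs the optional ONNX dependencies. The function name and parameters are hypothetical.

```python
def export_and_index(exporter, embeddings_factory, model_path, onnx_path, documents):
    """
    Export a trained model to ONNX and build an index that uses the exported file.

    exporter           -- callable such as an HFOnnx instance (assumed signature, see lead-in)
    embeddings_factory -- callable taking a config dictionary, e.g. the Embeddings class
    """
    # Export with pooling so the ONNX model outputs one vector per input text
    exported = exporter(model_path, "pooling", onnx_path, quantize=True)

    # The ONNX file is assumed to carry no tokenizer, so reuse the source model's tokenizer
    config = {"path": exported, "tokenizer": model_path, "content": True}
    embeddings = embeddings_factory(config)

    # Index the documents with the ONNX-backed embeddings instance
    embeddings.index(documents)
    return embeddings
```

Quantization trades a small amount of accuracy for faster CPU inference; passing quantize=False to the exporter would keep full precision.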

Pattern 3: In-Place Reindex

For existing indexes that need a model upgrade:

Load an existing Embeddings index that has content storage enabled.
Call reindex() with a new configuration specifying the new model path.
The existing documents are automatically re-encoded and the index is rebuilt.
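
The in-place upgrade can be sketched as a small helper that takes an Embeddings instance. The helper name, the explicit content check, and all paths are assumptions for illustration; load, reindex, and save are the Embeddings methods the steps above rely on.

```python
def upgrade_index(embeddings, index_path, model_path, save_path=None):
    """Load an existing index, re-encode it with a new model, and optionally save it."""
    # Load the existing index, including its stored configuration
    embeddings.load(index_path)

    # Without stored content there is no text to re-encode
    if not embeddings.config.get("content"):
        raise ValueError("index was built without content storage; reindex is not possible")

    # Re-encode every stored document with the new model and rebuild the index
    embeddings.reindex({"path": model_path})

    # Persist the upgraded index, ideally to a new location until it is validated
    if save_path:
        embeddings.save(save_path)
    return embeddings
```

A call might look like upgrade_index(Embeddings(), "index", "models/finetuned", save_path="index-v2"), where every path is a placeholder. Saving to a separate location keeps the old index available for comparison or rollback.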

Architectural Considerations

Content Storage Requirement

Reindexing is only possible when the original document content is stored alongside the index. Without content storage, the embeddings index only contains vectors and ID mappings -- the original text is not available for re-encoding. This is why content=True is a prerequisite for the reindex method.

Component Preservation

During reindexing, the document database is preserved while the vector index (ANN), scoring index, subindexes, and graph are rebuilt from scratch. This ensures that document content and metadata are never lost during a model change.

Sparse Index Interaction

If the embeddings configuration includes a sparse scoring index (e.g., BM25), it is also rebuilt during reindexing. The sparse index operates independently of the dense vector model, but it shares the same document collection and must be consistent with the dense index.
