Implementation:Neuml Txtai Subindex Manager
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, Index Management, Multi-Index |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
A concrete txtai tool for managing the collection of subindexes within an embeddings instance.
Description
The Indexes class manages multiple subindexes that compose a parent embeddings instance. Each subindex is itself a full embeddings instance, enabling different vector models, scoring methods, or configurations to coexist within a single logical index.
Key features:
- Dictionary-like access: Subindexes are looked up by name through __contains__, __getitem__, and __getattr__, which support membership tests ("name" in indexes), bracket notation (indexes["name"]), and attribute-style access (indexes.name).
- Document filtering: During insert, the class keeps only documents that have a valid text or object field set. When top-level indexing is disabled (no model or scoring configured), it keeps all documents. Each kept document is assigned an index ID matching its position in the parent index.
- Document streaming: Uses the Documents class to buffer inserted documents to disk for deferred indexing.
- Lifecycle management: Provides index, upsert, delete, load, save, and close methods that delegate to each subindex in turn. The index and upsert methods also clean up the document stream after processing.
- Checkpoint support: Supports checkpoint directories for indexing restart. Each subindex gets a subdirectory within the checkpoint path.
- Model lookup: The findmodel method locates a vector model across subindexes, optionally filtered by index name.
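The dictionary-like access described above can be sketched in plain Python. This is a minimal illustration of the three special methods, not the txtai implementation: SubindexRegistry is a hypothetical stand-in class, and the string values stand in for subindex embeddings instances.

```python
class SubindexRegistry:
    """Hypothetical stand-in showing the access pattern; not the txtai class."""

    def __init__(self, indexes):
        # indexes: dict mapping subindex name -> subindex object
        self.indexes = indexes

    def __contains__(self, name):
        # Supports: "name" in registry
        return name in self.indexes

    def __getitem__(self, name):
        # Supports: registry["name"]
        return self.indexes[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Read from __dict__ directly to avoid recursion before __init__ has run.
        indexes = self.__dict__.get("indexes", {})
        try:
            return indexes[name]
        except KeyError:
            raise AttributeError(name) from None


# Placeholder strings stand in for subindex embeddings instances
registry = SubindexRegistry({"dense": "dense-subindex", "sparse": "sparse-subindex"})

has_dense = "dense" in registry   # membership test
by_key = registry["sparse"]       # bracket notation
by_attr = registry.sparse         # attribute-style access
```

Because __getattr__ runs only after regular attribute lookup fails, real attributes and methods always take precedence over a subindex that happens to share their name.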
Usage
Use Indexes when you need to configure multiple subindexes within a single txtai embeddings instance. This is useful for hybrid search (combining dense and sparse indexes), multi-model configurations, or organizing data into logical partitions that are searched together.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/embeddings/index/indexes.py
Signature
class Indexes:
    def __init__(self, embeddings, indexes)
    def __contains__(self, name) -> bool
    def __getitem__(self, name) -> Embeddings
    def __getattr__(self, name) -> Embeddings
    def default(self) -> str
    def findmodel(self, index=None) -> Vectors
    def insert(self, documents, index=None, checkpoint=None)
    def delete(self, ids)
    def index(self)
    def upsert(self)
    def load(self, path)
    def save(self, path)
    def close(self)
Import
from txtai.embeddings.index.indexes import Indexes
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| embeddings | Embeddings | Yes | Parent embeddings instance. Used to determine text/object column names and whether top-level indexing is enabled. |
| indexes | dict | Yes | Dictionary mapping index names (str) to embeddings instances (Embeddings). Each is a fully configured subindex. |
| documents | list[tuple] | Yes (insert) | List of (id, document, tags) tuples. Documents are filtered based on text/object field availability. |
| index | int | Yes (insert) | Starting index ID offset, matching the parent index position. The signature defaults this to None, but insert needs an integer here to number the documents. |
| checkpoint | str | No | Optional checkpoint directory path for indexing restart support. |
| ids | list | Yes (delete) | List of document IDs to remove from all subindexes. |
| path | str | Yes (load/save) | Directory path for loading/saving subindexes. Each subindex is stored in a subdirectory named after its key. |
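The filtering applied to the documents and index inputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: filter_batch is a hypothetical helper, the field names default to "text" and "object", and the counter is assumed to advance only for documents that are kept.

```python
def filter_batch(documents, index, text="text", obj="object", indexing=True):
    """
    Hypothetical helper mirroring the described filtering.

    documents: list of (id, document, tags) tuples
    index: starting index ID offset
    indexing: True when the parent has top-level indexing enabled
    """
    batch = []
    for _, document, _ in documents:
        # For dict documents, look for the text field, then the object field
        data = document
        if isinstance(document, dict):
            data = document.get(text, document.get(obj))

        # Keep the document if it has data, or keep everything when
        # top-level indexing is disabled
        if data is not None or not indexing:
            batch.append((index, document, None))
            index += 1

    return batch, index


docs = [
    ("a", {"text": "natural language processing"}, None),
    ("b", {"category": "metadata only"}, None),
    ("c", "computer vision algorithms", None),
]

# With top-level indexing enabled, the metadata-only document is skipped
batch, next_index = filter_batch(docs, index=10)

# With top-level indexing disabled, every document is kept
all_docs, _ = filter_batch(docs, index=10, indexing=False)
```

In this sketch the original document ID is replaced by the running index ID, which is how a subindex row can be mapped back to its position in the parent index.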
Outputs
| Name | Type | Description |
|---|---|---|
| __contains__ | bool | True if the named index exists in this collection. |
| __getitem__ / __getattr__ | Embeddings | Retrieved subindex embeddings instance. |
| default | str | Name of the first/default subindex. |
| findmodel | Vectors | First matching vector model found across subindexes. |
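The default and findmodel outputs can be illustrated with a small sketch. It assumes that the default subindex is the first key in the dictionary and that each subindex exposes a model attribute that is None when no vector model is configured. The attribute name and both helper functions are hypothetical and serve only to show the lookup logic.

```python
from types import SimpleNamespace


def default(indexes):
    # First key in insertion order is treated as the default subindex
    return next(iter(indexes))


def findmodel(indexes, index=None):
    # Return the first vector model found, optionally limited to one subindex
    for name, subindex in indexes.items():
        if index is not None and name != index:
            continue
        if subindex.model is not None:
            return subindex.model
    return None


# Placeholder subindexes: the sparse one has no vector model
indexes = {
    "sparse": SimpleNamespace(model=None),
    "dense": SimpleNamespace(model="all-MiniLM-L6-v2"),
}

first = default(indexes)
any_model = findmodel(indexes)
sparse_model = findmodel(indexes, index="sparse")
```

A lookup restricted to a subindex without a vector model returns None in this sketch, so callers should be prepared for a missing model.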
Usage Examples
from txtai.embeddings import Embeddings
# Configure embeddings with multiple subindexes
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "indexes": {
        "sparse": {
            "scoring": {
                "method": "bm25",
                # Enable the term index so the sparse subindex is searchable
                "terms": True
            }
        },
        "dense": {
            "path": "sentence-transformers/all-MiniLM-L6-v2"
        }
    }
})
# Index documents - automatically routes to all subindexes
embeddings.index([
    (0, "natural language processing", None),
    (1, "computer vision algorithms", None),
])
# Search the top-level index; pass index="sparse" or index="dense" to query a subindex
results = embeddings.search("NLP techniques", limit=5)
# Save all indexes
embeddings.save("/tmp/multi_index")
# Load all indexes
embeddings.load("/tmp/multi_index")
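The on-disk layout produced by saving can be sketched without txtai. Following the I/O contract, each subindex is written to a subdirectory named after its key. FakeSubindex and save_all below are hypothetical stand-ins that only record and create the target directory; they are not the library's save logic.

```python
import os
import tempfile


class FakeSubindex:
    """Hypothetical subindex that records where it was saved."""

    def __init__(self):
        self.saved = None

    def save(self, path):
        # A real subindex would write its index files here
        os.makedirs(path, exist_ok=True)
        self.saved = path


def save_all(indexes, path):
    # Delegate to each subindex in turn, one subdirectory per key
    for name, subindex in indexes.items():
        subindex.save(os.path.join(path, name))


with tempfile.TemporaryDirectory() as root:
    subindexes = {"sparse": FakeSubindex(), "dense": FakeSubindex()}
    save_all(subindexes, root)

    # Subdirectories created under the save path
    created = sorted(os.listdir(root))
```

Loading would presumably walk the same layout in reverse, passing each subdirectory to the matching subindex.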