Implementation:Neuml Txtai Subindex Manager

Knowledge Sources	Neuml_Txtai
Domains	Embeddings, Index Management, Multi-Index
Last Updated	2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for managing a collection of subindexes within an embeddings instance.

Description

The Indexes class manages multiple subindexes that compose a parent embeddings instance. Each subindex is itself a full embeddings instance, enabling different vector models, scoring methods, or configurations to coexist within a single logical index.

Key features:

Dictionary-like access: Subindexes can be accessed by name using __contains__, __getitem__, or __getattr__, allowing both bracket notation (indexes["name"]) and attribute-style access (indexes.name).
Document filtering: During insert, the class keeps only documents that have a text or object field set; when top-level indexing is disabled (no model or scoring configured), it keeps all documents. Each kept document is assigned an index ID matching its position in the parent index.
Document streaming: Uses the Documents class to buffer inserted documents to disk for deferred indexing.
Lifecycle management: Provides index, upsert, delete, load, save, and close methods that delegate to each subindex in turn. The index and upsert methods also clean up the document stream after processing.
Checkpoint support: Supports checkpoint directories for indexing restart. Each subindex gets a subdirectory within the checkpoint path.
Model lookup: The findmodel method locates a vector model across subindexes, optionally filtered by index name.
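
The document filtering rule above can be sketched in plain Python. This is an illustrative stand-in, not txtai's code: the function name filter_batch, the default column names "text" and "object", and the indexing flag are assumptions made for the example, as is the choice that skipped documents do not advance the ID counter.

```python
def filter_batch(documents, index, indexing=True, text="text", obj="object"):
    """
    Illustrative stand-in (not txtai's implementation): keeps documents that
    have a text or object field and assigns consecutive index ids.
    """
    batch = []
    for _, document, _ in documents:
        # For dict documents, look up the text field, then the object field
        value = document
        if isinstance(document, dict):
            value = document.get(text, document.get(obj))

        # Keep the document if a field is set, or if top-level indexing is off
        if value is not None or not indexing:
            batch.append((index, document, None))
            index += 1

    return batch, index


documents = [
    ("a", "plain text document", None),
    ("b", {"text": "dict with text"}, None),
    ("c", {"title": "no text or object field"}, None),
]

batch, next_index = filter_batch(documents, index=10)
print([uid for uid, _, _ in batch])  # [10, 11]
print(next_index)                    # 12
```

Passing indexing=False in this sketch keeps document "c" as well, which mirrors the case where no top-level model or scoring is configured.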

Usage

Use Indexes when you need to configure multiple subindexes within a single txtai embeddings instance. This is useful for hybrid search (combining dense and sparse indexes), multi-model configurations, or organizing data into logical partitions that are searched together.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/embeddings/index/indexes.py

Signature

class Indexes:
    def __init__(self, embeddings, indexes)
    def __contains__(self, name) -> bool
    def __getitem__(self, name) -> Embeddings
    def __getattr__(self, name) -> Embeddings
    def default(self) -> str
    def findmodel(self, index=None) -> Vectors
    def insert(self, documents, index=None, checkpoint=None)
    def delete(self, ids)
    def index(self)
    def upsert(self)
    def load(self, path)
    def save(self, path)
    def close(self)
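
The access methods in this signature follow a common Python pattern. The sketch below shows that pattern with a toy class; SubindexRegistry is a hypothetical name, the string values stand in for Embeddings instances, and nothing here is taken from txtai's source.

```python
class SubindexRegistry:
    """Toy stand-in for a named collection of subindexes (not txtai's Indexes class)."""

    def __init__(self, indexes):
        self.indexes = indexes

    def __contains__(self, name):
        # Supports: "name" in registry
        return name in self.indexes

    def __getitem__(self, name):
        # Supports: registry["name"]
        return self.indexes[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so regular
        # attributes such as self.indexes are unaffected
        try:
            return self.__dict__["indexes"][name]
        except KeyError as e:
            raise AttributeError(name) from e

    def default(self):
        # The first configured name acts as the default
        return next(iter(self.indexes))


registry = SubindexRegistry({"sparse": "sparse-index", "dense": "dense-index"})
print("sparse" in registry)  # True
print(registry["dense"])     # dense-index
print(registry.dense)        # dense-index
print(registry.default())    # sparse
```

Reading through self.__dict__ inside __getattr__ avoids infinite recursion if the attribute is requested before indexes has been set.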

Import

from txtai.embeddings.index.indexes import Indexes

I/O Contract

Inputs

Name	Type	Required	Description
embeddings	Embeddings	Yes	Parent embeddings instance. Used to determine text/object column names and whether top-level indexing is enabled.
indexes	dict	Yes	Dictionary mapping index names (str) to embeddings instances (Embeddings). Each is a fully configured subindex.
documents	list[tuple]	Yes (insert)	List of (id, document, tags) tuples. Documents are filtered based on text/object field availability.
index	int	Yes (insert)	Starting index ID offset, matching the parent index position. The signature defaults it to None, but insert needs an integer offset to assign IDs.
checkpoint	str	No	Optional checkpoint directory path for indexing restart support.
ids	list	Yes (delete)	List of document IDs to remove from all subindexes.
path	str	Yes (load/save)	Directory path for loading/saving subindexes. Each subindex is stored in a subdirectory named after its key.
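
As a rough illustration of the directory layout described for path (and, similarly, for checkpoint), the sketch below maps each subindex name to a subdirectory. The helper subindex_paths is hypothetical and only demonstrates the naming scheme; it does not write any index data.

```python
import os
import tempfile


def subindex_paths(path, names):
    """Hypothetical helper: maps each subindex name to a subdirectory of path."""
    return {name: os.path.join(path, name) for name in names}


with tempfile.TemporaryDirectory() as root:
    paths = subindex_paths(root, ["sparse", "dense"])

    # Create one directory per subindex, named after its key
    for directory in paths.values():
        os.makedirs(directory, exist_ok=True)

    print(sorted(os.listdir(root)))  # ['dense', 'sparse']
```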

Outputs

Name	Type	Description
__contains__	bool	True if the named index exists in this collection.
__getitem__ / __getattr__	Embeddings	Retrieved subindex embeddings instance.
default	str	Name of the first/default subindex.
findmodel	Vectors	First matching vector model found across subindexes.
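
The findmodel lookup can be pictured as follows. This is a minimal sketch under stated assumptions: FakeSubindex is a hypothetical stand-in for a subindex, the model values are plain strings, and returning None when nothing matches is an assumption rather than documented behavior.

```python
class FakeSubindex:
    """Hypothetical subindex; findmodel returns None when no vector model is set."""

    def __init__(self, model=None):
        self.model = model

    def findmodel(self):
        return self.model


def findmodel(indexes, index=None):
    """Returns the first vector model found, optionally restricted to one named subindex."""
    if index:
        # Named lookup: only consult the requested subindex
        matches = [indexes[index].findmodel()]
    else:
        # Scan all subindexes in order and keep those that have a model
        matches = [m for m in (sub.findmodel() for sub in indexes.values()) if m]

    return matches[0] if matches else None


indexes = {"sparse": FakeSubindex(), "dense": FakeSubindex("dense-model")}
print(findmodel(indexes))            # dense-model
print(findmodel(indexes, "sparse"))  # None
```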

Usage Examples

from txtai.embeddings import Embeddings

# Configure embeddings with a top-level dense index and two named subindexes
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "indexes": {
        "sparse": {
            "scoring": {
                "method": "bm25",
                # Enable the term index so the BM25 subindex is searchable
                "terms": True
            }
        },
        "dense": {
            "path": "sentence-transformers/all-MiniLM-L6-v2"
        }
    }
})

# Index documents - automatically routes to all subindexes
embeddings.index([
    (0, "natural language processing", None),
    (1, "computer vision algorithms", None),
])

# Search the top-level index (a subindex can be targeted by name via the index argument)
results = embeddings.search("NLP techniques", limit=5)

# Save all indexes
embeddings.save("/tmp/multi_index")

# Load all indexes
embeddings.load("/tmp/multi_index")
