Implementation:Neuml Txtai Subindex Manager


Knowledge Sources

  • Domains: Embeddings, Index Management, Multi-Index
  • Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for managing the collection of subindexes within an embeddings instance.

Description

The Indexes class manages multiple subindexes that compose a parent embeddings instance. Each subindex is itself a full embeddings instance, enabling different vector models, scoring methods, or configurations to coexist within a single logical index.

Key features:

  • Dictionary-like access: Subindexes are looked up by name. __contains__ supports membership tests ("name" in indexes), while __getitem__ and __getattr__ allow both bracket notation (indexes["name"]) and attribute-style access (indexes.name).
  • Document filtering: During insert, the class keeps only documents that have a text or object field set. When top-level indexing is disabled (no model or scoring configured), it keeps all documents. Each kept document is assigned an index ID matching its position in the parent index.
  • Document streaming: Uses the Documents class to buffer inserted documents to disk for deferred indexing.
  • Lifecycle management: Provides index, upsert, delete, load, save, and close methods that delegate to each subindex in turn. The index and upsert methods also clean up the document stream after processing.
  • Checkpoint support: Supports checkpoint directories for indexing restart. Each subindex gets a subdirectory within the checkpoint path.
  • Model lookup: The findmodel method locates a vector model across subindexes, optionally filtered by index name.
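The document filtering rule described above can be sketched in plain Python. This is a minimal illustration rather than txtai's code: the function name filter_documents and its parameters (offset, indexing, text, obj) are hypothetical, and the "text" and "object" field names are assumed defaults.

```python
def filter_documents(documents, offset, indexing=True, text="text", obj="object"):
    """Return (index_id, document, None) tuples for documents that qualify.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    batch = []
    for _, document, _ in documents:
        # For dict documents, look at the text field first, then the object field
        value = document
        if isinstance(document, dict):
            value = document.get(text, document.get(obj))

        # Keep the document when a field is set, or when top-level indexing is disabled
        if value is not None or not indexing:
            batch.append((offset, document, None))
            offset += 1

    return batch


docs = [
    ("a", "plain text", None),
    ("b", {"text": "dict with text"}, None),
    ("c", {"other": "no text or object"}, None),
]

# With top-level indexing enabled, the third document is skipped
kept = filter_documents(docs, offset=10)

# With top-level indexing disabled, every document is kept
kept_all = filter_documents(docs, offset=10, indexing=False)
```

Under these assumptions, index IDs advance only for kept documents, which is what keeps them aligned with positions in the parent index.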

Usage

Use Indexes when you need to configure multiple subindexes within a single txtai embeddings instance. This is useful for hybrid search (combining dense and sparse indexes), multi-model configurations, or organizing data into logical partitions that are searched together.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/embeddings/index/indexes.py

Signature

class Indexes:
    def __init__(self, embeddings, indexes)
    def __contains__(self, name) -> bool
    def __getitem__(self, name) -> Embeddings
    def __getattr__(self, name) -> Embeddings
    def default(self) -> str
    def findmodel(self, index=None) -> Vectors
    def insert(self, documents, index=None, checkpoint=None)
    def delete(self, ids)
    def index(self)
    def upsert(self)
    def load(self, path)
    def save(self, path)
    def close(self)
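The access methods and delegation pattern in the signature above can be illustrated with a small stand-in. MiniIndexes and FakeSubindex are hypothetical classes written for this sketch; they mimic the described behavior and are not txtai's implementation.

```python
class FakeSubindex:
    """Hypothetical subindex that only records deleted ids."""

    def __init__(self):
        self.deleted = []

    def delete(self, ids):
        self.deleted.extend(ids)


class MiniIndexes:
    """Hypothetical stand-in showing name-based access and delegation."""

    def __init__(self, indexes):
        self.indexes = indexes

    def __contains__(self, name):
        return name in self.indexes

    def __getitem__(self, name):
        return self.indexes[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        try:
            return self.__dict__["indexes"][name]
        except KeyError as e:
            raise AttributeError(name) from e

    def default(self):
        # First configured subindex name (dicts preserve insertion order)
        return next(iter(self.indexes))

    def delete(self, ids):
        # Delegate to each subindex in turn
        for index in self.indexes.values():
            index.delete(ids)


indexes = MiniIndexes({"sparse": FakeSubindex(), "dense": FakeSubindex()})
indexes.delete([1, 2])
```

Raising AttributeError from __getattr__ for unknown names keeps hasattr and similar checks working as expected.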

Import

from txtai.embeddings.index.indexes import Indexes

I/O Contract

Inputs

Name | Type | Required | Description
embeddings | Embeddings | Yes | Parent embeddings instance. Used to determine text/object column names and whether top-level indexing is enabled.
indexes | dict | Yes | Dictionary mapping index names (str) to embeddings instances (Embeddings). Each is a fully configured subindex.
documents | list[tuple] | Yes (insert) | List of (id, document, tags) tuples. Documents are filtered based on text/object field availability.
index | int | Yes (insert) | Starting index ID offset, matching the parent index position.
checkpoint | str | No | Optional checkpoint directory path for indexing restart support.
ids | list | Yes (delete) | List of document IDs to remove from all subindexes.
path | str | Yes (load/save) | Directory path for loading/saving subindexes. Each subindex is stored in a subdirectory named after its key.
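The directory layout implied by the path and checkpoint inputs can be sketched as follows. The helper names subindex_paths and checkpoint_paths are hypothetical, and the exact way paths are joined is an assumption; the source only states that each subindex uses a subdirectory named after its key.

```python
import os


def subindex_paths(path, names):
    """Map each subindex name to its storage directory under path (assumed layout)."""
    return {name: os.path.join(path, name) for name in names}


def checkpoint_paths(checkpoint, names):
    """Map each subindex name to its checkpoint subdirectory, or None when disabled."""
    return {name: os.path.join(checkpoint, name) if checkpoint else None for name in names}


names = ["sparse", "dense"]
storage = subindex_paths("/tmp/multi_index", names)
checkpoints = checkpoint_paths(None, names)
```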

Outputs

Name | Type | Description
contains | bool | True if the named index exists in this collection.
index | Embeddings | Retrieved subindex embeddings instance.
default | str | Name of the first/default subindex.
findmodel | Vectors | First matching vector model found across subindexes.
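The "first matching model" lookup can be sketched with plain dictionaries standing in for subindexes. The function find_model and the "model" key are hypothetical; the real method works on embeddings instances, and this only illustrates the selection logic described here.

```python
def find_model(indexes, index=None):
    """Return the model of a named subindex, or the first model found across all.

    Hypothetical sketch: subindexes are plain dicts with an optional "model" key.
    """
    if index:
        # Restrict the lookup to a single named subindex
        return indexes[index].get("model")

    # Otherwise scan subindexes in order and take the first one with a model
    matches = [sub["model"] for sub in indexes.values() if sub.get("model")]
    return matches[0] if matches else None


subindexes = {
    "sparse": {"model": None},          # keyword index, no vector model
    "dense": {"model": "dense-model"},  # placeholder for a vector model object
}

first = find_model(subindexes)
sparse_only = find_model(subindexes, index="sparse")
```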

Usage Examples

from txtai.embeddings import Embeddings

# Configure embeddings with multiple subindexes
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "indexes": {
        "sparse": {
            "scoring": {
                "method": "bm25",
                "terms": True  # a truthy value enables the sparse term index
            }
        },
        "dense": {
            "path": "sentence-transformers/all-MiniLM-L6-v2"
        }
    }
})

# Index documents - automatically routes to all subindexes
embeddings.index([
    (0, "natural language processing", None),
    (1, "computer vision algorithms", None),
])

# Search the top-level index; pass index="sparse" or index="dense" to query a specific subindex
results = embeddings.search("NLP techniques", limit=5)

# Save all indexes
embeddings.save("/tmp/multi_index")

# Load all indexes
embeddings.load("/tmp/multi_index")
