Implementation:Neuml Txtai Embeddings Save Load

Knowledge Sources	txtai, txtai Documentation
Domains	NLP, Search
Last Updated	2026-02-10 00:00 GMT

Overview

A concrete tool, provided by the txtai library, for persisting embeddings indexes to disk or cloud storage and loading them back.

Description

The Embeddings.save and Embeddings.load methods handle the full lifecycle of index persistence. The save method writes all index components (configuration, ANN index, dimensionality reduction model, ID mappings, document database, scoring index, subindexes, and graph) to a directory at the specified path. If the path ends with a supported archive extension (.tar.gz, .tar.bz2, .tar.xz, or .zip), the index is additionally packaged into a compressed archive. If cloud configuration is provided, the saved index is uploaded to the specified cloud storage backend.
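The archive rule described above amounts to a suffix check on the path. The following is a minimal sketch of that rule, not txtai's own implementation: the helper `is_archive` and the constant `ARCHIVE_EXTENSIONS` are hypothetical names introduced for illustration, and only the four extensions listed above are assumed.

```python
# Hypothetical helper, not part of txtai: illustrates the suffix rule only.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")


def is_archive(path):
    """Return True when path ends with one of the archive extensions listed above."""
    return path.endswith(ARCHIVE_EXTENSIONS)


print(is_archive("/tmp/my_index"))         # False: treated as a directory
print(is_archive("/tmp/my_index.tar.gz"))  # True: would also be packaged as an archive
```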

The load method reverses this process: it downloads from cloud storage if configured, extracts archive files if applicable, reads the configuration from the index directory, and then instantiates and loads each component. It returns the Embeddings instance itself, enabling method chaining. The config parameter allows overriding specific configuration values at load time, which is useful for changing runtime settings (e.g., switching the ANN backend) without rebuilding the index.
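One way to picture the config override is as a dictionary merge in which load-time values take precedence over saved ones. The sketch below assumes a shallow merge and uses made-up configuration values; `apply_overrides` is a hypothetical helper for illustration, not a txtai function.

```python
def apply_overrides(saved, overrides=None):
    """Return a new dict in which keys from overrides replace keys from saved (shallow merge)."""
    return {**saved, **overrides} if overrides else dict(saved)


# Illustrative values only, standing in for a saved index configuration
saved = {"path": "sentence-transformers/all-MiniLM-L6-v2", "backend": "faiss"}
merged = apply_overrides(saved, {"backend": "numpy"})

print(merged["backend"])  # numpy
print(saved["backend"])   # faiss (the saved dict itself is not modified)
```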

Usage

Call save after building an index to persist it to disk or cloud. Call load to restore a previously saved index. These methods are essential for any production deployment where indexes must survive process restarts or be shared across machines.
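A common way to combine the two calls in a long-running service is a "load if present, otherwise build and save" pattern. The sketch below is one possible arrangement under stated assumptions: `load_or_build` is a hypothetical helper, it handles local paths only (no cloud configuration), and it relies solely on the index, save, and load methods described on this page.

```python
import os


def load_or_build(embeddings, path, documents):
    """Load the index at path if it exists; otherwise build it from documents and save it."""
    if os.path.exists(path):
        # A saved index (directory or archive file) is already present
        return embeddings.load(path)

    # First run: build the index, then persist it so later restarts can reuse it
    embeddings.index(documents)
    embeddings.save(path)
    return embeddings
```

With txtai installed, this could be called as `load_or_build(Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}), "/tmp/my_index", documents)`. Because load returns the instance itself, both branches hand back an object that is ready for search.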

Code Reference

Source Location

Repository: txtai
File: src/python/txtai/embeddings/base.py
Lines: L532-661

Signature

def save(self, path, cloud=None, **kwargs):

def load(self, path=None, cloud=None, config=None, **kwargs):

Import

from txtai import Embeddings

I/O Contract

Inputs (save)

Name	Type	Required	Description
path	str	Yes	Output path for the index. By default the path is treated as a directory and components are saved as files within it. If the path ends with .tar.gz, .tar.bz2, .tar.xz, or .zip, the index is also packaged as a compressed archive at that path.
cloud	dict	No	Cloud storage configuration dictionary. When provided, the saved index is uploaded to the specified cloud backend (e.g., S3, GCS, Azure Blob).
**kwargs	keyword args	No	Additional configuration passed to the cloud storage factory.

Inputs (load)

Name	Type	Required	Description
path	str	No	Path to the saved index directory or archive file. When cloud is configured, this is the cloud path from which to download.
cloud	dict	No	Cloud storage configuration dictionary. When provided, the index is downloaded from the specified cloud backend before loading.
config	dict	No	Configuration overrides to apply after loading the saved configuration. Useful for changing runtime parameters without rebuilding.
**kwargs	keyword args	No	Additional configuration passed to the cloud storage factory.

Outputs (save)

Name	Type	Description
None	None	The method writes the index to the specified path. No return value.

Outputs (load)

Name	Type	Description
self	Embeddings	Returns the Embeddings instance with all components loaded and ready for search. Enables method chaining.

Usage Examples

Basic Save and Load

from txtai import Embeddings

# Build an index
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["US tops 5 million confirmed virus cases",
                   "Canada's last fully intact ice shelf has collapsed",
                   "Beijing launches high-level expenses probe"])

# Save to disk
embeddings.save("/tmp/my_index")

# Load in a new instance
embeddings2 = Embeddings()
embeddings2.load("/tmp/my_index")
# Search the restored index for the single best match
results = embeddings2.search("pandemic", 1)
print(results)

Save as Compressed Archive

from txtai import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["document one", "document two", "document three"])

# Save as tar.gz archive
embeddings.save("/tmp/my_index.tar.gz")

# Load from archive
embeddings2 = Embeddings()
embeddings2.load("/tmp/my_index.tar.gz")

Load with Configuration Overrides

from txtai import Embeddings

# Load an existing index with modified configuration
embeddings = Embeddings()
embeddings.load("/tmp/my_index", config={"backend": "numpy"})

Cloud Storage

from txtai import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["doc one", "doc two", "doc three"])

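# Save to S3 (assumes the cloud dependencies are installed and credentials are configured;
# "s3" is the provider name assumed here for Amazon S3, and "my-bucket" is a placeholder)
embeddings.save("my-index", cloud={"provider": "s3", "container": "my-bucket"})

# Load from S3
embeddings2 = Embeddings()
embeddings2.load("my-index", cloud={"provider": "s3", "container": "my-bucket"})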

Related Pages

Implements Principle

Principle:Neuml_Txtai_Index_Persistence

Requires Environment

Environment:Neuml_Txtai_Python_Core_Dependencies
