Implementation:Neuml Txtai Embeddings Save Load

From Leeroopedia


Knowledge Sources
Domains NLP, Search
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool, provided by the txtai library, for persisting embeddings indexes to disk or cloud storage and loading them back.

Description

The Embeddings.save and Embeddings.load methods handle the full lifecycle of index persistence. The save method writes all index components (configuration, ANN index, dimensionality reduction model, ID mappings, document database, scoring index, subindexes, and graph) to a directory at the specified path. If the path instead ends with a supported archive extension (.tar.gz, .tar.bz2, .tar.xz, or .zip), the components are packaged into a compressed archive at that path. If cloud configuration is provided, the saved index is then uploaded to the specified cloud storage backend.
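The archive rule described above can be illustrated with a small sketch. The helper name is_archive and the suffix tuple are hypothetical stand-ins written for this page, not txtai's internal code; they only mirror the four extensions listed in the description.

```python
# Hypothetical helper: mirrors the archive extensions listed above.
# This is not txtai's internal implementation.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")


def is_archive(path):
    """Return True when path ends with one of the listed archive extensions."""
    return path.endswith(ARCHIVE_SUFFIXES)


print(is_archive("/tmp/my_index"))         # False
print(is_archive("/tmp/my_index.tar.gz"))  # True
```

Under this reading, a path such as /tmp/my_index is treated as a directory, while /tmp/my_index.tar.gz selects the archive behavior.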

The load method reverses this process: it downloads from cloud storage if configured, extracts archive files if applicable, reads the configuration from the index directory, and then instantiates and loads each component. It returns the Embeddings instance itself, enabling method chaining. The config parameter allows overriding specific configuration values at load time, which is useful for changing runtime settings (e.g., switching the ANN backend) without rebuilding the index.
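A minimal sketch of how load-time overrides could combine with a saved configuration, assuming a shallow dictionary merge in which override keys replace saved keys. The function name apply_overrides and the sample values are hypothetical; the actual merge logic lives in txtai and may differ in detail.

```python
def apply_overrides(saved, overrides=None):
    """Return a new dict with override keys replacing saved keys (shallow merge)."""
    return {**saved, **overrides} if overrides else dict(saved)


# Illustrative values only, not read from a real index
saved = {"path": "sentence-transformers/all-MiniLM-L6-v2", "backend": "faiss"}
merged = apply_overrides(saved, {"backend": "numpy"})

print(merged["backend"])  # numpy
print(saved["backend"])   # faiss (the saved dict is left unchanged)
```

Because the merge is shallow under this assumption, a nested dictionary supplied as an override would replace the saved nested dictionary wholesale rather than being merged key by key.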

Usage

Call save after building an index to persist it to disk or cloud. Call load to restore a previously saved index. These methods are essential for any production deployment where indexes must survive process restarts or be shared across machines.
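A common pattern that follows from this is to load an index when one already exists and to build it otherwise. The sketch below assumes the object passed in exposes exists, load, index, and save methods, as txtai's Embeddings does. The function name load_or_build is hypothetical, and it is written against a duck-typed object so the block itself does not import txtai.

```python
def load_or_build(embeddings, path, documents):
    """
    Load the index at path if present, otherwise index documents and save.

    embeddings is assumed to offer exists/load/index/save methods.
    Returns the embeddings object either way.
    """
    if embeddings.exists(path):
        # Reuse the persisted index instead of re-encoding the documents
        return embeddings.load(path)

    # First run: build the index, then persist it for later processes
    embeddings.index(documents)
    embeddings.save(path)
    return embeddings
```

With txtai installed, this might be called with an Embeddings instance, a path such as "/tmp/my_index", and a list of documents; the first process pays the indexing cost and later processes only load.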

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/embeddings/base.py
  • Lines: L532-661

Signature

def save(self, path, cloud=None, **kwargs):

def load(self, path=None, cloud=None, config=None, **kwargs):

Import

from txtai import Embeddings

I/O Contract

Inputs (save)

Name | Type | Required | Description
path | str | Yes | Output path for the index. If the path is a directory, components are saved as files within it. If the path ends with .tar.gz, .tar.bz2, .tar.xz, or .zip, the index is saved as a compressed archive.
cloud | dict | No | Cloud storage configuration dictionary. When provided, the saved index is uploaded to the specified cloud backend (e.g., S3, GCS, Azure Blob).
**kwargs | keyword args | No | Additional configuration passed to the cloud storage factory.

Inputs (load)

Name | Type | Required | Description
path | str | No | Path to the saved index directory or archive file. When cloud is configured, this is the cloud path from which to download.
cloud | dict | No | Cloud storage configuration dictionary. When provided, the index is downloaded from the specified cloud backend before loading.
config | dict | No | Configuration overrides to apply after loading the saved configuration. Useful for changing runtime parameters without rebuilding.
**kwargs | keyword args | No | Additional configuration passed to the cloud storage factory.

Outputs (save)

Name | Type | Description
None | None | The method writes the index to the specified path. No return value.

Outputs (load)

Name | Type | Description
self | Embeddings | Returns the Embeddings instance with all components loaded and ready for search. Enables method chaining.

Usage Examples

Basic Save and Load

from txtai import Embeddings

# Build an index
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["US tops 5 million confirmed virus cases",
                   "Canada's last fully intact ice shelf has collapsed",
                   "Beijing launches high-level expenses probe"])

# Save to disk
embeddings.save("/tmp/my_index")

# Load in a new instance
embeddings2 = Embeddings()
embeddings2.load("/tmp/my_index")
results = embeddings2.search("pandemic", 1)

Save as Compressed Archive

from txtai import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["document one", "document two", "document three"])

# Save as tar.gz archive
embeddings.save("/tmp/my_index.tar.gz")

# Load from archive
embeddings2 = Embeddings()
embeddings2.load("/tmp/my_index.tar.gz")

Load with Configuration Overrides

from txtai import Embeddings

# Load an existing index with modified configuration
embeddings = Embeddings()
embeddings.load("/tmp/my_index", config={"backend": "numpy"})

Cloud Storage

from txtai import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index(["doc one", "doc two", "doc three"])

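# Save to S3 (provider is the Apache Libcloud storage provider name;
# credentials must also be available to the cloud backend)
embeddings.save("my-index", cloud={"provider": "s3", "container": "my-bucket"})

# Load from S3
embeddings2 = Embeddings()
embeddings2.load("my-index", cloud={"provider": "s3", "container": "my-bucket"})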
