Implementation: Neuml Txtai Embeddings Save
| Knowledge Sources | |
|---|---|
| Domains | Semantic_Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the txtai library, for persisting vector indexes and their associated data to disk or cloud storage.
Description
The Embeddings.save method serializes the entire state of an embeddings index to a directory on disk. It writes each component independently: the configuration (as config.json), the ANN embeddings data, the LSA/PCA reducer, ID mappings, the document database, the scoring index, subindexes, and the graph. The method also supports saving to compressed archive formats (tar.gz, tar.bz2, tar.xz, zip) by detecting the file extension, and uploading the result to cloud storage when a cloud configuration is provided. If the configuration is None (no index has been built), the method is a no-op.
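The extension-based archive detection described above can be sketched as a small helper. This is a hypothetical simplification for illustration; the actual txtai implementation may differ:

```python
# Hypothetical sketch of archive-format detection by file extension.
# Not the actual txtai implementation.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")

def is_archive_path(path):
    """Return True if the path ends with a supported archive extension."""
    return path.lower().endswith(ARCHIVE_EXTENSIONS)

print(is_archive_path("/data/my_index.tar.gz"))  # True: packed into one archive
print(is_archive_path("/data/my_index"))         # False: written as a directory
```

A path that matches triggers the temporary-directory-then-pack flow; anything else is treated as a plain output directory.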
Usage
Use this method after building or updating an index to persist it for later reuse. Call save() with a directory path to write the index as a collection of files, or with an archive path (ending in .tar.gz, .tar.bz2, .tar.xz, or .zip) to write a single compressed file suitable for distribution. Pair with Embeddings.load() to restore the index in a subsequent session.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/embeddings/base.py - Lines: L605-661
Signature
def save(self, path, cloud=None, **kwargs):
Import
from txtai.embeddings import Embeddings
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Output path. If it is a directory path (e.g. "/data/my_index"), the index components are written as individual files within that directory. If it ends with a supported archive extension (.tar.gz, .tar.bz2, .tar.xz, .zip), the components are first written to a temporary directory, then packed into the archive file. |
| cloud | dict or None | No | Cloud storage configuration dictionary. When provided, the saved index (directory or archive) is uploaded to the configured cloud backend (e.g., S3, GCS). The dict contents depend on the cloud provider. Defaults to None (local storage only). |
| **kwargs | keyword arguments | No | Additional cloud configuration keys merged with the cloud parameter. |
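One plausible way the cloud dict and the extra keyword arguments could be merged, per the table above. This is an assumption for illustration, not the library's verbatim code:

```python
def resolve_cloud_config(cloud=None, **kwargs):
    """Hypothetical sketch: merge **kwargs over the cloud dict.

    Returns None when no cloud configuration is supplied at all,
    mirroring the default local-storage-only behavior.
    """
    config = dict(cloud) if cloud else {}
    config.update(kwargs)  # keyword arguments take precedence
    return config or None

print(resolve_cloud_config({"provider": "aws"}, container="my-bucket"))
# {'provider': 'aws', 'container': 'my-bucket'}
print(resolve_cloud_config())
# None
```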
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | This method returns None. It operates via side effects, writing the following files to the output path: config.json (index configuration), embeddings (ANN index data), lsa (PCA reducer, if configured), ids (ID mappings, if content is disabled), documents (document database), scoring (sparse scoring data), indexes (subindex data), graph (graph data). Only components that exist in the current index state are written. |
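Since only components present in the current index state are written, a saved directory may contain any subset of the files listed above. The small helper below checks which ones exist; the file names come from the Outputs table, while the helper itself is purely illustrative:

```python
from pathlib import Path

# Component file names as listed in the Outputs table above.
COMPONENTS = ["config.json", "embeddings", "lsa", "ids",
              "documents", "scoring", "indexes", "graph"]

def saved_components(directory):
    """Return the index component files present in a saved directory."""
    base = Path(directory)
    return [name for name in COMPONENTS if (base / name).exists()]
```

For example, an index saved with content disabled and no graph configured would typically report only config.json, embeddings, ids, and scoring.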
Usage Examples
Basic Example
from txtai.embeddings import Embeddings
# Build an index
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Semantic search with vector embeddings", None),
(1, "Natural language understanding models", None),
(2, "Information retrieval techniques", None)
])
# Save to a directory
embeddings.save("/data/my_index")
# Later, load the index back
restored = Embeddings()
restored.load("/data/my_index")
print(restored.count())
# Output: 3
Example: Save as Compressed Archive
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Deep learning for text classification", None),
(1, "Transformer architectures explained", None)
])
# Save as a tar.gz archive for easy distribution
embeddings.save("/data/my_index.tar.gz")
# Load from archive
restored = Embeddings()
restored.load("/data/my_index.tar.gz")
print(restored.count())
# Output: 2
Example: Save with Cloud Upload
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Cloud-deployed semantic search", None),
(1, "Distributed index management", None)
])
# Save and upload to S3
embeddings.save("/data/my_index", cloud={
"provider": "aws",
"container": "my-bucket",
"prefix": "indexes/semantic"
})
Example: Context Manager with Save
from txtai.embeddings import Embeddings
# Using context manager ensures resources are cleaned up
with Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True}) as embeddings:
embeddings.index([
(0, "Context manager pattern", None),
(1, "Resource management best practices", None)
])
embeddings.save("/data/my_index")
# embeddings.close() is called automatically on exit