Implementation: Neuml Txtai Embeddings Save
| Knowledge Sources | |
|---|---|
| Domains | Semantic_Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the txtai library, for persisting vector indexes and their associated data to disk or cloud storage.
Description
The Embeddings.save method serializes the entire state of an embeddings index to a directory on disk. It writes each component independently: the configuration (as config.json), the ANN embeddings data, the LSA/PCA reducer, ID mappings, the document database, the scoring index, subindexes, and the graph. The method also supports saving to compressed archive formats (tar.gz, tar.bz2, tar.xz, zip) by detecting the file extension, and uploading the result to cloud storage when a cloud configuration is provided. If the configuration is None (no index has been built), the method is a no-op.
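The extension-based archive detection described above can be sketched as a small helper. This is a hypothetical simplification for illustration; the actual txtai implementation may differ:

```python
# Hypothetical sketch of archive-format detection by file extension.
# Not the actual txtai implementation.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")

def is_archive_path(path):
    """Return True if the path ends with a supported archive extension."""
    return path.lower().endswith(ARCHIVE_EXTENSIONS)

print(is_archive_path("/data/my_index.tar.gz"))  # True: packed into one archive
print(is_archive_path("/data/my_index"))         # False: written as a directory
```

A path that matches triggers the temporary-directory-then-pack flow; anything else is treated as a plain output directory.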
Usage
Use this method after building or updating an index to persist it for later reuse. Call save() with a directory path to write the index as a collection of files, or with an archive path (ending in .tar.gz, .tar.bz2, .tar.xz, or .zip) to write a single compressed file suitable for distribution. Pair with Embeddings.load() to restore the index in a subsequent session.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/embeddings/base.py - Lines: L605-661
Signature
def save(self, path, cloud=None, **kwargs):
Import
from txtai.embeddings import Embeddings
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Output path. If it is a directory path (e.g. "/data/my_index"), the index components are written as individual files within that directory. If it ends with a supported archive extension (.tar.gz, .tar.bz2, .tar.xz, .zip), the components are first written to a temporary directory, then packed into the archive file. |
| cloud | dict or None | No | Cloud storage configuration dictionary. When provided, the saved index (directory or archive) is uploaded to the configured cloud backend (e.g., S3, GCS). The dict contents depend on the cloud provider. Defaults to None (local storage only). |
| **kwargs | keyword arguments | No | Additional cloud configuration keys merged with the cloud parameter. |
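One plausible way the cloud dict and the extra keyword arguments could be merged, per the table above. This is an assumption for illustration, not the library's verbatim code:

```python
def resolve_cloud_config(cloud=None, **kwargs):
    """Hypothetical sketch: merge **kwargs over the cloud dict.

    Returns None when no cloud configuration is supplied at all,
    mirroring the default local-storage-only behavior.
    """
    config = dict(cloud) if cloud else {}
    config.update(kwargs)  # keyword arguments take precedence
    return config or None

print(resolve_cloud_config({"provider": "aws"}, container="my-bucket"))
# {'provider': 'aws', 'container': 'my-bucket'}
print(resolve_cloud_config())
# None
```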
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | This method returns None. It operates via side effects, writing the following files to the output path: config.json (index configuration), embeddings (ANN index data), lsa (PCA reducer, if configured), ids (ID mappings, if content is disabled), documents (document database), scoring (sparse scoring data), indexes (subindex data), graph (graph data). Only components that exist in the current index state are written. |
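Since only components present in the current index state are written, a saved directory may contain any subset of the files listed above. The small helper below checks which ones exist; the file names come from the Outputs table, while the helper itself is purely illustrative:

```python
from pathlib import Path

# Component file names as listed in the Outputs table above.
COMPONENTS = ["config.json", "embeddings", "lsa", "ids",
              "documents", "scoring", "indexes", "graph"]

def saved_components(directory):
    """Return the index component files present in a saved directory."""
    base = Path(directory)
    return [name for name in COMPONENTS if (base / name).exists()]
```

For example, an index saved with content disabled and no graph configured would typically report only config.json, embeddings, ids, and scoring.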
Usage Examples
Basic Example
from txtai.embeddings import Embeddings
# Build an index
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Semantic search with vector embeddings", None),
(1, "Natural language understanding models", None),
(2, "Information retrieval techniques", None)
])
# Save to a directory
embeddings.save("/data/my_index")
# Later, load the index back
restored = Embeddings()
restored.load("/data/my_index")
print(restored.count())
# Output: 3
Example: Save as Compressed Archive
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Deep learning for text classification", None),
(1, "Transformer architectures explained", None)
])
# Save as a tar.gz archive for easy distribution
embeddings.save("/data/my_index.tar.gz")
# Load from archive
restored = Embeddings()
restored.load("/data/my_index.tar.gz")
print(restored.count())
# Output: 2
Example: Save with Cloud Upload
from txtai.embeddings import Embeddings
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
(0, "Cloud-deployed semantic search", None),
(1, "Distributed index management", None)
])
# Save and upload to S3
embeddings.save("/data/my_index", cloud={
"provider": "aws",
"container": "my-bucket",
"prefix": "indexes/semantic"
})
Example: Context Manager with Save
from txtai.embeddings import Embeddings
# Using context manager ensures resources are cleaned up
with Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True}) as embeddings:
embeddings.index([
(0, "Context manager pattern", None),
(1, "Resource management best practices", None)
])
embeddings.save("/data/my_index")
# embeddings.close() is called automatically on exit