
Implementation:Neuml Txtai Embeddings Save

From Leeroopedia


Knowledge Sources
Domains Semantic_Search, NLP
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the txtai library, for persisting a vector index and its associated data to disk or cloud storage.

Description

The Embeddings.save method serializes the entire state of an embeddings index to a directory on disk. It writes each component independently: the configuration (as config.json), the ANN embeddings data, the LSA/PCA reducer, ID mappings, the document database, the scoring index, subindexes, and the graph. The method also supports saving to compressed archive formats (tar.gz, tar.bz2, tar.xz, zip) by detecting the file extension, and uploading the result to cloud storage when a cloud configuration is provided. If the configuration is None (no index has been built), the method is a no-op.
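The archive-format detection described above can be sketched as a plain extension check. This is a hypothetical helper for illustration only, not the actual txtai internals (which live in src/python/txtai/embeddings/base.py):

```python
# Archive extensions that trigger compressed output, per the description above
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")

def is_archive_path(path):
    """Return True if path ends with a supported archive extension.

    Hypothetical sketch of extension-based detection: a matching suffix
    means "write a single compressed file"; anything else is treated as
    a directory path.
    """
    return path.lower().endswith(ARCHIVE_EXTENSIONS)

print(is_archive_path("/data/my_index"))         # directory save
print(is_archive_path("/data/my_index.tar.gz"))  # archive save
```

A check like this is why `save("/data/my_index")` and `save("/data/my_index.tar.gz")` below behave differently even though the call shape is identical.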

Usage

Use this method after building or updating an index to persist it for later reuse. Call save() with a directory path to write the index as a collection of files, or with an archive path (ending in .tar.gz, .tar.bz2, .tar.xz, or .zip) to write a single compressed file suitable for distribution. Pair with Embeddings.load() to restore the index in a subsequent session.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/embeddings/base.py
  • Lines: L605-661

Signature

def save(self, path, cloud=None, **kwargs):

Import

from txtai.embeddings import Embeddings

I/O Contract

Inputs

  • path (str, required): Output path. If it is a directory path (e.g. "/data/my_index"), the index components are written as individual files within that directory. If it ends with a supported archive extension (.tar.gz, .tar.bz2, .tar.xz, .zip), the components are first written to a temporary directory and then packed into the archive file.
  • cloud (dict or None, optional): Cloud storage configuration dictionary. When provided, the saved index (directory or archive) is uploaded to the configured cloud backend (e.g., S3, GCS). The dict contents depend on the cloud provider. Defaults to None (local storage only).
  • **kwargs (optional): Additional cloud configuration keys merged with the cloud parameter.
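The merging of cloud and **kwargs into one configuration dict can be illustrated with a small sketch. This is an assumed behavior (keyword arguments layered on top of the cloud dict), written as a hypothetical helper rather than the exact txtai code:

```python
def merge_cloud_config(cloud=None, **kwargs):
    """Hypothetical sketch: combine an explicit cloud dict with extra
    keyword arguments into a single configuration mapping.

    Assumption (not verified against txtai source): keyword arguments
    are merged on top of the cloud dict. Returns None when no cloud
    configuration was supplied at all.
    """
    config = dict(cloud) if cloud else {}
    config.update(kwargs)
    return config or None

# Equivalent ways to pass the same cloud configuration
print(merge_cloud_config({"provider": "aws"}, container="my-bucket"))
print(merge_cloud_config(provider="aws", container="my-bucket"))
```

Under this reading, `save(path, cloud={...})` and `save(path, provider=..., container=...)` express the same configuration.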

Outputs

  • Return value: None. The method operates via side effects, writing the following files to the output path: config.json (index configuration), embeddings (ANN index data), lsa (PCA reducer, if configured), ids (ID mappings, if content is disabled), documents (document database), scoring (sparse scoring data), indexes (subindex data), graph (graph data). Only components that exist in the current index state are written.
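Because only the components present in the index are written, the contents of a saved directory vary from index to index. A quick way to see which components were written is to check for the file names listed above (a minimal sketch using only the standard library, assuming a directory-style save):

```python
from pathlib import Path

# Component file names as documented in the Outputs table above
COMPONENTS = ["config.json", "embeddings", "lsa", "ids",
              "documents", "scoring", "indexes", "graph"]

def saved_components(index_dir):
    """Return the subset of known component files present in index_dir."""
    root = Path(index_dir)
    return [name for name in COMPONENTS if (root / name).exists()]
```

For example, an index saved with "content": True would typically show config.json, embeddings, and documents, while lsa appears only when a PCA reducer is configured.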

Usage Examples

Basic Example

from txtai.embeddings import Embeddings

# Build an index
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    (0, "Semantic search with vector embeddings", None),
    (1, "Natural language understanding models", None),
    (2, "Information retrieval techniques", None)
])

# Save to a directory
embeddings.save("/data/my_index")

# Later, load the index back
restored = Embeddings()
restored.load("/data/my_index")
print(restored.count())
# Output: 3

Example: Save as Compressed Archive

from txtai.embeddings import Embeddings

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    (0, "Deep learning for text classification", None),
    (1, "Transformer architectures explained", None)
])

# Save as a tar.gz archive for easy distribution
embeddings.save("/data/my_index.tar.gz")

# Load from archive
restored = Embeddings()
restored.load("/data/my_index.tar.gz")
print(restored.count())
# Output: 2

Example: Save with Cloud Upload

from txtai.embeddings import Embeddings

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

embeddings.index([
    (0, "Cloud-deployed semantic search", None),
    (1, "Distributed index management", None)
])

# Save and upload to S3
embeddings.save("/data/my_index", cloud={
    "provider": "aws",
    "container": "my-bucket",
    "prefix": "indexes/semantic"
})

Example: Context Manager with Save

from txtai.embeddings import Embeddings

# Using context manager ensures resources are cleaned up
with Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True}) as embeddings:
    embeddings.index([
        (0, "Context manager pattern", None),
        (1, "Resource management best practices", None)
    ])
    embeddings.save("/data/my_index")

# embeddings.close() is called automatically on exit

