Implementation:Neuml Txtai Hub Cloud
| Knowledge Sources | |
|---|---|
| Domains | Cloud Storage, Model Hub, Hugging Face |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool for syncing embeddings indexes to and from Hugging Face Hub provided by txtai.
Description
The HuggingFaceHub class is a Cloud provider implementation that enables saving and loading txtai embeddings indexes from Hugging Face Hub repositories. It supports both full repository snapshots and individual archive file downloads. On save, it creates a repository (private by default), updates .gitattributes to enable Git LFS tracking for large embeddings and document files, and uploads all index files. On load, it downloads either a single archive file or the entire repository snapshot to a local cache directory.
Usage
Use HuggingFaceHub when you want to persist and share txtai embeddings indexes via Hugging Face Hub. This is configured in the txtai cloud configuration by specifying the provider as "huggingface-hub" with a container (repo_id), optional revision (branch/tag), cache directory, and authentication token. It enables collaborative and versioned storage of embeddings indexes.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/cloud/hub.py
Signature
class HuggingFaceHub(Cloud):
def metadata(self, path=None)
def load(self, path=None)
def save(self, path)
def lfstrack(self)
Import
from txtai.cloud.hub import HuggingFaceHub
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Cloud configuration dictionary with keys: "container" (repo_id), optional "revision", "cache", "token", "private" |
| path | str | No | Local file path; if an archive file path, operations target that specific file; otherwise operations target the entire repository |
Outputs
| Name | Type | Description |
|---|---|---|
| metadata result | object or None | HF file metadata (for archive paths) or model_info (for repos); None if repository not found |
| load result | str | Local file system path to the downloaded file or repository snapshot |
Usage Examples
from txtai.cloud.hub import HuggingFaceHub
# Configure Hugging Face Hub cloud provider
config = {
"container": "username/my-embeddings-index",
"revision": "main",
"token": "hf_your_token_here",
"private": True,
"cache": "/tmp/hf_cache"
}
hub = HuggingFaceHub(config)
# Check if repository exists
if hub.exists():
# Load entire index repository to local cache
local_path = hub.load()
# Save local embeddings index to Hugging Face Hub
hub.save("/path/to/local/index")
# Load a specific archive file
archive_path = hub.load("/path/to/embeddings.tar.gz")