
Heuristic:Neuml Txtai Memory Streaming Optimization

From Leeroopedia





Knowledge Sources
Domains: Optimization, Memory_Management, Indexing
Last Updated: 2026-02-09 17:00 GMT

Overview

Memory-efficient indexing strategy that streams embeddings to disk in batches rather than accumulating in RAM, enabling indexing of datasets larger than available memory with checkpoint-based recovery.

Description

When indexing large document collections, the naive approach of generating all embeddings in memory before building the index would exhaust RAM. txtai implements a streaming pattern: embeddings are generated in batches (default 1024 documents), immediately written to a temporary file on disk, and the in-memory buffer is released. After all batches are processed, the accumulated embeddings are read back for ANN index construction. This pattern also supports checkpoint-based recovery — if indexing is interrupted, it can resume from the last completed batch rather than restarting from scratch. Additionally, the term frequency scoring module uses a configurable cache limit (default 250MB) to flush sparse term data to disk when memory usage exceeds the threshold.
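The pattern described above can be sketched in a few lines. This is a minimal illustration of the stream-to-disk idea, not txtai's actual implementation: embeddings are produced batch by batch, each batch is appended to a temporary file, and the full matrix is only materialized once at the end for index construction. The `fake_encode` function is a hypothetical stand-in for a real embedding model.

```python
import os
import tempfile
import numpy as np

def fake_encode(batch, dims=8):
    # Hypothetical stand-in for a real embedding model
    rng = np.random.default_rng(len(batch))
    return rng.random((len(batch), dims), dtype=np.float32)

def stream_index(documents, batch=1024, dims=8):
    total = 0
    with tempfile.NamedTemporaryFile(delete=False) as buffer:
        for start in range(0, len(documents), batch):
            embeddings = fake_encode(documents[start:start + batch], dims)
            buffer.write(embeddings.tobytes())  # flush this batch to disk
            total += len(embeddings)            # batch buffer is released here
        path = buffer.name
    # Read the accumulated embeddings back once for ANN index construction
    matrix = np.fromfile(path, dtype=np.float32).reshape(total, dims)
    os.unlink(path)
    return matrix

docs = [f"document {i}" for i in range(2500)]
matrix = stream_index(docs, batch=1024)
print(matrix.shape)  # (2500, 8)
```

Peak memory is bounded by one batch of embeddings rather than the full matrix, which is the core of the trade-off this page describes.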

Usage

This heuristic applies when indexing large document collections (tens of thousands to millions of documents). The streaming pattern is built into txtai's indexing pipeline and activates automatically. Tune the `batch` parameter in transform configuration to control the trade-off: smaller batches use less peak memory but increase disk I/O. The term frequency `cachelimit` (default 250MB) can be adjusted for sparse scoring indexes.
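As a hedged configuration sketch, the parameters named on this page could be combined as below. The key names `batch`, `encodebatch`, `cachelimit`, and `cutoff` come from the excerpts quoted later on this page; the exact nesting of the scoring keys and the model path are assumptions to be checked against the txtai documentation.

```python
# Assumed configuration layout -- verify key nesting against txtai docs
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",  # example model name
    "batch": 512,        # documents streamed to disk per batch (default 1024)
    "encodebatch": 16,   # model inference batch size (default 32)
    "scoring": {
        "method": "bm25",
        "cachelimit": 100_000_000,  # flush term cache at ~100MB (default 250MB)
        "cutoff": 0.1,              # common-term threshold
    },
}

# Peak embedding buffer before each flush: batch x dims x 4 bytes (float32)
dims = 384
peak_bytes = config["batch"] * dims * 4
print(peak_bytes)  # 786432, i.e. under 1MB per in-flight batch
```

Halving `batch` halves the peak buffer but doubles the number of disk writes, matching the trade-off stated above.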

The Insight (Rule of Thumb)

  • Action 1: Use txtai's built-in batch streaming for large indexes (automatic, default batch=1024).
  • Action 2: Reduce batch size if encountering OOM during indexing.
  • Action 3: Adjust `cachelimit` (default 250,000,000 bytes) for term frequency indexes if memory-constrained.
  • Action 4: Use `cutoff` parameter (default 0.1) to control common-term threshold in sparse indexes.
  • Trade-off: Streaming to disk adds I/O overhead but prevents OOM. Smaller batches reduce peak memory at the cost of more disk writes.
  • Recovery: Checkpoint support enables resuming interrupted indexing jobs.
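The recovery point above can be illustrated with a small sketch. This is not txtai's checkpoint API, only the general idea: persist the index of the last fully processed batch, and skip completed batches on restart instead of reprocessing from scratch.

```python
import json
import os
import tempfile

def index_with_checkpoints(documents, batch, checkpoint_path, process):
    # Resume from the last completed batch if a checkpoint file exists
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["batches"]
    for i, start in enumerate(range(0, len(documents), batch)):
        if i < done:
            continue  # batch finished before the interruption; skip it
        process(documents[start:start + batch])
        with open(checkpoint_path, "w") as f:
            json.dump({"batches": i + 1}, f)  # record progress after each batch

processed = []
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
index_with_checkpoints(list(range(10)), 4, ckpt, processed.extend)
# Running again after "interruption": checkpoint says all batches done, so
# nothing is reprocessed and each item is handled exactly once.
index_with_checkpoints(list(range(10)), 4, ckpt, processed.extend)
print(len(processed))  # 10
```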

Reasoning

For a million documents with 384-dimensional embeddings at float32, the raw embedding matrix requires ~1.5GB of RAM. With larger models (768 or 1024 dimensions) or bigger datasets, this quickly exceeds available memory. Streaming to disk trades sequential I/O (fast on SSDs) for bounded memory usage. The 1024-document batch size was chosen as a balance: large enough to amortize the overhead of model inference and disk writes, small enough to keep peak memory manageable. The term frequency cache limit of 250MB prevents the sparse scoring index from consuming unbounded memory during document ingestion, flushing to a database when the threshold is reached.
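The figures above follow from simple arithmetic: documents x dimensions x bytes per float32 value.

```python
def matrix_bytes(docs, dims, bytes_per_value=4):
    # Raw size of a dense float32 embedding matrix, ignoring index overhead
    return docs * dims * bytes_per_value

gb_384 = matrix_bytes(1_000_000, 384) / 1e9
gb_768 = matrix_bytes(1_000_000, 768) / 1e9
print(round(gb_384, 2), round(gb_768, 2))  # 1.54 3.07
```

Doubling the embedding dimension doubles the matrix size, which is why larger models cross the available-memory threshold so quickly.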

# From src/python/txtai/vectors/base.py:48
# Encode batch size - controls underlying model batch size when encoding vectors
self.encodebatch = config.get("encodebatch", 32)

# From src/python/txtai/scoring/terms.py:68-69
self.cachelimit = self.config.get("cachelimit", 250000000)
self.cutoff = self.config.get("cutoff", 0.1)

# From src/python/txtai/models/pooling/base.py:72-74
# Sort document indices from largest to smallest to enable efficient batching
# This performance tweak matches logic in sentence-transformers
lengths = np.argsort([-len(x) if x else 0 for x in documents])
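The length-sorting excerpt above groups similar-length documents into the same inference batch to minimize padding. A self-contained sketch of the trick, including the inverse permutation needed to restore the original document order after encoding (the restore step is an assumption about how such results would typically be mapped back, not a quote from txtai):

```python
import numpy as np

documents = ["a", "longest document here", "mid size", ""]

# Sort indices from longest to shortest; empty entries sort last
lengths = np.argsort([-len(x) if x else 0 for x in documents])
sorted_docs = [documents[i] for i in lengths]
# ... encode sorted_docs in batches here ...

# Invert the permutation to map results back to the original order
inverse = np.argsort(lengths)
restored = [sorted_docs[i] for i in inverse]
print(restored == documents)  # True
```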
