Heuristic:Neuml Txtai Batch Size Defaults

Knowledge Sources	txtai internal defaults tuned for production usage
Domains	Optimization, NLP
Last Updated	2026-02-10 00:00 GMT

Overview

Default batch sizes across txtai subsystems range from 32 (vector encoding) to 1024 (indexing). Each value balances throughput against memory use for its workload.

Description

txtai uses different default batch sizes across its subsystems, each tuned for a specific workload type. Vector encoding uses small batches (32) because model inference is bound by GPU memory. Document indexing uses large batches (1024) because it is I/O-bound. Workflow processing uses moderate batches (100) as a compromise across diverse pipeline operations. Knowing these defaults, and which parameter controls each, is the starting point for tuning performance on different hardware.

Usage

Apply this heuristic when tuning performance for indexing, search, or workflow operations. Increase batch sizes on high-memory GPUs for better throughput. Decrease batch sizes if you encounter out-of-memory (OOM) errors. To override the defaults, set `encodebatch` or `batch` in the embeddings configuration, or set `batch` on the workflow.
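As a minimal sketch of such an override, the snippet below merges user-supplied values into a dictionary holding the two embeddings defaults listed on this page. `DEFAULTS` and `with_overrides` are hypothetical names used for illustration only; the `encodebatch` and `batch` keys are the ones named in the source.

```python
# Defaults as documented on this page (not read from txtai at runtime).
DEFAULTS = {"encodebatch": 32, "batch": 1024}


def with_overrides(**overrides):
    """Return a config dict with the documented defaults plus any overrides.

    Hypothetical helper for illustration; it rejects non-positive sizes.
    """
    for key, value in overrides.items():
        if key in DEFAULTS and (not isinstance(value, int) or value < 1):
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return {**DEFAULTS, **overrides}


# Larger encode batch for a high-memory GPU, index batch left at its default.
gpu_config = with_overrides(encodebatch=64)

# Smaller batches after hitting out-of-memory errors.
low_memory_config = with_overrides(encodebatch=8, batch=256)
```

Assuming txtai is installed, a dictionary like `gpu_config`, extended with a model `path`, is the kind of configuration that would be passed when constructing an embeddings instance.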

The Insight (Rule of Thumb)

Vector Encoding (`encodebatch`): Default 32. Controls the batch size sent to the underlying model during vector encoding. Increase to 64-128 on GPUs with more than 16 GB of VRAM. Decrease to 8-16 on low-memory GPUs.
Index Processing (`batch`): Default 1024. Controls how many documents are processed per indexing batch. Safe to increase on systems with ample RAM.
Workflow Processing (`batch`): Default 100. Controls how many items are processed per workflow step. Tunable per workflow.
HF Pipeline (`batchsize`): Default 64. Controls batch size for Hugging Face pipeline tasks (summary, translation, etc.).
Sparse Scoring (`batch`): Default 1024. Controls batch size for sparse vector encoding.
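Thread Batching (IVFSparse): 32 queries per thread, hard-coded in the source quoted under Code Evidence rather than read from configuration. The thread count is at least 1 and capped at the CPU count.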
Trade-off: Larger batches improve throughput but increase peak memory usage. Smaller batches reduce memory at the cost of lower throughput.
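The vector encoding rule of thumb above can be sketched as a small selection function. This is an illustration, not part of txtai: `suggest_encodebatch` is a hypothetical helper, it picks the conservative end of each suggested range, and the `low_memory_gb` cutoff of 8 GB is an assumption because the source does not define "low-memory".

```python
def suggest_encodebatch(vram_gb, low_memory_gb=8):
    """Suggest an `encodebatch` value from available GPU memory in GB.

    Follows the rule of thumb: 64 (low end of 64-128) above 16 GB,
    16 (high end of 8-16) on low-memory GPUs, otherwise the default of 32.
    The `low_memory_gb` cutoff is an assumption; the source does not define it.
    """
    if vram_gb > 16:
        return 64
    if vram_gb < low_memory_gb:
        return 16
    return 32


large_gpu = suggest_encodebatch(24)   # above the 16 GB threshold
mid_gpu = suggest_encodebatch(12)     # keeps the documented default
small_gpu = suggest_encodebatch(6)    # below the assumed low-memory cutoff
```

Starting at the conservative end of each range leaves room to raise the value once peak memory has been observed under a real workload.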

Reasoning

Each subsystem has a different memory and compute profile. Vector encoding batches are small because transformer models consume significant GPU memory per sample: attention activations grow with sequence length, quadratically in standard self-attention. Indexing batches are large because document processing is dominated by I/O rather than memory. Workflow batches balance the needs of heterogeneous pipeline steps. These defaults reflect the library authors' experience with common hardware configurations.
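To make the throughput side of the trade-off concrete, the sketch below counts how many batches each documented default produces for the same input. The `batches` generator is a generic chunking helper written for this example, not txtai's implementation, and the 10,000-item list is a stand-in for real documents.

```python
from math import ceil


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


documents = list(range(10_000))  # stand-in for 10,000 documents

# Number of batches each documented default produces for the same input.
counts = {size: sum(1 for _ in batches(documents, size)) for size in (32, 100, 1024)}

# The generator agrees with the closed form ceil(n / size).
assert all(count == ceil(len(documents) / size) for size, count in counts.items())
```

For 10,000 items this gives 313 batches at size 32, 100 batches at size 100, and 10 batches at size 1024. Fewer, larger batches mean less per-batch overhead, but each batch holds more items in memory at once.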

When no worker count is given, the workflow sets it to the maximum number of actions in any single task. Every action in the widest task can then run in parallel, and no more workers are allocated than any task can use.
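The worker default can be illustrated with stand-in task objects. In this sketch, `default_workers` is a hypothetical function that mirrors the expression quoted under Code Evidence, and the `SimpleNamespace` tasks with string actions are placeholders that only provide the `action` list the expression needs.

```python
from types import SimpleNamespace


def default_workers(tasks, workers=None):
    """Mirror the quoted rule: keep an explicit value, else use the widest task."""
    return workers if workers else max(len(task.action) for task in tasks)


# Stand-in tasks; each only needs an `action` list for this sketch.
tasks = [
    SimpleNamespace(action=["summarize"]),
    SimpleNamespace(action=["translate", "label", "embed"]),
    SimpleNamespace(action=["extract", "index"]),
]

auto = default_workers(tasks)          # widest task has 3 actions
explicit = default_workers(tasks, 2)   # explicit setting is kept
```

Here `auto` is 3 because the second task has three actions, while `explicit` stays at the value supplied by the caller.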

Code Evidence

Vector encode batch from `vectors/base.py:47-48`:

# Encode batch size - controls underlying model batch size when encoding vectors
self.encodebatch = config.get("encodebatch", 32)

Index batch size from `embeddings/index/transform.py:40`:

self.batch = embeddings.config.get("batch", 1024)

Workflow batch from `workflow/base.py:30`:

batch: how many items to process at a time, defaults to 100

Workflow workers default from `workflow/base.py:48-49`:

# Set default number of executor workers to max number of actions in a task
self.workers = max(len(task.action) for task in self.tasks) if not self.workers else self.workers

IVFSparse thread batching from `ann/sparse/ivfsparse.py:123-125`:

# Calculate number of threads using a thread batch size of 32
threads = queries.shape[0] // 32
threads = min(max(threads, 1), os.cpu_count())
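A worked example of the thread calculation above, written as a standalone function. `thread_count` and its `cpus` and `per_thread` parameters are hypothetical and exist only so the arithmetic can be checked with a fixed CPU count; the fallback to 1 when `os.cpu_count()` returns `None` is an addition in this sketch, not something shown in the quoted source.

```python
import os


def thread_count(nqueries, cpus=None, per_thread=32):
    """Number of threads for `nqueries`, following the quoted calculation."""
    # os.cpu_count() can return None, so fall back to 1 in this sketch.
    cpus = cpus or os.cpu_count() or 1
    threads = nqueries // per_thread
    return min(max(threads, 1), cpus)


few = thread_count(10, cpus=8)      # 10 // 32 == 0, raised to the minimum of 1
some = thread_count(100, cpus=8)    # 100 // 32 == 3
many = thread_count(1000, cpus=8)   # 1000 // 32 == 31, capped at 8 CPUs
```

With 8 CPUs the results are 1, 3, and 8 threads respectively, so small query batches stay single-threaded and large ones never exceed the CPU count.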
