
Heuristic:AnswerDotAI RAGatouille Auto Batch Size For Long Documents

From Leeroopedia
Knowledge Sources
Domains Optimization, Memory_Management, Information_Retrieval
Last Updated 2026-02-12 12:00 GMT

Overview

Automatic batch size reduction when encoding documents longer than 512 tokens to prevent out-of-memory errors.

Description

When encoding documents in memory with `bsize="auto"`, RAGatouille automatically reduces the batch size whenever `doc_maxlen` exceeds 512 tokens: the batch size is halved for each doubling of document length beyond 512 tokens. Separately, the auto max-token calculation takes the 90th percentile of document word counts, multiplies it by 1.35 to allow for tokenization expansion, rounds up to the nearest multiple of 32, applies a 1.1x safety margin, and clamps the result between 256 tokens and the base model's maximum.

Usage

Use this heuristic when encoding long documents in memory and you hit out-of-memory (OOM) errors. Understanding the auto batch-size behavior helps you decide whether to set the batch size manually or let the auto-tuning handle it.

The Insight (Rule of Thumb)

  • Action: When using `bsize="auto"` (default), the system automatically adjusts:
    • `doc_maxlen` <= 512 → `bsize=32`
    • `doc_maxlen` ~1024 → `bsize=16`
    • `doc_maxlen` ~2048 → `bsize=8`
    • Formula: `bsize = max(1, 32 / (2^(round(log2(doc_maxlen))) / 512))`
  • Max token auto-calculation:
    • Takes the 90th percentile word count × 1.35 (tokenization factor)
    • Rounds up to nearest multiple of 32
    • Applies 1.1x safety margin
    • Clamps between 256 and `base_model_max_tokens` (510)
  • Trade-off: Smaller batch size = less GPU/CPU memory used but slower encoding. Larger batch size = faster but more memory.
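As a sanity check, the halving rule above can be sketched in plain Python. This is a re-derivation of the formula for illustration, not the library's code; the function name `auto_bsize` is ours:

```python
import math

def auto_bsize(doc_maxlen: int) -> int:
    """Halve the starting batch size of 32 for each doubling of
    doc_maxlen beyond 512 tokens, never going below 1."""
    if doc_maxlen <= 512:
        return 32
    return max(1, int(32 / (2 ** round(math.log2(doc_maxlen)) / 512)))

for length in (512, 1024, 2048, 4096):
    print(length, "->", auto_bsize(length))
# 512 -> 32, 1024 -> 16, 2048 -> 8, 4096 -> 4
```

Note that `round(math.log2(...))` snaps intermediate lengths to the nearest power of two, so e.g. a `doc_maxlen` of 768 rounds up and gets the same batch size as 1024.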

Reasoning

Memory consumption for token-level encoding scales linearly with both batch size and sequence length. When documents are very long, the intermediate activation tensors during encoding grow proportionally. By halving the batch size for each doubling of sequence length, memory usage remains roughly constant regardless of document length.
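Under that rule, the number of tokens in flight per batch (`bsize × doc_maxlen`) is held roughly constant. A quick check at exact powers of two, reusing the same re-derived formula (`auto_bsize` is our name, not the library's):

```python
import math

def auto_bsize(doc_maxlen: int) -> int:
    # Same halving rule as the library excerpt below, re-derived here.
    if doc_maxlen <= 512:
        return 32
    return max(1, int(32 / (2 ** round(math.log2(doc_maxlen)) / 512)))

# At powers of two, the tokens-per-batch product is fixed at 32 * 512.
for length in (512, 1024, 2048, 4096):
    print(length, auto_bsize(length) * length)  # 16384 every time
```

At lengths that are not powers of two, the log-rounding makes the product vary within roughly a factor of two, but the memory ceiling stays in the same ballpark.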

Batch size auto-adjustment from `ragatouille/models/colbert.py:596-614`:

if bsize == "auto":
    bsize = 32
    if self.inference_ckpt.doc_tokenizer.doc_maxlen > 512:
        bsize = max(
            1,
            int(
                32
                / (
                    2
                    ** round(
                        math.log(
                            self.inference_ckpt.doc_tokenizer.doc_maxlen, 2
                        )
                    )
                    / 512
                )
            ),
        )

Max token calculation from `ragatouille/models/colbert.py:511-518`:

percentile_90 = np.percentile(
    [len(x.split(" ")) for x in documents], 90
)
max_tokens = min(
    math.floor((math.ceil((percentile_90 * 1.35) / 32) * 32) * 1.1),
    self.base_model_max_tokens,
)
max_tokens = max(256, max_tokens)
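The max-token rule can likewise be sketched standalone. This is a re-derivation for illustration: `auto_max_tokens` and the `percentile` helper are our names, and the helper reproduces NumPy's default linear interpolation so the block needs no third-party imports:

```python
import math

def percentile(values, q):
    """Linear-interpolated percentile, matching np.percentile's default."""
    vs = sorted(values)
    idx = (q / 100) * (len(vs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return vs[lo] + (vs[hi] - vs[lo]) * (idx - lo)

def auto_max_tokens(documents, base_model_max_tokens=510):
    """90th-percentile word count x 1.35, rounded up to a multiple of 32,
    1.1x safety margin, clamped to [256, base_model_max_tokens]."""
    p90 = percentile([len(d.split(" ")) for d in documents], 90)
    max_tokens = min(
        math.floor((math.ceil((p90 * 1.35) / 32) * 32) * 1.1),
        base_model_max_tokens,
    )
    return max(256, max_tokens)

print(auto_max_tokens(["short doc"] * 10))  # short docs clamp up to 256
print(auto_max_tokens(["w " * 500] * 10))   # long docs clamp down to 510
```

The two clamps dominate in practice: very short corpora get the 256-token floor, and anything whose padded estimate exceeds the base model's limit gets capped at 510.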

Related Pages
