
Heuristic:AnswerDotAI RAGatouille Auto Batch Size For Long Documents

From Leeroopedia
Knowledge Sources
Domains Optimization, Memory_Management, Information_Retrieval
Last Updated 2026-02-12 12:00 GMT

Overview

Automatic batch size reduction when encoding documents longer than 512 tokens to prevent out-of-memory errors.

Description

When encoding documents in memory with `bsize="auto"`, RAGatouille automatically reduces the batch size whenever `doc_maxlen` exceeds 512 tokens: the batch size is halved for each doubling of document length beyond 512 tokens. Separately, the auto max-token calculation takes the 90th percentile of document word counts, multiplies it by 1.35 to allow for tokenization expansion, rounds up to the nearest multiple of 32, applies a 1.1x safety margin, and clamps the result between 256 tokens and the base model's maximum.

Usage

Use this heuristic when encoding long documents in memory and you hit out-of-memory (OOM) errors. Understanding the auto batch-size behavior helps you decide whether to set the batch size manually or let the auto-tuning handle it.

The Insight (Rule of Thumb)

  • Action: When using `bsize="auto"` (default), the system automatically adjusts:
    • `doc_maxlen` <= 512 → `bsize=32`
    • `doc_maxlen` ~1024 → `bsize=16`
    • `doc_maxlen` ~2048 → `bsize=8`
    • Formula: `bsize = max(1, 32 / (2^(round(log2(doc_maxlen))) / 512))`
  • Max token auto-calculation:
    • Takes the 90th percentile word count × 1.35 (tokenization factor)
    • Rounds up to nearest multiple of 32
    • Applies 1.1x safety margin
    • Clamps between 256 and `base_model_max_tokens` (510)
  • Trade-off: Smaller batch size = less GPU/CPU memory used but slower encoding. Larger batch size = faster but more memory.
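As a sanity check, the halving rule above can be sketched in plain Python. This is a re-derivation of the formula for illustration, not the library's code; the function name `auto_bsize` is ours:

```python
import math

def auto_bsize(doc_maxlen: int) -> int:
    """Halve the starting batch size of 32 for each doubling of
    doc_maxlen beyond 512 tokens, never going below 1."""
    if doc_maxlen <= 512:
        return 32
    return max(1, int(32 / (2 ** round(math.log2(doc_maxlen)) / 512)))

for length in (512, 1024, 2048, 4096):
    print(length, "->", auto_bsize(length))
# 512 -> 32, 1024 -> 16, 2048 -> 8, 4096 -> 4
```

Note that `round(math.log2(...))` snaps intermediate lengths to the nearest power of two, so e.g. a `doc_maxlen` of 768 rounds up and gets the same batch size as 1024.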

Reasoning

Memory consumption for token-level encoding scales linearly with both batch size and sequence length. When documents are very long, the intermediate activation tensors during encoding grow proportionally. By halving the batch size for each doubling of sequence length, memory usage remains roughly constant regardless of document length.
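Under that rule, the number of tokens in flight per batch (`bsize × doc_maxlen`) is held roughly constant. A quick check at exact powers of two, reusing the same re-derived formula (`auto_bsize` is our name, not the library's):

```python
import math

def auto_bsize(doc_maxlen: int) -> int:
    # Same halving rule as the library excerpt below, re-derived here.
    if doc_maxlen <= 512:
        return 32
    return max(1, int(32 / (2 ** round(math.log2(doc_maxlen)) / 512)))

# At powers of two, the tokens-per-batch product is fixed at 32 * 512.
for length in (512, 1024, 2048, 4096):
    print(length, auto_bsize(length) * length)  # 16384 every time
```

At lengths that are not powers of two, the log-rounding makes the product vary within roughly a factor of two, but the memory ceiling stays in the same ballpark.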

Batch size auto-adjustment from `ragatouille/models/colbert.py:596-614`:

if bsize == "auto":
    bsize = 32
    if self.inference_ckpt.doc_tokenizer.doc_maxlen > 512:
        bsize = max(
            1,
            int(
                32
                / (
                    2
                    ** round(
                        math.log(
                            self.inference_ckpt.doc_tokenizer.doc_maxlen, 2
                        )
                    )
                    / 512
                )
            ),
        )

Max token calculation from `ragatouille/models/colbert.py:511-518`:

percentile_90 = np.percentile(
    [len(x.split(" ")) for x in documents], 90
)
max_tokens = min(
    math.floor((math.ceil((percentile_90 * 1.35) / 32) * 32) * 1.1),
    self.base_model_max_tokens,
)
max_tokens = max(256, max_tokens)
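The max-token rule can likewise be sketched standalone. This is a re-derivation for illustration: `auto_max_tokens` and the `percentile` helper are our names, and the helper reproduces NumPy's default linear interpolation so the block needs no third-party imports:

```python
import math

def percentile(values, q):
    """Linear-interpolated percentile, matching np.percentile's default."""
    vs = sorted(values)
    idx = (q / 100) * (len(vs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return vs[lo] + (vs[hi] - vs[lo]) * (idx - lo)

def auto_max_tokens(documents, base_model_max_tokens=510):
    """90th-percentile word count x 1.35, rounded up to a multiple of 32,
    1.1x safety margin, clamped to [256, base_model_max_tokens]."""
    p90 = percentile([len(d.split(" ")) for d in documents], 90)
    max_tokens = min(
        math.floor((math.ceil((p90 * 1.35) / 32) * 32) * 1.1),
        base_model_max_tokens,
    )
    return max(256, max_tokens)

print(auto_max_tokens(["short doc"] * 10))  # short docs clamp up to 256
print(auto_max_tokens(["w " * 500] * 10))   # long docs clamp down to 510
```

The two clamps dominate in practice: very short corpora get the 256-token floor, and anything whose padded estimate exceeds the base model's limit gets capped at 510.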

Related Pages
