Heuristic: AnswerDotAI RAGatouille Auto Batch Size For Long Documents
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Memory_Management, Information_Retrieval |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Automatic batch size reduction when encoding documents longer than 512 tokens to prevent out-of-memory errors.
Description
When encoding documents in memory with `bsize="auto"`, RAGatouille automatically reduces the batch size whenever `doc_maxlen` exceeds 512 tokens, halving it for each doubling of document length beyond 512. Additionally, the auto max-token calculation takes the 90th percentile of document word counts, multiplies it by 1.35 to account for tokenization expansion, rounds up to the nearest multiple of 32, applies a 1.1x safety margin, and clamps the result to a minimum of 256 tokens.
Usage
Use this heuristic when encoding long documents in memory and encountering OOM errors. Understanding the auto batch size behavior helps decide whether to set batch size manually or let the auto-tuning handle it.
The Insight (Rule of Thumb)
- Action: When using `bsize="auto"` (default), the system automatically adjusts:
- `doc_maxlen` <= 512 → `bsize=32`
- `doc_maxlen` ~1024 → `bsize=16`
- `doc_maxlen` ~2048 → `bsize=8`
- Formula: `bsize = max(1, 32 / (2^(round(log2(doc_maxlen))) / 512))`
- Max token auto-calculation:
- Takes the 90th percentile word count × 1.35 (tokenization factor)
- Rounds up to nearest multiple of 32
- Applies 1.1x safety margin
- Clamps between 256 and `base_model_max_tokens` (510)
- Trade-off: Smaller batch size = less GPU/CPU memory used but slower encoding. Larger batch size = faster but more memory.
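The halving rule above can be reproduced as a standalone sketch (this is not the library's code; `auto_bsize` is a hypothetical helper name wrapping the formula):

```python
import math

def auto_bsize(doc_maxlen: int) -> int:
    """Halve the base batch size of 32 for each doubling of doc_maxlen past 512."""
    if doc_maxlen <= 512:
        return 32
    return max(1, int(32 / (2 ** round(math.log2(doc_maxlen)) / 512)))

# Reproduces the rule-of-thumb table:
print(auto_bsize(512))   # 32
print(auto_bsize(1024))  # 16
print(auto_bsize(2048))  # 8
```

Note that `round(log2(...))` snaps `doc_maxlen` to the nearest power of two before dividing, so e.g. 1500-token documents are treated like 2048-token ones.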
Reasoning
Memory consumption for token-level encoding scales linearly with both batch size and sequence length. When documents are very long, the intermediate activation tensors during encoding grow proportionally. By halving the batch size for each doubling of sequence length, memory usage remains roughly constant regardless of document length.
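Numerically, the invariant is simply that the product of batch size and sequence length stays constant across the rule-of-thumb table, since activation memory scales with that product (times a fixed hidden size):

```python
# Each (bsize, doc_maxlen) pair from the rule of thumb: halving one while
# doubling the other keeps activation memory (proportional to bsize * seq_len) flat.
pairs = [(32, 512), (16, 1024), (8, 2048), (4, 4096)]
for bsize, seq_len in pairs:
    assert bsize * seq_len == 32 * 512
```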
Batch size auto-adjustment from `ragatouille/models/colbert.py:596-614`:
```python
if bsize == "auto":
    bsize = 32
    if self.inference_ckpt.doc_tokenizer.doc_maxlen > 512:
        bsize = max(
            1,
            int(
                32
                / (
                    2
                    ** round(
                        math.log(
                            self.inference_ckpt.doc_tokenizer.doc_maxlen, 2
                        )
                    )
                    / 512
                )
            ),
        )
```
Max token calculation from `ragatouille/models/colbert.py:511-518`:
```python
percentile_90 = np.percentile(
    [len(x.split(" ")) for x in documents], 90
)
max_tokens = min(
    math.floor((math.ceil((percentile_90 * 1.35) / 32) * 32) * 1.1),
    self.base_model_max_tokens,
)
max_tokens = max(256, max_tokens)
```
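The same calculation can be sketched as a standalone function. This is an illustration, not the library's API: `auto_max_tokens` is a hypothetical name, the 510 default is the `base_model_max_tokens` value noted earlier, and the percentile is reimplemented without NumPy using the same linear interpolation `np.percentile` applies by default:

```python
import math

def auto_max_tokens(documents, base_model_max_tokens=510):
    """Estimate doc_maxlen from the 90th-percentile whitespace word count."""
    counts = sorted(len(doc.split(" ")) for doc in documents)
    # 90th percentile with linear interpolation (np.percentile's default)
    idx = 0.9 * (len(counts) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    percentile_90 = counts[lo] + (counts[hi] - counts[lo]) * (idx - lo)
    # words -> tokens: 1.35 expansion, round up to a multiple of 32, 1.1 margin
    max_tokens = min(
        math.floor((math.ceil((percentile_90 * 1.35) / 32) * 32) * 1.1),
        base_model_max_tokens,
    )
    return max(256, max_tokens)

docs = ["word " * n for n in (100, 200, 300, 400, 500)]
print(auto_max_tokens(docs))  # 510 (capped by base_model_max_tokens)
```

For short corpora the 256-token floor dominates, while long-document corpora hit the `base_model_max_tokens` ceiling, so the formula only varies the result in between.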