Principle: AnswerDotAI RAGatouille Training Configuration
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Training, Hyperparameter_Tuning |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
A hyperparameter configuration mechanism for ColBERT model training that defines batch size, learning rate, embedding dimensions, quantization bits, and training schedule parameters.
Description
Training Configuration encapsulates all hyperparameters needed for ColBERT model training into a ColBERTConfig object. This configuration controls the training dynamics, model architecture choices, and checkpoint saving behavior. Key parameters include:
- Batch size and accumulation: Control effective training batch size across GPUs
- Learning rate: the ColBERT literature recommends values between 3e-6 and 2e-5, depending on dataset size
- Embedding dimension: Size of per-token vector representations (default 128)
- Quantization bits: Compression level for indexed vectors (default 2-bit)
- In-batch negatives: Whether to use in-batch negatives for loss calculation
- Warmup: Learning rate warmup steps (auto = 10% of total steps)
- Save frequency: Checkpoint saving interval
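As a sketch, the parameters above can be grouped into a single config object. The field names below (`bsize`, `accumsteps`, `lr`, `dim`, `nbits`, `use_ib_negatives`, `warmup`) follow common ColBERT conventions, but this dataclass is illustrative only, not RAGatouille's actual `ColBERTConfig` class:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingConfig:
    """Illustrative stand-in for a ColBERT-style training config."""
    bsize: int = 32                # per-step batch size
    accumsteps: int = 1            # gradient accumulation steps
    lr: float = 1e-5               # learning rate (3e-6 .. 2e-5 typical)
    dim: int = 128                 # per-token embedding dimension
    nbits: int = 2                 # quantization bits used at indexing time
    use_ib_negatives: bool = True  # use in-batch negatives in the loss
    warmup: Optional[int] = None   # None -> auto = 10% of total steps

    def effective_batch_size(self, n_gpus: int = 1) -> int:
        # Effective batch = per-GPU batch * accumulation steps * GPU count
        return self.bsize * self.accumsteps * n_gpus

cfg = TrainingConfig(accumsteps=4)
print(cfg.effective_batch_size(n_gpus=2))  # 32 * 4 * 2 = 256
```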
Usage
Use this principle to understand and tune ColBERT training hyperparameters. The configuration is constructed within RAGTrainer.train() based on user-provided parameters and automatically computed values (warmup steps, save frequency).
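The auto-computed values can be sketched as below. The 10%-of-total-steps warmup rule comes from this document; the checkpoint-interval heuristic is an illustrative placeholder, not RAGatouille's actual formula:

```python
def auto_schedule(n_triplets: int, bsize: int, epochs: int = 1):
    """Hypothetical sketch of auto-computed schedule values.

    warmup = 10% of total steps (as stated above); the save-frequency
    heuristic here is illustrative only.
    """
    total_steps = (n_triplets // bsize) * epochs
    warmup_steps = int(0.1 * total_steps)   # "auto" warmup
    save_every = max(1, total_steps // 10)  # illustrative checkpoint interval
    return total_steps, warmup_steps, save_every

print(auto_schedule(100_000, bsize=32))  # (3125, 312, 312)
```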
Theoretical Basis
ColBERT training optimizes a pairwise softmax cross-entropy loss:
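For a triplet of a query $q$, positive passage $d^+$, and negative passage $d^-$, this is (as in the original ColBERT paper):

$$\mathcal{L}(q, d^+, d^-) = -\log \frac{\exp S(q, d^+)}{\exp S(q, d^+) + \exp S(q, d^-)}, \qquad S(q, d) = \sum_i \max_j \; q_i \cdot d_j$$

where $S(q, d)$ is ColBERT's MaxSim late-interaction score between the query's token embeddings $q_i$ and the document's token embeddings $d_j$.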
Key configuration decisions:
Embedding Dimension (dim): Controls the expressiveness vs. efficiency tradeoff. Default 128 provides good balance.
In-Batch Negatives: When enabled, all other documents in the batch serve as additional negatives, providing O(B^2) training signal from a batch of size B.
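A minimal sketch of this loss, assuming a B×B matrix where `scores[i][j]` is the score of query i against document j (the diagonal holds the positives). The function itself is illustrative, not RAGatouille's implementation:

```python
import math

def in_batch_ce_loss(scores: list[list[float]]) -> float:
    """Mean cross-entropy over a B x B score matrix: scores[i][i] is query
    i's positive document; every scores[i][j] with j != i is an in-batch
    negative, so each row contributes B scored pairs -> B^2 in total."""
    B = len(scores)
    total = 0.0
    for i in range(B):
        row = scores[i]
        m = max(row)  # subtract the max for a numerically stable softmax
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]  # -log softmax probability of the positive
    return total / B

# With uniform scores the positive gets probability 1/B -> loss = log(B)
print(in_batch_ce_loss([[0.0, 0.0], [0.0, 0.0]]))  # ~0.6931 (= log 2)
```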
Quantization Bits (nbits): Post-training compression applied at indexing time. 2-bit is standard; it does not affect training dynamics, but it determines the size of the resulting index.
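A back-of-envelope estimate of how nbits drives index size, assuming each of the `dim` values per token is stored as an nbits-bit residual code. This ignores centroid IDs, metadata, and other overhead, so treat it as a rough lower bound rather than RAGatouille's actual accounting:

```python
def approx_index_bytes(n_tokens: int, dim: int = 128, nbits: int = 2) -> int:
    """Lower-bound estimate: each token vector stores dim values at nbits each."""
    return n_tokens * dim * nbits // 8

# 10M tokens at dim=128 with 2-bit codes -> ~320 MB of residual codes
print(approx_index_bytes(10_000_000))  # 320000000
```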
Warmup: Linear learning rate warmup prevents early training instability. Auto-computed as 10% of total steps.
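Linear warmup can be sketched as below: the learning rate ramps from 0 to its base value over the warmup steps. (Real schedules typically decay afterwards; holding the rate constant here is a simplification.)

```python
def lr_at_step(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

print(lr_at_step(50, 1e-5, warmup_steps=100))   # halfway through warmup -> ~5e-06
print(lr_at_step(200, 1e-5, warmup_steps=100))  # past warmup -> base_lr
```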