
Principle:AnswerDotAI RAGatouille Training Configuration

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Training, Hyperparameter_Tuning
Last Updated 2026-02-12 12:00 GMT

Overview

A hyperparameter configuration mechanism for ColBERT model training that defines batch size, learning rate, embedding dimensions, quantization bits, and training schedule parameters.

Description

Training Configuration encapsulates all hyperparameters needed for ColBERT model training into a ColBERTConfig object. This configuration controls the training dynamics, model architecture choices, and checkpoint saving behavior. Key parameters include:

  • Batch size and accumulation: Control effective training batch size across GPUs
  • Learning rate: typical values in the ColBERT literature range from 3e-6 to 2e-5, depending on dataset size
  • Embedding dimension: Size of per-token vector representations (default 128)
  • Quantization bits: Compression level for indexed vectors (default 2-bit)
  • In-batch negatives: Whether to use in-batch negatives for loss calculation
  • Warmup: Learning rate warmup steps (auto = 10% of total steps)
  • Save frequency: Checkpoint saving interval
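The parameters above can be sketched as a plain-Python configuration bundle. The field names echo common ColBERT hyperparameter names, but this dataclass is illustrative only, not RAGatouille's actual ColBERTConfig object:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Illustrative bundle of ColBERT-style training hyperparameters.

    A sketch mirroring the parameters described above; not the real
    ColBERTConfig class.
    """
    bsize: int = 32                 # per-step batch size
    accumsteps: int = 1             # gradient accumulation steps
    lr: float = 5e-6                # learning rate (literature: 3e-6 to 2e-5)
    dim: int = 128                  # per-token embedding dimension
    nbits: int = 2                  # quantization bits for the index
    use_ib_negatives: bool = True   # in-batch negatives in the loss
    maxsteps: int = 500_000         # total optimizer steps

    @property
    def effective_batch_size(self) -> int:
        # Effective batch = per-step batch x accumulation steps
        return self.bsize * self.accumsteps

    @property
    def warmup_steps(self) -> int:
        # "auto" warmup rule from above: 10% of total steps
        return max(1, self.maxsteps // 10)


cfg = TrainingConfig(bsize=16, accumsteps=4, maxsteps=1000)
print(cfg.effective_batch_size)  # 64
print(cfg.warmup_steps)          # 100
```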

Usage

Use this principle to understand and tune ColBERT training hyperparameters. The configuration is constructed within RAGTrainer.train() based on user-provided parameters and automatically computed values (warmup steps, save frequency).
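A hedged usage sketch of the flow described above. Parameter names follow recent RAGatouille releases, but exact signatures and defaults may differ in your installed version; the model name and training-data variable are placeholders. Treat this as a configuration fragment rather than canonical API:

```python
# Requires the `ragatouille` package; shown as a configuration fragment.
from ragatouille import RAGTrainer

# "MyColBERT" and `my_query_doc_pairs` are placeholders.
trainer = RAGTrainer(
    model_name="MyColBERT",
    pretrained_model_name="colbert-ir/colbertv2.0",  # base checkpoint
)
trainer.prepare_training_data(raw_data=my_query_doc_pairs)

# train() builds the ColBERTConfig internally from these arguments;
# warmup steps and save frequency are auto-computed.
trainer.train(
    batch_size=32,
    nbits=2,                # index quantization (does not affect training)
    use_ib_negatives=True,  # in-batch negatives in the loss
    dim=128,                # per-token embedding dimension
    learning_rate=5e-6,     # within the 3e-6 to 2e-5 range above
)
```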

Theoretical Basis

ColBERT training optimizes a pairwise softmax cross-entropy loss:

\[
\mathcal{L} = -\log \frac{e^{S(q,d^+)/\tau}}{e^{S(q,d^+)/\tau} + \sum_{d^-} e^{S(q,d^-)/\tau}}
\]

where S(q, d) is the late-interaction similarity score between query q and document d, d⁺ is the positive document, d⁻ ranges over the negatives, and τ is a temperature parameter.
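For intuition, this loss can be computed directly from raw scores. A minimal pure-Python sketch, with hypothetical score values (the positive score first, negatives after):

```python
import math


def pairwise_softmax_ce(pos_score: float, neg_scores: list[float],
                        tau: float = 1.0) -> float:
    """Pairwise softmax cross-entropy over one positive and its negatives.

    L = -log( exp(S(q,d+)/tau) / (exp(S(q,d+)/tau) + sum_d- exp(S(q,d-)/tau)) )
    """
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(z - m) for z in logits)
    return -(logits[0] - m - math.log(denom))


# A higher positive score relative to the negatives yields a lower loss.
print(pairwise_softmax_ce(10.0, [2.0, 1.5]))
print(pairwise_softmax_ce(3.0, [2.0, 1.5]))
```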

Key configuration decisions:

Embedding Dimension (dim): Controls the expressiveness vs. efficiency tradeoff. Default 128 provides good balance.

In-Batch Negatives: When enabled, all other documents in the batch serve as additional negatives, providing O(B^2) training signal from a batch of size B.
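To make the O(B^2) claim concrete: scoring every batch query against every batch document gives a B x B score matrix, with one positive per query and B - 1 in-batch negatives, i.e. B(B - 1) extra negative pairs per batch. A small sketch:

```python
def in_batch_pair_counts(batch_size: int) -> dict[str, int]:
    """Count query-document pairs produced by in-batch negatives."""
    # Each of the B queries is scored against all B documents in the batch.
    total_pairs = batch_size * batch_size         # full B x B score matrix
    positives = batch_size                        # each query's own document
    negatives = total_pairs - positives           # B * (B - 1) extra negatives
    return {"total": total_pairs, "positives": positives,
            "negatives": negatives}


print(in_batch_pair_counts(32))
# Doubling B roughly quadruples the total training signal.
```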

Quantization Bits (nbits): Post-training compression. 2-bit is standard; does not affect training but affects index size.

Warmup: Linear learning rate warmup prevents early training instability. Auto-computed as 10% of total steps.
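A minimal linear-warmup schedule matching the description above. The scaling function is a common formulation of linear warmup, not necessarily ColBERT's exact implementation:

```python
def warmup_lr(step: int, base_lr: float, total_steps: int) -> float:
    """Linear warmup over the first 10% of steps, then constant base_lr."""
    warmup_steps = max(1, total_steps // 10)  # "auto" rule: 10% of total
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


print(warmup_lr(0, 5e-6, 1000))    # small LR at the first step
print(warmup_lr(99, 5e-6, 1000))   # reaches base_lr at the end of warmup
print(warmup_lr(500, 5e-6, 1000))  # constant thereafter
```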

Related Pages

Implemented By
