
Principle:AnswerDotAI RAGatouille Training Configuration

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Training, Hyperparameter_Tuning
Last Updated 2026-02-12 12:00 GMT

Overview

A hyperparameter configuration mechanism for ColBERT model training that defines batch size, learning rate, embedding dimensions, quantization bits, and training schedule parameters.

Description

Training Configuration encapsulates all hyperparameters needed for ColBERT model training into a ColBERTConfig object. This configuration controls the training dynamics, model architecture choices, and checkpoint saving behavior. Key parameters include:

  • Batch size and accumulation: Control effective training batch size across GPUs
  • Learning rate: typical values in the ColBERT literature range from 3e-6 to 2e-5, depending on dataset size
  • Embedding dimension: Size of per-token vector representations (default 128)
  • Quantization bits: Compression level for indexed vectors (default 2-bit)
  • In-batch negatives: Whether to use in-batch negatives for loss calculation
  • Warmup: Learning rate warmup steps (auto = 10% of total steps)
  • Save frequency: Checkpoint saving interval
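The parameters above can be sketched as a plain-Python configuration bundle. The field names echo common ColBERT hyperparameter names, but this dataclass is illustrative only, not RAGatouille's actual ColBERTConfig object:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Illustrative bundle of ColBERT-style training hyperparameters.

    A sketch mirroring the parameters described above; not the real
    ColBERTConfig class.
    """
    bsize: int = 32                 # per-step batch size
    accumsteps: int = 1             # gradient accumulation steps
    lr: float = 5e-6                # learning rate (literature: 3e-6 to 2e-5)
    dim: int = 128                  # per-token embedding dimension
    nbits: int = 2                  # quantization bits for the index
    use_ib_negatives: bool = True   # in-batch negatives in the loss
    maxsteps: int = 500_000         # total optimizer steps

    @property
    def effective_batch_size(self) -> int:
        # Effective batch = per-step batch x accumulation steps
        return self.bsize * self.accumsteps

    @property
    def warmup_steps(self) -> int:
        # "auto" warmup rule from above: 10% of total steps
        return max(1, self.maxsteps // 10)


cfg = TrainingConfig(bsize=16, accumsteps=4, maxsteps=1000)
print(cfg.effective_batch_size)  # 64
print(cfg.warmup_steps)          # 100
```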

Usage

Use this principle to understand and tune ColBERT training hyperparameters. The configuration is constructed within RAGTrainer.train() based on user-provided parameters and automatically computed values (warmup steps, save frequency).
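A hedged usage sketch of the flow described above. Parameter names follow recent RAGatouille releases, but exact signatures and defaults may differ in your installed version; the model name and training-data variable are placeholders. Treat this as a configuration fragment rather than canonical API:

```python
# Requires the `ragatouille` package; shown as a configuration fragment.
from ragatouille import RAGTrainer

# "MyColBERT" and `my_query_doc_pairs` are placeholders.
trainer = RAGTrainer(
    model_name="MyColBERT",
    pretrained_model_name="colbert-ir/colbertv2.0",  # base checkpoint
)
trainer.prepare_training_data(raw_data=my_query_doc_pairs)

# train() builds the ColBERTConfig internally from these arguments;
# warmup steps and save frequency are auto-computed.
trainer.train(
    batch_size=32,
    nbits=2,                # index quantization (does not affect training)
    use_ib_negatives=True,  # in-batch negatives in the loss
    dim=128,                # per-token embedding dimension
    learning_rate=5e-6,     # within the 3e-6 to 2e-5 range above
)
```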

Theoretical Basis

ColBERT training optimizes a pairwise softmax cross-entropy loss:

\[
\mathcal{L} = -\log \frac{e^{S(q,d^+)/\tau}}{e^{S(q,d^+)/\tau} + \sum_{d^-} e^{S(q,d^-)/\tau}}
\]

where S(q, d) is the late-interaction similarity score between query q and document d, d⁺ is the positive document, d⁻ ranges over the negatives, and τ is a temperature parameter.
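For intuition, this loss can be computed directly from raw scores. A minimal pure-Python sketch, with hypothetical score values (the positive score first, negatives after):

```python
import math


def pairwise_softmax_ce(pos_score: float, neg_scores: list[float],
                        tau: float = 1.0) -> float:
    """Pairwise softmax cross-entropy over one positive and its negatives.

    L = -log( exp(S(q,d+)/tau) / (exp(S(q,d+)/tau) + sum_d- exp(S(q,d-)/tau)) )
    """
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(z - m) for z in logits)
    return -(logits[0] - m - math.log(denom))


# A higher positive score relative to the negatives yields a lower loss.
print(pairwise_softmax_ce(10.0, [2.0, 1.5]))
print(pairwise_softmax_ce(3.0, [2.0, 1.5]))
```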

Key configuration decisions:

Embedding Dimension (dim): Controls the expressiveness vs. efficiency tradeoff. Default 128 provides good balance.

In-Batch Negatives: When enabled, all other documents in the batch serve as additional negatives, providing O(B^2) training signal from a batch of size B.
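To make the O(B^2) claim concrete: scoring every batch query against every batch document gives a B x B score matrix, with one positive per query and B - 1 in-batch negatives, i.e. B(B - 1) extra negative pairs per batch. A small sketch:

```python
def in_batch_pair_counts(batch_size: int) -> dict[str, int]:
    """Count query-document pairs produced by in-batch negatives."""
    # Each of the B queries is scored against all B documents in the batch.
    total_pairs = batch_size * batch_size         # full B x B score matrix
    positives = batch_size                        # each query's own document
    negatives = total_pairs - positives           # B * (B - 1) extra negatives
    return {"total": total_pairs, "positives": positives,
            "negatives": negatives}


print(in_batch_pair_counts(32))
# Doubling B roughly quadruples the total training signal.
```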

Quantization Bits (nbits): Post-training compression. 2-bit is standard; does not affect training but affects index size.

Warmup: Linear learning rate warmup prevents early training instability. Auto-computed as 10% of total steps.
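A minimal linear-warmup schedule matching the description above. The scaling function is a common formulation of linear warmup, not necessarily ColBERT's exact implementation:

```python
def warmup_lr(step: int, base_lr: float, total_steps: int) -> float:
    """Linear warmup over the first 10% of steps, then constant base_lr."""
    warmup_steps = max(1, total_steps // 10)  # "auto" rule: 10% of total
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


print(warmup_lr(0, 5e-6, 1000))    # small LR at the first step
print(warmup_lr(99, 5e-6, 1000))   # reaches base_lr at the end of warmup
print(warmup_lr(500, 5e-6, 1000))  # constant thereafter
```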

Related Pages

Implemented By
