Implementation:AnswerDotAI RAGatouille ColBERTConfig Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Training, Hyperparameter_Tuning |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Wrapper documentation for the ColBERTConfig class from the colbert-ai library, as used for training configuration within RAGatouille.
Description
ColBERTConfig is an external class from the colbert-ai library that holds all model and training configuration. Within RAGatouille, RAGTrainer.train() constructs a ColBERTConfig instance at lines L219-236 of RAGTrainer.py, combining user-provided hyperparameters with automatically computed values like warmup steps and save frequency.
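The automatically computed values can be reproduced as plain arithmetic. The sketch below mirrors the warmup and save-frequency expressions from the config construction; `total_triplets` and `batch_size` stand in for the values available inside train(), and the helper name is illustrative, not part of either library.

```python
def auto_schedule(total_triplets: int, batch_size: int, warmup_steps="auto"):
    """Sketch of the warmup / save_every arithmetic in RAGTrainer.train()."""
    # Optimizer steps for one pass over the training triplets.
    steps = total_triplets // batch_size
    # "auto" warmup resolves to 10% of those steps; an explicit int passes through.
    warmup = int(steps * 0.1) if warmup_steps == "auto" else warmup_steps
    # A checkpoint is saved roughly ten times per pass.
    save_every = int(steps // 10)
    return warmup, save_every

# e.g. 100_000 triplets at batch size 32: steps = 3125,
# so warmup = 312 and save_every = 312.
warmup, save_every = auto_schedule(100_000, 32)
```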
External Reference
Usage
This configuration is constructed automatically by RAGTrainer.train(). Users control it through the train() method parameters rather than instantiating ColBERTConfig directly.
Code Reference
Source Location
- Repository: RAGatouille
- File: ragatouille/RAGTrainer.py
- Lines: L219-236 (config construction within train())
Signature
# ColBERTConfig construction in RAGTrainer.train()
training_config = ColBERTConfig(
    bsize=batch_size,
    model_name=self.model_name,
    name=self.model_name,
    checkpoint=self.pretrained_model_name,
    use_ib_negatives=use_ib_negatives,
    maxsteps=maxsteps,
    nbits=nbits,
    lr=learning_rate,
    dim=dim,
    doc_maxlen=doc_maxlen,
    relu=use_relu,
    accumsteps=accumsteps,
    warmup=int(total_triplets // batch_size * 0.1)
    if warmup_steps == "auto"
    else warmup_steps,
    save_every=int(total_triplets // batch_size // 10),
)
Import
from colbert.infra import ColBERTConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bsize | int | Yes | Total batch size; ColBERT divides it by the GPU count for the per-GPU size |
| model_name | str | Yes | Name for the model (checkpoint directory naming) |
| checkpoint | str | Yes | Base model checkpoint path |
| use_ib_negatives | bool | Yes | Use in-batch negatives for loss |
| maxsteps | int | Yes | Maximum training steps |
| nbits | int | Yes | Vector compression bits (2 typical) |
| lr | float | Yes | Learning rate (3e-6 to 2e-5 recommended) |
| dim | int | Yes | Embedding dimension (128 default) |
| doc_maxlen | int | Yes | Maximum document token length |
| relu | bool | Yes | Use ReLU activation on embeddings |
| accumsteps | int | Yes | Gradient accumulation steps |
| warmup | int | Yes | Warmup steps; "auto" resolves to int(total_triplets // batch_size * 0.1) |
| save_every | int | Yes | Checkpoint save frequency, computed as int(total_triplets // batch_size // 10) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | ColBERTConfig | Fully populated configuration object passed to the ColBERT Trainer |
Usage Examples
Default Configuration via RAGTrainer
from ragatouille import RAGTrainer

trainer = RAGTrainer(
    model_name="my_model",
    pretrained_model_name="colbert-ir/colbertv2.0",
)
trainer.prepare_training_data(raw_data=pairs)

# Configuration is set through train() parameters
model_path = trainer.train(
    batch_size=32,
    nbits=2,
    maxsteps=500_000,
    use_ib_negatives=True,
    learning_rate=5e-6,
    dim=128,
    doc_maxlen=256,
    warmup_steps="auto",
)
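The `pairs` object above is assumed to be a list of (query, relevant_passage) pairs, the raw-pair input format prepare_training_data accepts; the concrete strings below are placeholder data, not taken from the source.

```python
# Hypothetical raw training pairs for prepare_training_data(raw_data=pairs).
# Each element is one positive (query, passage) example; when no negatives
# are supplied, RAGatouille mines hard negatives itself during preparation.
pairs = [
    (
        "What does nbits control?",
        "nbits sets the number of bits used for residual vector compression.",
    ),
    (
        "What is the default embedding dimension?",
        "ColBERTv2 produces 128-dimensional token embeddings by default.",
    ),
]

# Sanity check: every entry is a two-element (query, passage) pair of strings.
assert all(len(p) == 2 and all(isinstance(s, str) for s in p) for p in pairs)
```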