Implementation:AnswerDotAI RAGatouille ColBERTConfig Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Training, Hyperparameter_Tuning |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Wrapper documentation for the ColBERTConfig class from the colbert-ai library, as used for training configuration within RAGatouille.
Description
ColBERTConfig is an external class from the colbert-ai library that holds all model and training configuration. Within RAGatouille, RAGTrainer.train() constructs a ColBERTConfig instance at lines L219-236 of RAGTrainer.py, combining user-provided hyperparameters with automatically computed values like warmup steps and save frequency.
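The automatically computed values can be reproduced as plain arithmetic. The sketch below mirrors the warmup and save-frequency expressions from the config construction; `total_triplets` and `batch_size` stand in for the values available inside train(), and the helper name is illustrative, not part of either library.

```python
def auto_schedule(total_triplets: int, batch_size: int, warmup_steps="auto"):
    """Sketch of the warmup / save_every arithmetic in RAGTrainer.train()."""
    # Optimizer steps for one pass over the training triplets.
    steps = total_triplets // batch_size
    # "auto" warmup resolves to 10% of those steps; an explicit int passes through.
    warmup = int(steps * 0.1) if warmup_steps == "auto" else warmup_steps
    # A checkpoint is saved roughly ten times per pass.
    save_every = int(steps // 10)
    return warmup, save_every

# e.g. 100_000 triplets at batch size 32: steps = 3125,
# so warmup = 312 and save_every = 312.
warmup, save_every = auto_schedule(100_000, 32)
```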
External Reference
Usage
This configuration is constructed automatically by RAGTrainer.train(). Users control it through the train() method parameters rather than instantiating ColBERTConfig directly.
Code Reference
Source Location
- Repository: RAGatouille
- File: ragatouille/RAGTrainer.py
- Lines: L219-236 (config construction within train())
Signature
# ColBERTConfig construction in RAGTrainer.train()
training_config = ColBERTConfig(
    bsize=batch_size,
    model_name=self.model_name,
    name=self.model_name,
    checkpoint=self.pretrained_model_name,
    use_ib_negatives=use_ib_negatives,
    maxsteps=maxsteps,
    nbits=nbits,
    lr=learning_rate,
    dim=dim,
    doc_maxlen=doc_maxlen,
    relu=use_relu,
    accumsteps=accumsteps,
    warmup=int(total_triplets // batch_size * 0.1)
    if warmup_steps == "auto"
    else warmup_steps,
    save_every=int(total_triplets // batch_size // 10),
)
Import
from colbert.infra import ColBERTConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bsize | int | Yes | Total batch size; ColBERT divides it by the GPU count for the per-GPU size |
| model_name | str | Yes | Name for the model (checkpoint directory naming) |
| checkpoint | str | Yes | Base model checkpoint path |
| use_ib_negatives | bool | Yes | Use in-batch negatives for loss |
| maxsteps | int | Yes | Maximum training steps |
| nbits | int | Yes | Vector compression bits (2 typical) |
| lr | float | Yes | Learning rate (3e-6 to 2e-5 recommended) |
| dim | int | Yes | Embedding dimension (128 default) |
| doc_maxlen | int | Yes | Maximum document token length |
| relu | bool | Yes | Use ReLU activation on embeddings |
| accumsteps | int | Yes | Gradient accumulation steps |
| warmup | int | Yes | Warmup steps; "auto" resolves to int(total_triplets // batch_size * 0.1) |
| save_every | int | Yes | Checkpoint save frequency, computed as int(total_triplets // batch_size // 10) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | ColBERTConfig | Fully populated configuration object passed to the ColBERT Trainer |
Usage Examples
Default Configuration via RAGTrainer
from ragatouille import RAGTrainer

trainer = RAGTrainer(
    model_name="my_model",
    pretrained_model_name="colbert-ir/colbertv2.0",
)
trainer.prepare_training_data(raw_data=pairs)

# Configuration is set through train() parameters
model_path = trainer.train(
    batch_size=32,
    nbits=2,
    maxsteps=500_000,
    use_ib_negatives=True,
    learning_rate=5e-6,
    dim=128,
    doc_maxlen=256,
    warmup_steps="auto",
)
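The `pairs` object above is assumed to be a list of (query, relevant_passage) pairs, the raw-pair input format prepare_training_data accepts; the concrete strings below are placeholder data, not taken from the source.

```python
# Hypothetical raw training pairs for prepare_training_data(raw_data=pairs).
# Each element is one positive (query, passage) example; when no negatives
# are supplied, RAGatouille mines hard negatives itself during preparation.
pairs = [
    (
        "What does nbits control?",
        "nbits sets the number of bits used for residual vector compression.",
    ),
    (
        "What is the default embedding dimension?",
        "ColBERTv2 produces 128-dimensional token embeddings by default.",
    ),
]

# Sanity check: every entry is a two-element (query, passage) pair of strings.
assert all(len(p) == 2 and all(isinstance(s, str) for s in p) for p in pairs)
```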