
Implementation:AnswerDotAI RAGatouille ColBERTConfig Training

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Training, Hyperparameter_Tuning
Last Updated 2026-02-12 12:00 GMT

Overview

Wrapper documentation for the ColBERTConfig class from the colbert-ai library, as used for training configuration within RAGatouille.

Description

ColBERTConfig is an external class from the colbert-ai library that holds all model and training configuration. Within RAGatouille, RAGTrainer.train() constructs a ColBERTConfig instance at lines L219-236 of RAGTrainer.py, combining user-provided hyperparameters with automatically computed values like warmup steps and save frequency.

External Reference

The ColBERTConfig class is defined in the colbert-ai package (the Stanford ColBERT implementation); RAGatouille imports it rather than re-implementing it.

Usage

This configuration is constructed automatically by RAGTrainer.train(). Users control it through the train() method parameters rather than instantiating ColBERTConfig directly.
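The automatically computed values mentioned above (warmup steps and save frequency) follow simple arithmetic from the config construction shown in the Code Reference below. A minimal sketch of that arithmetic, assuming `total_triplets` is the number of training triplets in the prepared data:

```python
def auto_schedule(total_triplets: int, batch_size: int) -> tuple[int, int]:
    """Replicate the automatic warmup/save-frequency arithmetic in train()."""
    total_steps = total_triplets // batch_size  # one optimizer step per batch
    warmup = int(total_steps * 0.1)             # warm up for 10% of total steps
    save_every = total_steps // 10              # checkpoint roughly 10 times per run
    return warmup, save_every

# Example: 100,000 triplets at batch size 32 -> 3,125 total steps
warmup, save_every = auto_schedule(100_000, 32)
print(warmup, save_every)  # 312 312
```

Passing an explicit integer as `warmup_steps` bypasses this computation entirely.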

Code Reference

Source Location

  • Repository: RAGatouille
  • File: ragatouille/RAGTrainer.py
  • Lines: L219-236 (config construction within train())

Signature

# ColBERTConfig construction in RAGTrainer.train()
training_config = ColBERTConfig(
    bsize=batch_size,
    model_name=self.model_name,
    name=self.model_name,
    checkpoint=self.pretrained_model_name,
    use_ib_negatives=use_ib_negatives,
    maxsteps=maxsteps,
    nbits=nbits,
    lr=learning_rate,
    dim=dim,
    doc_maxlen=doc_maxlen,
    relu=use_relu,
    accumsteps=accumsteps,
    warmup=int(total_triplets // batch_size * 0.1)
    if warmup_steps == "auto"
    else warmup_steps,
    save_every=int(total_triplets // batch_size // 10),
)

Import

from colbert.infra import ColBERTConfig

I/O Contract

Inputs

Name Type Required Description
bsize int Yes Total batch size (divided across GPUs; per-GPU size is bsize / n_gpus)
model_name str Yes Name for the model (checkpoint directory naming)
checkpoint str Yes Base model checkpoint path
use_ib_negatives bool Yes Use in-batch negatives for loss
maxsteps int Yes Maximum training steps
nbits int Yes Vector compression bits (2 typical)
lr float Yes Learning rate (3e-6 to 2e-5 recommended)
dim int Yes Embedding dimension (128 default)
doc_maxlen int Yes Maximum document token length
relu bool Yes Use ReLU activation on embeddings
accumsteps int Yes Gradient accumulation steps
warmup int Yes Warmup steps ("auto" computes 10% of total steps)
save_every int Yes Checkpoint save frequency
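The mapping from train() arguments to ColBERTConfig fields can be illustrated with a plain dictionary. Note that the trainer's model_name feeds both the name and model_name fields, while checkpoint comes from pretrained_model_name. This is an illustrative sketch of the mapping, not the library's internal code:

```python
def build_config_kwargs(
    model_name: str,
    pretrained_model_name: str,
    batch_size: int = 32,
    learning_rate: float = 5e-6,
    **overrides,
) -> dict:
    """Sketch of how train() arguments map onto ColBERTConfig keyword arguments."""
    kwargs = {
        "bsize": batch_size,                   # total batch size
        "model_name": model_name,              # checkpoint directory naming
        "name": model_name,                    # same value feeds both fields
        "checkpoint": pretrained_model_name,   # base model to fine-tune
        "lr": learning_rate,
    }
    kwargs.update(overrides)                   # e.g. nbits, dim, doc_maxlen, maxsteps
    return kwargs

cfg = build_config_kwargs("my_model", "colbert-ir/colbertv2.0", nbits=2, dim=128)
```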

Outputs

Name Type Description
return ColBERTConfig Configured training config object ready for the ColBERT Trainer

Usage Examples

Default Configuration via RAGTrainer

from ragatouille import RAGTrainer

trainer = RAGTrainer(
    model_name="my_model",
    pretrained_model_name="colbert-ir/colbertv2.0",
)

# raw_data expects [query, relevant_passage] pairs (or triplets with a negative)
pairs = [["What is ColBERT?", "ColBERT is a late-interaction retrieval model."]]
trainer.prepare_training_data(raw_data=pairs)

# Configuration is set through train() parameters
model_path = trainer.train(
    batch_size=32,
    nbits=2,
    maxsteps=500_000,
    use_ib_negatives=True,
    learning_rate=5e-6,
    dim=128,
    doc_maxlen=256,
    warmup_steps="auto",
)

Related Pages

Implements Principle

Requires Environment
