Implementation:AnswerDotAI RAGatouille RAGTrainer Init
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Training |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Constructor provided by the RAGatouille library for initializing a ColBERT training pipeline from a pretrained model.
Description
The RAGTrainer.__init__() constructor creates a training-ready ColBERT pipeline. It initializes the underlying ColBERT model with training_mode=True, which loads configuration from the checkpoint but skips creating the inference checkpoint (saving GPU memory). The trainer stores the model name (used for checkpoint directories), the pretrained model path, and the language code (used later for hard negative miner model selection).
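The bookkeeping described above can be illustrated with a simplified pure-Python sketch. This is not RAGatouille's actual implementation: the real constructor instantiates a ColBERT model with training_mode=True, which is replaced here by a placeholder tuple, and the class name RAGTrainerSketch is hypothetical.

```python
# Hypothetical sketch of RAGTrainer.__init__ bookkeeping. The real
# constructor loads a ColBERT model in training mode; a placeholder
# stands in for that here.
class RAGTrainerSketch:
    def __init__(
        self,
        model_name: str,
        pretrained_model_name: str,
        language_code: str = "en",
        n_usable_gpus: int = -1,
    ):
        self.model_name = model_name              # used for checkpoint directory names
        self.pretrained_model = pretrained_model_name
        self.language_code = language_code        # used later for miner selection
        # Placeholder for the real ColBERT(..., training_mode=True) call,
        # which loads config from the checkpoint but skips the inference
        # checkpoint to save GPU memory.
        self.model = ("ColBERT", pretrained_model_name, "training_mode=True")
        self.collection = []                      # documents for hard negative mining
        self.training_triplets = []               # processed training data


trainer = RAGTrainerSketch("my_model", "colbert-ir/colbertv2.0")
```

After construction, the trainer holds only lightweight state; the collection and training triplets are populated later in the data-preparation step.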
Usage
Use this constructor when you want to fine-tune an existing ColBERT model or train a new one from a BERT-like backbone. This is always the first step in the ColBERT training workflow, before preparing training data and launching training.
Code Reference
Source Location
- Repository: RAGatouille
- File: ragatouille/RAGTrainer.py
- Lines: L15-46
Signature
class RAGTrainer:
    def __init__(
        self,
        model_name: str,
        pretrained_model_name: str,
        language_code: str = "en",
        n_usable_gpus: int = -1,
    ):
        """
        Initialise a RAGTrainer instance.

        Parameters:
            model_name: Name for the new model (used in checkpoints/index names).
            pretrained_model_name: Base model (HuggingFace name or local path).
            language_code: Language code (default "en"). Used for hard negative mining.
            n_usable_gpus: Number of GPUs (-1 = auto).

        Returns:
            RAGTrainer: Initialized instance with base model loaded in training mode.
        """
Import
from ragatouille import RAGTrainer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | Name for the model being trained. Used in checkpoint directory names |
| pretrained_model_name | str | Yes | HuggingFace model name or local path to base checkpoint |
| language_code | str | No | Language code for hard negative miner model selection (default "en"). Supported: "en", "zh", "fr", "other" |
| n_usable_gpus | int | No | Number of GPUs to use (-1 = auto-detect, default -1) |
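The role of language_code can be sketched as a lookup from language to a dense encoder used for hard negative mining. The model names below are placeholders, not RAGatouille's actual mapping; only the supported codes ("en", "zh", "fr", "other") come from the table above.

```python
# Hypothetical illustration of language-based miner model selection.
# Encoder names are placeholders, not RAGatouille's real choices.
def select_miner_model(language_code: str) -> str:
    miners = {
        "en": "english-dense-encoder",
        "zh": "chinese-dense-encoder",
        "fr": "french-dense-encoder",
    }
    # Codes outside the per-language table fall back to a
    # multilingual encoder (e.g. for "other").
    return miners.get(language_code, "multilingual-dense-encoder")
```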
Outputs
| Name | Type | Description |
|---|---|---|
| return | RAGTrainer | Initialized trainer with self.model set to ColBERT(training_mode=True), empty self.collection and self.training_triplets |
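The n_usable_gpus input's auto-detect behavior can be sketched as follows. This is an assumption about the semantics of -1, with the runtime GPU count passed in explicitly (a real implementation would query something like torch.cuda.device_count()).

```python
# Hedged sketch of resolving n_usable_gpus; `detected` stands in for
# the runtime's reported GPU count.
def resolve_gpus(n_usable_gpus: int, detected: int) -> int:
    # -1 means "use every GPU the runtime can see".
    if n_usable_gpus == -1:
        return detected
    # Never claim more GPUs than are actually present.
    return min(n_usable_gpus, detected)
```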
Usage Examples
Initialize Trainer for Fine-tuning
from ragatouille import RAGTrainer

# Fine-tune an existing ColBERT model
trainer = RAGTrainer(
    model_name="my_colbert_model",
    pretrained_model_name="colbert-ir/colbertv2.0",
    language_code="en",
)
Train from a BERT Backbone
from ragatouille import RAGTrainer

# Train a new ColBERT model from a BERT checkpoint
trainer = RAGTrainer(
    model_name="custom_colbert",
    pretrained_model_name="bert-base-uncased",
    language_code="en",
    n_usable_gpus=2,
)