
Implementation:AnswerDotAI RAGatouille RAGTrainer Init

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Training
Last Updated 2026-02-12 12:00 GMT

Overview

A concrete tool from the RAGatouille library for initializing a ColBERT training pipeline from a pretrained model.

Description

The RAGTrainer.__init__() constructor creates a training-ready ColBERT pipeline. It initializes the underlying ColBERT model with training_mode=True, which loads configuration from the checkpoint but skips creating the inference checkpoint (saving GPU memory). The trainer stores the model name (used for checkpoint directories), the pretrained model path, and the language code (used later for hard negative miner model selection).
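The state set up by the constructor can be pictured with a simplified sketch. This is not the RAGatouille source; it is an illustration that mirrors only the attributes described on this page (model name, base model path, language code, and the empty collection and training-triplet containers):

```python
# Illustrative sketch of the state RAGTrainer.__init__ sets up.
# NOT the actual library code; attribute names follow this page's I/O contract.
class RAGTrainerSketch:
    def __init__(self, model_name, pretrained_model_name,
                 language_code="en", n_usable_gpus=-1):
        self.model_name = model_name                        # used in checkpoint/index names
        self.pretrained_model_name = pretrained_model_name  # HF name or local path
        self.language_code = language_code                  # selects hard negative miner later
        self.n_usable_gpus = n_usable_gpus                  # -1 = auto-detect
        self.collection = []                                # filled during data preparation
        self.training_triplets = []                         # filled during data preparation

trainer = RAGTrainerSketch("my_colbert_model", "colbert-ir/colbertv2.0")
```

In the real library, the constructor additionally loads the base model as `ColBERT(training_mode=True)`; the sketch omits that step.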

Usage

Use this constructor when you want to fine-tune an existing ColBERT model or train a new one from a BERT-like backbone. This is always the first step in the ColBERT training workflow, before preparing training data and launching training.

Code Reference

Source Location

  • Repository: RAGatouille
  • File: ragatouille/RAGTrainer.py
  • Lines: L15-46

Signature

class RAGTrainer:
    def __init__(
        self,
        model_name: str,
        pretrained_model_name: str,
        language_code: str = "en",
        n_usable_gpus: int = -1,
    ):
        """
        Initialise a RAGTrainer instance.

        Parameters:
            model_name: Name for the new model (used in checkpoints/index names).
            pretrained_model_name: Base model (HuggingFace name or local path).
            language_code: Language code (default "en"). Used for hard negative mining.
            n_usable_gpus: Number of GPUs (-1 = auto).

        Returns:
            RAGTrainer: Initialized instance with base model loaded in training mode.
        """

Import

from ragatouille import RAGTrainer

I/O Contract

Inputs

Name                   Type  Required  Description
model_name             str   Yes       Name for the model being trained; used in checkpoint directory names.
pretrained_model_name  str   Yes       HuggingFace model name or local path to the base checkpoint.
language_code          str   No        Language code for hard negative miner model selection (default "en"). Supported: "en", "zh", "fr", "other".
n_usable_gpus          int   No        Number of GPUs to use (default -1 = auto-detect).
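The `-1` convention for `n_usable_gpus` can be illustrated with a small sketch. The resolution rule below is an assumption about the typical semantics of such a flag ("use everything available, otherwise cap at what exists"), not the library's exact logic:

```python
def resolve_gpu_count(n_usable_gpus: int, n_available: int) -> int:
    """Illustrative resolution of an n_usable_gpus-style flag.

    Assumed semantics: -1 means 'use all available GPUs';
    any other value is capped at the number actually present.
    """
    if n_usable_gpus == -1:
        return n_available
    return min(n_usable_gpus, n_available)

print(resolve_gpu_count(-1, 4))  # auto-detect: 4
print(resolve_gpu_count(2, 4))   # explicit request: 2
print(resolve_gpu_count(8, 4))   # capped at available: 4
```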

Outputs

Name    Type        Description
return  RAGTrainer  Initialized trainer with self.model set to ColBERT(training_mode=True) and empty self.collection and self.training_triplets.

Usage Examples

Initialize Trainer for Fine-tuning

from ragatouille import RAGTrainer

# Fine-tune an existing ColBERT model
trainer = RAGTrainer(
    model_name="my_colbert_model",
    pretrained_model_name="colbert-ir/colbertv2.0",
    language_code="en",
)

Train from a BERT Backbone

from ragatouille import RAGTrainer

# Train a new ColBERT model from a BERT checkpoint
trainer = RAGTrainer(
    model_name="custom_colbert",
    pretrained_model_name="bert-base-uncased",
    language_code="en",
    n_usable_gpus=2,
)

Related Pages

Implements Principle

Requires Environment
