Environment:AnswerDotAI RAGatouille Python ColBERT Dependencies

From Leeroopedia
Knowledge Sources
Domains: NLP, Information_Retrieval, Infrastructure
Last Updated: 2026-02-12 12:00 GMT

Overview

Python 3.9+ environment with PyTorch, ColBERT-AI, LlamaIndex, LangChain, and FAISS for running the RAGatouille late-interaction retrieval library.

Description

This environment provides the full Python dependency stack required to run RAGatouille. It is built around the colbert-ai library (>=0.2.19), which handles the core ColBERT model operations, with PyTorch (>=1.13) as the deep learning backbone. The remaining core packages each serve one purpose: faiss-cpu for vector similarity operations, llama-index for document splitting, langchain and langchain_core for retrieval integrations, sentence-transformers for hard negative mining, voyager for the approximate nearest neighbor (ANN) index used by the negative miner, and srsly for fast JSON I/O. The onnx package is included for Vespa ONNX export support. The optional train extra adds rerankers and pylate for the training workflow.

Usage

Use this environment for all RAGatouille operations: loading pretrained models, building indexes, searching, reranking, encoding documents in memory, training/fine-tuning ColBERT models, and exporting to HuggingFace Hub or Vespa ONNX format. This is the mandatory prerequisite for every Implementation and Workflow in RAGatouille.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 18.04+) | Windows is not supported (WSL2 may work). See Compatibility Notes. |
| Python | 3.9, 3.10, or 3.11 | Specified in pyproject.toml target-version |
| Hardware | CPU (minimum) | GPU optional but recommended for large collections |
| Disk | 2GB+ free space | Model checkpoints downloaded from HuggingFace Hub |
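
The OS and Python rows above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function name `environment_problems` and the `SUPPORTED_MINORS` constant are illustrative and not part of RAGatouille.

```python
import platform
import sys
from typing import List, Tuple

# Python versions listed in the System Requirements table (assumed exhaustive here).
SUPPORTED_MINORS = {(3, 9), (3, 10), (3, 11)}


def environment_problems(version: Tuple[int, int], system: str) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    if version not in SUPPORTED_MINORS:
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the listed 3.9-3.11 range"
        )
    # platform.system() reports "Linux" inside WSL, so only native Windows is flagged.
    if system == "Windows":
        problems.append("native Windows is not supported; WSL2 may work")
    return problems


if __name__ == "__main__":
    found = environment_problems(sys.version_info[:2], platform.system())
    print("\n".join(found) if found else "Environment looks compatible")
```

Passing the version and OS name as arguments, rather than reading them inside the function, keeps the check easy to exercise for configurations other than the current machine.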

Dependencies

Python Packages (Core)

  • `colbert-ai` >= 0.2.19
  • `torch` >= 1.13
  • `faiss-cpu`
  • `llama-index`
  • `langchain`
  • `langchain_core`
  • `sentence-transformers`
  • `voyager`
  • `srsly`
  • `onnx`
  • `fast-pytorch-kmeans`
  • `numpy`
  • `tqdm`
  • `huggingface-hub`
  • `transformers`

Python Packages (Optional Extras)

  • `rerankers` — for advanced reranking (train extra)
  • `pylate` — for training (train extra)

Credentials

The following environment variables may be needed depending on workflow:

  • `HF_TOKEN`: HuggingFace API token, needed when uploading models via `export_to_huggingface_hub()`. Run `huggingface-cli login` before exporting.
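
A short pre-flight check can report a missing token before an upload is attempted. This sketch only inspects the `HF_TOKEN` environment variable. It does not detect a token stored by `huggingface-cli login`, and the helper name `hf_token_from` is hypothetical.

```python
import os
from typing import Mapping, Optional


def hf_token_from(env: Mapping[str, str]) -> Optional[str]:
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if hf_token_from(os.environ) is None:
        print("HF_TOKEN is not set; run `huggingface-cli login` or export the variable.")
    else:
        print("HF_TOKEN is set.")
```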

Quick Install

# Install RAGatouille with all core dependencies
pip install RAGatouille

# For training extras (quoted so shells such as zsh do not expand the brackets)
pip install "RAGatouille[train]"

# For all optional integrations
pip install "RAGatouille[all]"

Code Evidence

LlamaIndex import with backward-compatible fallback from `ragatouille/data/preprocessors.py:1-6`:

try:
    from llama_index import Document
    from llama_index.text_splitter import SentenceSplitter
except ImportError:
    from llama_index.core import Document
    from llama_index.core.text_splitter import SentenceSplitter

HuggingFace Hub authentication check from `ragatouille/models/utils.py:83-89`:

except ValueError as e:
    print(
        f"Could not create repository on the huggingface hub.\n",
        f"Error: {e}\n",
        "Please make sure you are logged in (run huggingface-cli login)\n",
        "If the error persists, please open an issue on github. This is a beta feature!",
    )

Dependency list from `pyproject.toml:28-40`:

dependencies = [
  "llama-index",
  "faiss-cpu",
  "langchain_core",
  "colbert-ai>=0.2.19",
  "langchain",
  "onnx",
  "srsly",
  "voyager",
  "torch>=1.13",
  "fast-pytorch-kmeans",
  "sentence-transformers",
]
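
The dependency list above can be compared against the active environment with `importlib.metadata`. This is a rough sketch and not RAGatouille tooling: the helper names are illustrative, only `>=` constraints are parsed, and versions are compared on their leading numeric segment (so `2.1.0+cu118` is treated as `2.1.0`).

```python
import re
from importlib import metadata
from typing import Callable, List, Optional, Tuple

# Copied from the pyproject.toml excerpt above.
REQUIREMENTS = [
    "llama-index", "faiss-cpu", "langchain_core", "colbert-ai>=0.2.19",
    "langchain", "onnx", "srsly", "voyager", "torch>=1.13",
    "fast-pytorch-kmeans", "sentence-transformers",
]


def split_requirement(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name>=1.2' into ('name', '1.2'); the minimum is None when absent."""
    name, sep, minimum = spec.partition(">=")
    return name.strip(), (minimum.strip() if sep else None)


def version_key(version: str) -> Tuple[int, ...]:
    """Keep only the leading dotted-number part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def is_older(installed: str, minimum: str) -> bool:
    """Compare versions after padding both keys with zeros to equal length."""
    a, b = version_key(installed), version_key(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def find_problems(
    requirements: List[str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Report packages that are missing or older than their stated minimum."""
    problems = []
    for spec in requirements:
        name, minimum = split_requirement(spec)
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum and is_older(installed, minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for line in find_problems(REQUIREMENTS) or ["All listed dependencies found"]:
        print(line)
```

The `lookup` parameter defaults to the real metadata query and exists so the logic can be exercised with a stub instead of a fully installed environment.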

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: No module named 'llama_index'` | llama-index not installed | `pip install llama-index` |
| `ImportError: No module named 'colbert'` | colbert-ai not installed | `pip install "colbert-ai>=0.2.19"` (quote the specifier so the shell does not treat `>` as a redirect) |
| `ValueError: Could not create repository on the huggingface hub` | Not logged in to HuggingFace | Run `huggingface-cli login` before exporting |
| `HfHubHTTPError` | Incorrect repo name format | Use format `yourusername/your-repo-name` |
| `FileNotFoundError: Could not load pid_docid_map from index!` | Loading incompatible older index | Rebuild the index with the current RAGatouille version |

Compatibility Notes

  • Windows: Not supported natively. On Windows, RAGatouille only runs inside WSL: WSL1 has known issues, while WSL2 has been reported to work by some users.
  • Scripts: Code must run inside `if __name__ == "__main__"` guard when running as a script (required by colbert-ai's multiprocessing).
  • LlamaIndex versions: The preprocessor handles both old (`llama_index.text_splitter`) and new (`llama_index.core.text_splitter`) import paths via try/except.
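
The script guard mentioned in the notes above might look like the following skeleton. It is a sketch under several assumptions: the checkpoint name `colbert-ir/colbertv2.0`, the index name `demo_index`, and the two sample documents are placeholders, and a collection this small may not index cleanly in practice. The `ragatouille` import is deferred so the file still runs, and reports the problem, when the library is missing.

```python
import importlib.util
from typing import List

# Placeholder documents; replace with a real collection.
DOCUMENTS: List[str] = [
    "ColBERT scores queries against documents with late interaction.",
    "RAGatouille wraps ColBERT for indexing and retrieval.",
]


def main() -> None:
    # Bail out early when the library is absent instead of raising ImportError.
    if importlib.util.find_spec("ragatouille") is None:
        print("ragatouille is not installed; see Quick Install.")
        return

    from ragatouille import RAGPretrainedModel  # deferred import

    rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    rag.index(collection=DOCUMENTS, index_name="demo_index")
    for hit in rag.search(query="What is late interaction?", k=2):
        print(hit["score"], hit["content"])


# Indexing work is only started from the guarded entry point, so worker
# processes that re-import this module do not launch it again.
if __name__ == "__main__":
    main()
```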
