Environment:AnswerDotAI RAGatouille Python ColBERT Dependencies
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Infrastructure |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Python 3.9+ environment with PyTorch, ColBERT-AI, LlamaIndex, LangChain, and FAISS for running the RAGatouille late-interaction retrieval library.
Description
This environment provides the full Python dependency stack required to run RAGatouille. It is built around the colbert-ai library (>=0.2.19), which handles the core ColBERT model operations, with PyTorch (>=1.13) as the deep learning backbone. The remaining core packages each serve one role: faiss-cpu for vector similarity operations, llama-index for document splitting, langchain and langchain_core for retrieval integrations, sentence-transformers for hard negative mining, voyager for the approximate nearest neighbor (ANN) index used by the negative miner, srsly for fast JSON I/O, and onnx for Vespa ONNX export. The optional rerankers and pylate packages support the training workflow.
Usage
Use this environment for all RAGatouille operations: loading pretrained models, building indexes, searching, reranking, encoding documents in memory, training/fine-tuning ColBERT models, and exporting to HuggingFace Hub or Vespa ONNX format. This is the mandatory prerequisite for every Implementation and Workflow in RAGatouille.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 18.04+) | Windows is not supported (WSL2 may work). See Compatibility Notes. |
| Python | 3.9, 3.10, or 3.11 | Specified in pyproject.toml target-version |
| Hardware | CPU (minimum) | GPU optional but recommended for large collections |
| Disk | 2GB+ free space | Model checkpoints downloaded from HuggingFace Hub |
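The OS and Python rows of the table can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `check_platform` and the message wording are illustrative assumptions, not part of RAGatouille.

```python
import platform
import sys

# Python minor versions listed in the System Requirements table.
SUPPORTED_PYTHON = {(3, 9), (3, 10), (3, 11)}


def check_platform(version_info=None, system_name=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only, not a RAGatouille API.
    """
    version_info = sys.version_info if version_info is None else version_info
    system_name = platform.system() if system_name is None else system_name

    problems = []
    if tuple(version_info[:2]) not in SUPPORTED_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} is outside the listed range (3.9 to 3.11)"
        )
    if system_name != "Linux":
        # Python running inside WSL reports "Linux", so WSL passes this check.
        problems.append(f"{system_name} is not a supported OS; use Linux or WSL2")
    return problems


if __name__ == "__main__":
    for problem in check_platform():
        print("WARNING:", problem)
```

Both arguments default to the running interpreter and host, and can be overridden to test other combinations.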
Dependencies
Python Packages (Core)
- `colbert-ai` >= 0.2.19
- `torch` >= 1.13
- `faiss-cpu`
- `llama-index`
- `langchain`
- `langchain_core`
- `sentence-transformers`
- `voyager`
- `srsly`
- `onnx`
- `fast-pytorch-kmeans`
- `numpy`
- `tqdm`
- `huggingface-hub`
- `transformers`
Python Packages (Optional Extras)
- `rerankers` — for advanced reranking (train extra)
- `pylate` — for training (train extra)
Credentials
The following environment variables may be needed depending on workflow:
- `HF_TOKEN`: HuggingFace API token, required when uploading models via `export_to_huggingface_hub()`. Run `huggingface-cli login` before exporting.
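A script can check for the token before attempting an upload. The sketch below assumes a hypothetical helper name, `hf_token_configured`, and inspects only the environment variable; a token stored by `huggingface-cli login` is kept in a local cache file and is not visible to this check.

```python
import os


def hf_token_configured(env=None):
    """Return True if a non-empty HF_TOKEN is present in the given mapping.

    Illustrative helper, not a RAGatouille or huggingface_hub API.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if not hf_token_configured():
        print("HF_TOKEN is not set; run `huggingface-cli login` before exporting.")
```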
Quick Install
```bash
# Install RAGatouille with all core dependencies
pip install RAGatouille

# For training extras (quoted so the shell does not expand the brackets)
pip install "RAGatouille[train]"

# For all optional integrations
pip install "RAGatouille[all]"
```
Code Evidence
LlamaIndex import with backward-compatible fallback from `ragatouille/data/preprocessors.py:1-6`:
```python
try:
    from llama_index import Document
    from llama_index.text_splitter import SentenceSplitter
except ImportError:
    from llama_index.core import Document
    from llama_index.core.text_splitter import SentenceSplitter
```
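The same try/except fallback can be written as a small reusable helper. This is a generic sketch, not code from RAGatouille; `import_first_available` is a hypothetical name, and it returns the module object rather than binding individual names as the original does.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be imported.

    Raises ImportError if none of the candidates is importable.
    """
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            last_error = exc
    raise ImportError(f"none of {list(module_names)} could be imported") from last_error


if __name__ == "__main__":
    # Standard-library demonstration: the first candidate does not exist,
    # so the helper falls back to the second one.
    module = import_first_available(["no_such_module_for_demo", "json"])
    print(module.__name__)
```

Applied to the case above, the candidates would be `llama_index.text_splitter` followed by `llama_index.core.text_splitter`, matching the order in the excerpt.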
HuggingFace Hub authentication check from `ragatouille/models/utils.py:83-89`:
```python
except ValueError as e:
    print(
        f"Could not create repository on the huggingface hub.\n",
        f"Error: {e}\n",
        "Please make sure you are logged in (run huggingface-cli login)\n",
        "If the error persists, please open an issue on github. This is a beta feature!",
    )
```
Dependency list from `pyproject.toml:28-40`:
```toml
dependencies = [
    "llama-index",
    "faiss-cpu",
    "langchain_core",
    "colbert-ai>=0.2.19",
    "langchain",
    "onnx",
    "srsly",
    "voyager",
    "torch>=1.13",
    "fast-pytorch-kmeans",
    "sentence-transformers",
]
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'llama_index'` | llama-index not installed | `pip install llama-index` |
| `ModuleNotFoundError: No module named 'colbert'` | colbert-ai not installed | `pip install "colbert-ai>=0.2.19"` |
| `Could not create repository on the huggingface hub.` (printed after a caught `ValueError`) | Not logged in to HuggingFace | Run `huggingface-cli login` before exporting |
| `HfHubHTTPError` | Incorrect repo name format | Use format `yourusername/your-repo-name` |
| `FileNotFoundError: Could not load pid_docid_map from index!` | Loading incompatible older index | Rebuild the index with the current RAGatouille version |
Compatibility Notes
- Windows: Not supported natively. WSL1 has known issues; WSL2 has been reported to work by some users.
- Scripts: When running as a script, code must be placed inside an `if __name__ == "__main__":` guard (required by colbert-ai's multiprocessing).
- LlamaIndex versions: The preprocessor handles both old (`llama_index.text_splitter`) and new (`llama_index.core.text_splitter`) import paths via try/except.
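The script guard mentioned above follows the usual Python layout. When worker processes re-import the main script (as the `spawn` start method of `multiprocessing` does), anything at module top level runs again in each worker, so the real work is moved into a function that only the parent process calls. In this minimal sketch, `build_index` is a hypothetical stand-in for whatever RAGatouille indexing or training calls a real script would make.

```python
def build_index(documents):
    """Hypothetical stand-in for the RAGatouille calls a real script would make."""
    return len(documents)


def main(documents=None):
    documents = ["first passage", "second passage"] if documents is None else documents
    # Indexing and training calls belong here, not at module top level,
    # so that a re-import of this file by a worker process does not repeat them.
    indexed = build_index(documents)
    print(f"indexed {indexed} documents")
    return indexed


if __name__ == "__main__":
    # Only the process that was started as a script enters this branch.
    main()
```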
Related Pages
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Index
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Search
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_From_Pretrained
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_From_Index
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Rerank
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Encode
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Search_Encoded_Docs
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Add_To_Index
- Implementation:AnswerDotAI_RAGatouille_RAGTrainer_Init
- Implementation:AnswerDotAI_RAGatouille_RAGTrainer_Prepare_Training_Data
- Implementation:AnswerDotAI_RAGatouille_RAGTrainer_Train
- Implementation:AnswerDotAI_RAGatouille_ColBERTConfig_Training
- Implementation:AnswerDotAI_RAGatouille_SimpleMiner_Mine_Hard_Negatives
- Implementation:AnswerDotAI_RAGatouille_Export_To_Huggingface_Hub
- Implementation:AnswerDotAI_RAGatouille_Export_To_Vespa_ONNX