Environment:AnswerDotAI RAGatouille GPU CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing, Information_Retrieval |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Optional NVIDIA GPU environment with CUDA support for accelerated indexing, search, and training operations in RAGatouille.
Description
This environment extends the base Python dependencies with GPU acceleration via CUDA. RAGatouille is designed to work on both CPU and GPU, with automatic GPU detection via `torch.cuda.is_available()` and `torch.cuda.device_count()`. When a GPU is available, operations like document encoding, index building (KMeans clustering), ColBERT scoring, and model training are dispatched to the GPU for significant speedups. The GPU environment also enables the use of `faiss-gpu` instead of `faiss-cpu` for faster FAISS-based indexing on large collections (>75k documents).
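The GPU auto-detection described above can be sketched as a small helper. `resolve_n_gpu` is a hypothetical name that mirrors the `n_gpu` defaulting logic in `ragatouille/models/colbert.py`, with the device count passed in so the sketch needs no GPU:

```python
def resolve_n_gpu(requested: int, device_count: int) -> int:
    # Mirrors the defaulting in ragatouille/models/colbert.py:
    # -1 means "auto": use every visible GPU, or fall back to 1 (CPU mode).
    if requested == -1:
        return 1 if device_count == 0 else device_count
    return requested
```

In the real code, `device_count` comes from `torch.cuda.device_count()`.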
Usage
Use this environment when working with large document collections (>10k documents), when training or fine-tuning ColBERT models, or when low-latency search is required. CPU-only mode works but is substantially slower for indexing and training. The GPU is optional for small-scale search and reranking.
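As an illustrative heuristic only, not RAGatouille's actual dispatch logic, the collection-size thresholds above can be expressed as a hypothetical helper:

```python
def pick_index_backend(n_docs: int, gpu_available: bool) -> str:
    # Hypothetical helper: per the description above, FAISS-based indexing
    # pays off on large collections (~>75k documents); smaller collections
    # can use the PyTorch KMeans path, which works on CPU or GPU.
    if n_docs > 75_000:
        return "faiss-gpu" if gpu_available else "faiss-cpu"
    return "torch-kmeans"
```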
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 18.04+) | Windows not supported (WSL2 may work) |
| Hardware | NVIDIA GPU with CUDA support | Any modern NVIDIA GPU works; more VRAM helps with larger batch sizes |
| Driver | NVIDIA driver compatible with CUDA toolkit | Check with `nvidia-smi` |
| CUDA | Compatible with PyTorch >= 1.13 | PyTorch handles CUDA version matching |
Dependencies
System Packages
- NVIDIA GPU driver (compatible with chosen CUDA version)
- CUDA toolkit (via PyTorch's bundled CUDA)
Python Packages
- `torch` >= 1.13 (with CUDA support)
- `faiss-gpu` — optional, for GPU-accelerated FAISS indexing on large collections
Credentials
No additional credentials required beyond the base Python environment.
Quick Install
```shell
# Install PyTorch with CUDA support (example for CUDA 11.8)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Optional: replace faiss-cpu with faiss-gpu for large-collection indexing
pip uninstall -y faiss-cpu && pip install faiss-gpu

# Install RAGatouille
pip install RAGatouille
```
Code Evidence
Automatic GPU detection and count from `ragatouille/models/colbert.py:39-40`:
```python
if n_gpu == -1:
    n_gpu = 1 if torch.cuda.device_count() == 0 else torch.cuda.device_count()
```
GPU dispatch for ColBERT scoring from `ragatouille/models/colbert.py:457-458`:
```python
if ColBERTConfig().total_visible_gpus > 0:
    Q, D_padded, D_mask = Q.cuda(), D_padded.cuda(), D_mask.cuda()
```
FAISS GPU check and warning from `ragatouille/models/index.py:223-236`:
```python
if torch.cuda.is_available():
    import faiss

    if not hasattr(faiss, "StandardGpuResources"):
        print(
            "WARNING! You have a GPU available, but only `faiss-cpu` is currently installed.\n",
            "This means that indexing will be slow. To make use of your GPU.\n"
            "Please install `faiss-gpu` by running:\n"
            "pip uninstall --y faiss-cpu & pip install faiss-gpu\n",
        )
        print("Will continue with CPU indexing in 5 seconds...")
        time.sleep(5)
```
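The `hasattr` probe above is how RAGatouille distinguishes `faiss-cpu` from `faiss-gpu` at runtime: only the GPU build exposes `StandardGpuResources`. A minimal sketch of the same check, using stand-in namespace objects instead of real faiss modules:

```python
from types import SimpleNamespace

def gpu_faiss_available(faiss_module) -> bool:
    # faiss-gpu exposes StandardGpuResources; faiss-cpu does not.
    return hasattr(faiss_module, "StandardGpuResources")

faiss_cpu_like = SimpleNamespace()                             # no GPU symbols
faiss_gpu_like = SimpleNamespace(StandardGpuResources=object)  # GPU build
```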
KMeans GPU/CPU device selection from `ragatouille/models/torch_kmeans.py:35-36`:
```python
device = torch.device("cuda" if use_gpu else "cpu")
sample = sample.to(device)
```
GPU half-precision optimization for centroids from `ragatouille/models/torch_kmeans.py:16-19`:
```python
if self.use_gpu:
    centroids = centroids.half()
else:
    centroids = centroids.float()
```
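Storing centroids in float16 halves their memory footprint. A back-of-the-envelope sketch (the centroid count and dimension here are illustrative, not actual RAGatouille defaults):

```python
def centroid_bytes(n_centroids: int, dim: int, use_gpu: bool) -> int:
    # float16 on GPU (2 bytes per value), float32 on CPU (4 bytes per value)
    bytes_per_value = 2 if use_gpu else 4
    return n_centroids * dim * bytes_per_value

# e.g. 2**16 centroids of dimension 128:
gpu_size = centroid_bytes(2**16, 128, use_gpu=True)   # 16 MiB
cpu_size = centroid_bytes(2**16, 128, use_gpu=False)  # 32 MiB
```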
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `WARNING! You have a GPU available, but only faiss-cpu is currently installed.` | faiss-gpu not installed while GPU is available | `pip uninstall -y faiss-cpu && pip install faiss-gpu` |
| `torch.cuda.is_available()` returns False | CUDA drivers not properly installed | Install NVIDIA drivers and PyTorch with CUDA support |
| CUDA out of memory during indexing | Insufficient GPU VRAM for batch size | Reduce `bsize` parameter or use `use_faiss=False` (PyTorch KMeans) |
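For CUDA out-of-memory errors, a common mitigation beyond a one-off `bsize` reduction is to retry with progressively smaller batches. This is a generic sketch, not part of RAGatouille's API; real code would catch `torch.cuda.OutOfMemoryError` rather than the `MemoryError` used here for testability:

```python
def run_with_smaller_batches(fn, bsize: int, min_bsize: int = 1):
    # Halve the batch size and retry whenever an out-of-memory error is raised.
    while True:
        try:
            return fn(bsize)
        except MemoryError:
            if bsize <= min_bsize:
                raise
            bsize //= 2
```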
Compatibility Notes
- CPU-only mode: RAGatouille works fully on CPU. When no GPU is detected, `n_gpu` defaults to 1 (CPU mode) and all tensor operations stay on CPU.
- Multi-GPU: Setting `n_gpu=-1` (default) auto-detects all available GPUs via `torch.cuda.device_count()`.
- faiss-gpu vs faiss-cpu: The code detects at runtime whether `faiss.StandardGpuResources` exists. If only faiss-cpu is installed with a GPU present, it warns and continues with CPU FAISS after a 5-second delay.
- Half-precision centroids: On GPU, KMeans centroids are stored in float16 for memory efficiency. On CPU, they remain in float32.
Related Pages
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Index
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Search
- Implementation:AnswerDotAI_RAGatouille_RAGTrainer_Train
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Rerank
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Encode