Environment:AnswerDotAI RAGatouille GPU CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing, Information_Retrieval |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Optional NVIDIA GPU environment with CUDA support for accelerated indexing, search, and training operations in RAGatouille.
Description
This environment extends the base Python dependencies with GPU acceleration via CUDA. RAGatouille is designed to work on both CPU and GPU, with automatic GPU detection via `torch.cuda.is_available()` and `torch.cuda.device_count()`. When a GPU is available, operations like document encoding, index building (KMeans clustering), ColBERT scoring, and model training are dispatched to the GPU for significant speedups. The GPU environment also enables the use of `faiss-gpu` instead of `faiss-cpu` for faster FAISS-based indexing on large collections (>75k documents).
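The GPU auto-detection described above can be sketched as a small helper. `resolve_n_gpu` is a hypothetical name that mirrors the `n_gpu` defaulting logic in `ragatouille/models/colbert.py`, with the device count passed in so the sketch needs no GPU:

```python
def resolve_n_gpu(requested: int, device_count: int) -> int:
    # Mirrors the defaulting in ragatouille/models/colbert.py:
    # -1 means "auto": use every visible GPU, or fall back to 1 (CPU mode).
    if requested == -1:
        return 1 if device_count == 0 else device_count
    return requested
```

In the real code, `device_count` comes from `torch.cuda.device_count()`.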
Usage
Use this environment when working with large document collections (>10k documents), when training or fine-tuning ColBERT models, or when low-latency search is required. CPU-only mode works but is substantially slower for indexing and training. The GPU is optional for small-scale search and reranking.
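As an illustrative heuristic only, not RAGatouille's actual dispatch logic, the collection-size thresholds above can be expressed as a hypothetical helper:

```python
def pick_index_backend(n_docs: int, gpu_available: bool) -> str:
    # Hypothetical helper: per the description above, FAISS-based indexing
    # pays off on large collections (~>75k documents); smaller collections
    # can use the PyTorch KMeans path, which works on CPU or GPU.
    if n_docs > 75_000:
        return "faiss-gpu" if gpu_available else "faiss-cpu"
    return "torch-kmeans"
```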
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 18.04+) | Windows not supported (WSL2 may work) |
| Hardware | NVIDIA GPU with CUDA support | Any modern NVIDIA GPU works; more VRAM helps with larger batch sizes |
| Driver | NVIDIA driver compatible with CUDA toolkit | Check with `nvidia-smi` |
| CUDA | Compatible with PyTorch >= 1.13 | PyTorch handles CUDA version matching |
Dependencies
System Packages
- NVIDIA GPU driver (compatible with chosen CUDA version)
- CUDA toolkit (via PyTorch's bundled CUDA)
Python Packages
- `torch` >= 1.13 (with CUDA support)
- `faiss-gpu` — optional, for GPU-accelerated FAISS indexing on large collections
Credentials
No additional credentials required beyond the base Python environment.
Quick Install
```shell
# Install PyTorch with CUDA support (example for CUDA 11.8)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Optional: replace faiss-cpu with faiss-gpu for large-collection indexing
pip uninstall -y faiss-cpu && pip install faiss-gpu

# Install RAGatouille
pip install RAGatouille
```
Code Evidence
Automatic GPU detection and count from `ragatouille/models/colbert.py:39-40`:
```python
if n_gpu == -1:
    n_gpu = 1 if torch.cuda.device_count() == 0 else torch.cuda.device_count()
```
GPU dispatch for ColBERT scoring from `ragatouille/models/colbert.py:457-458`:
```python
if ColBERTConfig().total_visible_gpus > 0:
    Q, D_padded, D_mask = Q.cuda(), D_padded.cuda(), D_mask.cuda()
```
FAISS GPU check and warning from `ragatouille/models/index.py:223-236`:
```python
if torch.cuda.is_available():
    import faiss

    if not hasattr(faiss, "StandardGpuResources"):
        print(
            "WARNING! You have a GPU available, but only `faiss-cpu` is currently installed.\n",
            "This means that indexing will be slow. To make use of your GPU.\n"
            "Please install `faiss-gpu` by running:\n"
            "pip uninstall --y faiss-cpu & pip install faiss-gpu\n",
        )
        print("Will continue with CPU indexing in 5 seconds...")
        time.sleep(5)
```
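The `hasattr` probe above is how RAGatouille distinguishes `faiss-cpu` from `faiss-gpu` at runtime: only the GPU build exposes `StandardGpuResources`. A minimal sketch of the same check, using stand-in namespace objects instead of real faiss modules:

```python
from types import SimpleNamespace

def gpu_faiss_available(faiss_module) -> bool:
    # faiss-gpu exposes StandardGpuResources; faiss-cpu does not.
    return hasattr(faiss_module, "StandardGpuResources")

faiss_cpu_like = SimpleNamespace()                             # no GPU symbols
faiss_gpu_like = SimpleNamespace(StandardGpuResources=object)  # GPU build
```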
KMeans GPU/CPU device selection from `ragatouille/models/torch_kmeans.py:35-36`:
```python
device = torch.device("cuda" if use_gpu else "cpu")
sample = sample.to(device)
```
GPU half-precision optimization for centroids from `ragatouille/models/torch_kmeans.py:16-19`:
```python
if self.use_gpu:
    centroids = centroids.half()
else:
    centroids = centroids.float()
```
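Storing centroids in float16 halves their memory footprint. A back-of-the-envelope sketch (the centroid count and dimension here are illustrative, not actual RAGatouille defaults):

```python
def centroid_bytes(n_centroids: int, dim: int, use_gpu: bool) -> int:
    # float16 on GPU (2 bytes per value), float32 on CPU (4 bytes per value)
    bytes_per_value = 2 if use_gpu else 4
    return n_centroids * dim * bytes_per_value

# e.g. 2**16 centroids of dimension 128:
gpu_size = centroid_bytes(2**16, 128, use_gpu=True)   # 16 MiB
cpu_size = centroid_bytes(2**16, 128, use_gpu=False)  # 32 MiB
```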
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `WARNING! You have a GPU available, but only faiss-cpu is currently installed.` | faiss-gpu not installed while GPU is available | `pip uninstall -y faiss-cpu && pip install faiss-gpu` |
| `torch.cuda.is_available()` returns False | CUDA drivers not properly installed | Install NVIDIA drivers and PyTorch with CUDA support |
| CUDA out of memory during indexing | Insufficient GPU VRAM for batch size | Reduce `bsize` parameter or use `use_faiss=False` (PyTorch KMeans) |
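For CUDA out-of-memory errors, a common mitigation beyond a one-off `bsize` reduction is to retry with progressively smaller batches. This is a generic sketch, not part of RAGatouille's API; real code would catch `torch.cuda.OutOfMemoryError` rather than the `MemoryError` used here for testability:

```python
def run_with_smaller_batches(fn, bsize: int, min_bsize: int = 1):
    # Halve the batch size and retry whenever an out-of-memory error is raised.
    while True:
        try:
            return fn(bsize)
        except MemoryError:
            if bsize <= min_bsize:
                raise
            bsize //= 2
```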
Compatibility Notes
- CPU-only mode: RAGatouille works fully on CPU. When no GPU is detected, `n_gpu` defaults to 1 (CPU mode) and all tensor operations stay on CPU.
- Multi-GPU: Setting `n_gpu=-1` (default) auto-detects all available GPUs via `torch.cuda.device_count()`.
- faiss-gpu vs faiss-cpu: The code detects at runtime whether `faiss.StandardGpuResources` exists. If only faiss-cpu is installed with a GPU present, it warns and continues with CPU FAISS after a 5-second delay.
- Half-precision centroids: On GPU, KMeans centroids are stored in float16 for memory efficiency. On CPU, they remain in float32.
Related Pages
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Index
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Search
- Implementation:AnswerDotAI_RAGatouille_RAGTrainer_Train
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Rerank
- Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Encode