Implementation:AnswerDotAI RAGatouille RAGPretrainedModel Add To Index
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Index_Management |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Concrete tool, provided by the RAGatouille library, for dynamically adding documents to and removing documents from a PLAID index.
Description
The RAGPretrainedModel.add_to_index() and RAGPretrainedModel.delete_from_index() methods provide dynamic index management. add_to_index() processes new documents through the corpus pipeline, deduplicates them against existing documents, and delegates to ColBERT.add_to_index(), which uses PLAIDModelIndex.add() to either update the index incrementally or rebuild it in full. delete_from_index() maps document IDs to passage IDs and uses PLAIDModelIndex.delete() to remove those passages via the colbert-ai IndexUpdater.
Both operations are currently marked as experimental. After modifying the index, each one updates all on-disk metadata files.
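The two bookkeeping steps described above (deduplication on add, document-to-passage mapping on delete) can be sketched in plain Python. This is a minimal illustration and not RAGatouille's actual code. It assumes the index keeps a mapping from passage ID to document ID (called `pid_docid_map` here) and that deduplication is keyed on document ID; both helper names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def filter_new_documents(
    new_docs: List[Tuple[Hashable, str]],
    pid_docid_map: Dict[int, Hashable],
) -> List[Tuple[Hashable, str]]:
    """Keep only (document_id, text) pairs whose ID is not already indexed."""
    existing_ids: Set[Hashable] = set(pid_docid_map.values())
    return [(doc_id, text) for doc_id, text in new_docs if doc_id not in existing_ids]


def passage_ids_for_documents(
    document_ids: List[Hashable],
    pid_docid_map: Dict[int, Hashable],
) -> List[int]:
    """Collect every passage ID that belongs to one of the given documents."""
    targets = set(document_ids)
    return sorted(pid for pid, doc_id in pid_docid_map.items() if doc_id in targets)


# Toy index (assumed shape): document "a" was split into two passages, "b" into one.
pid_docid_map = {0: "a", 1: "a", 2: "b"}

# "a" is already indexed, so only "c" survives deduplication.
fresh = filter_new_documents([("a", "dup"), ("c", "new text")], pid_docid_map)

# Deleting document "a" means removing both of its passages.
to_delete = passage_ids_for_documents(["a"], pid_docid_map)
```

Because one document can be split into several passages, a single document ID may expand to many passage IDs on deletion, which is why the mapping step comes before the index itself is touched.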
Usage
Use these methods when you need to modify an existing index. They require an index that was either previously built (via index()) or loaded (via from_index()).
Code Reference
Source Location
- Repository: RAGatouille
- File: ragatouille/RAGPretrainedModel.py
- Lines: L222-281 (add_to_index: L222-265, delete_from_index: L267-281)
Signature
def add_to_index(
    self,
    new_collection: list[str],
    new_document_ids: Optional[Union[TypeVar("T"), List[TypeVar("T")]]] = None,
    new_document_metadatas: Optional[list[dict]] = None,
    index_name: Optional[str] = None,
    split_documents: bool = True,
    document_splitter_fn: Optional[Callable] = llama_index_sentence_splitter,
    preprocessing_fn: Optional[Union[Callable, list[Callable]]] = None,
    bsize: int = 32,
    use_faiss: bool = False,
) -> None:
    """Add documents to an existing index."""

def delete_from_index(
    self,
    document_ids: Union[TypeVar("T"), List[TypeVar("T")]],
    index_name: Optional[str] = None,
) -> None:
    """Delete documents from an index by their IDs."""
Import
from ragatouille import RAGPretrainedModel
I/O Contract
Inputs (add_to_index)
| Name | Type | Required | Description |
|---|---|---|---|
| new_collection | list[str] | Yes | New documents to add to the index |
| new_document_ids | Optional[Union[T, List[T]]] | No | Optional IDs for new documents |
| new_document_metadatas | Optional[list[dict]] | No | Optional metadata for new documents |
| index_name | Optional[str] | No | Target index name. Uses current index if None |
| split_documents | bool | No | Whether to split documents (default True) |
| document_splitter_fn | Optional[Callable] | No | Splitter function (default: llama_index_sentence_splitter) |
| preprocessing_fn | Optional[Union[Callable, list[Callable]]] | No | Optional preprocessing |
| bsize | int | No | Encoding batch size (default 32) |
| use_faiss | bool | No | Use FAISS for KMeans (default False) |
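The three collection-shaped inputs are parallel lists, so a caller may want to check that they line up before starting an encoding run. The following is a minimal pre-flight sketch under that assumption. `check_add_inputs` is a hypothetical helper, not part of RAGatouille, and the library may apply its own (possibly different) validation.

```python
from typing import Any, List, Optional, Sequence


def check_add_inputs(
    new_collection: Sequence[str],
    new_document_ids: Optional[Sequence[Any]] = None,
    new_document_metadatas: Optional[Sequence[dict]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems: List[str] = []
    if not new_collection:
        problems.append("new_collection is empty")
    if new_document_ids is not None:
        if len(new_document_ids) != len(new_collection):
            problems.append("new_document_ids and new_collection differ in length")
        # IDs are assumed to be hashable (e.g. str or int).
        if len(set(new_document_ids)) != len(new_document_ids):
            problems.append("new_document_ids contains duplicates")
    if new_document_metadatas is not None:
        if len(new_document_metadatas) != len(new_collection):
            problems.append("new_document_metadatas and new_collection differ in length")
    return problems


ok = check_add_inputs(["doc one", "doc two"], ["id_1", "id_2"], [{"k": 1}, {"k": 2}])
bad = check_add_inputs(["doc one", "doc two"], ["id_1", "id_1"], [{"k": 1}])
```

Returning a list of messages instead of raising on the first problem lets the caller report every inconsistency at once.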
Inputs (delete_from_index)
| Name | Type | Required | Description |
|---|---|---|---|
| document_ids | Union[T, List[T]] | Yes | IDs of documents to remove from the index |
| index_name | Optional[str] | No | Target index name. Uses current index if None |
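The `Union[T, List[T]]` type suggests that `document_ids` may be a single ID or a list of IDs. This page does not say how the library normalizes the value internally, so a caller who wants predictable behavior could normalize at the call site. The helper below is a hypothetical sketch, not a RAGatouille function.

```python
from typing import Any, List


def as_id_list(document_ids: Any) -> List[Any]:
    """Wrap a single ID in a list; copy list, tuple, or set inputs into a new list."""
    if isinstance(document_ids, (list, tuple, set)):
        return list(document_ids)
    # Anything else (including a plain string) is treated as one ID.
    return [document_ids]


single = as_id_list("doc_to_remove_1")
several = as_id_list(["doc_to_remove_1", "doc_to_remove_2"])
```

A string is deliberately treated as one ID and not as a sequence of characters, which is the usual pitfall when normalizing "scalar or list" arguments.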
Outputs
| Name | Type | Description |
|---|---|---|
| add_to_index returns | None | Side-effect: index updated on disk with new documents |
| delete_from_index returns | None | Side-effect: documents removed from index on disk |
Usage Examples
Adding Documents to an Existing Index
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Add new documents
RAG.add_to_index(
    new_collection=["A new document to add.", "Another new document."],
    new_document_ids=["new_doc_1", "new_doc_2"],
)
Deleting Documents from an Index
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Remove documents by their IDs
RAG.delete_from_index(document_ids=["doc_to_remove_1", "doc_to_remove_2"])
Adding Documents with Metadata
# Assumes RAG was loaded with RAGPretrainedModel.from_index() as in the earlier examples
RAG.add_to_index(
    new_collection=["New document with metadata."],
    new_document_ids=["meta_doc_1"],
    new_document_metadatas=[{"source": "api", "timestamp": "2024-01-15"}],
    bsize=64,
)
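Replacing a Document (Sketch)

The two methods can be combined to replace a document's content: delete the old ID, then add the new text under the same ID. The wrapper below is a hypothetical convenience function, not part of RAGatouille. It only assumes that `rag` exposes `delete_from_index()` and `add_to_index()` with the keyword arguments listed in the signatures above. How the library behaves when the ID is not present in the index is not covered here.

```python
from typing import Any, Optional


def replace_document(
    rag: Any,
    document_id: Any,
    new_text: str,
    metadata: Optional[dict] = None,
    index_name: Optional[str] = None,
) -> None:
    """Swap one document's content: delete the old ID, then add the new text under it."""
    rag.delete_from_index(document_ids=[document_id], index_name=index_name)
    rag.add_to_index(
        new_collection=[new_text],
        new_document_ids=[document_id],
        # Metadata is optional; pass None through when the caller gives none.
        new_document_metadatas=[metadata] if metadata is not None else None,
        index_name=index_name,
    )
```

With a loaded model this would be called as `replace_document(RAG, "new_doc_1", "Updated text.")`. The sequence is not atomic: if the add step fails after the delete step succeeds, the document is gone from the index, so keeping the original text available for a retry is advisable.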