Implementation:AnswerDotAI RAGatouille RAGPretrainedModel Add To Index

Knowledge Sources	RAGatouille RAGatouille Docs
Domains	NLP, Information_Retrieval, Index_Management
Last Updated	2026-02-12 12:00 GMT

Overview

Concrete tool, provided by the RAGatouille library, for dynamically adding documents to and removing documents from a PLAID index.

Description

The RAGPretrainedModel.add_to_index() and RAGPretrainedModel.delete_from_index() methods provide dynamic index management. add_to_index() runs the new documents through the corpus-processing pipeline, deduplicates them against the documents already in the index, and delegates to ColBERT.add_to_index(), which calls PLAIDModelIndex.add() to either update the index incrementally or rebuild it in full. delete_from_index() maps document IDs to passage IDs and calls PLAIDModelIndex.delete(), which removes those passages through the colbert-ai IndexUpdater.
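
The deduplication step can be pictured with a minimal sketch. It assumes, for illustration only, that deduplication is keyed on the document ID and that each new document is a dict with "document_id" and "content" keys; this page does not state which key the library compares on. The helper name drop_already_indexed is hypothetical and is not part of RAGatouille.

```python
def drop_already_indexed(existing_doc_ids, new_docs):
    """Keep only documents whose ID is not in the index yet.

    Hypothetical helper: assumes deduplication is keyed on "document_id".
    """
    seen = set(existing_doc_ids)
    fresh = []
    for doc in new_docs:
        if doc["document_id"] not in seen:
            fresh.append(doc)
            seen.add(doc["document_id"])  # also skip repeats inside the new batch
    return fresh


existing_ids = ["doc_1", "doc_2"]
incoming = [
    {"document_id": "doc_2", "content": "Already indexed."},
    {"document_id": "doc_3", "content": "Genuinely new."},
    {"document_id": "doc_3", "content": "Repeated within the batch."},
]
to_add = drop_already_indexed(existing_ids, incoming)
print([doc["document_id"] for doc in to_add])
```

Only the documents that survive this filter would need to be encoded and added, which is why re-submitting an already indexed document should not create a second copy under this assumption.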

Both operations are currently marked as experimental. Each one updates all on-disk metadata files after modifying the index.

Usage

Use these methods to modify an existing index. Both require an index that was previously built (via index()) or loaded (via from_index()).

Code Reference

Source Location

Repository: RAGatouille
File: ragatouille/RAGPretrainedModel.py
Lines: L222-281 (add_to_index: L222-265, delete_from_index: L267-281)

Signature

def add_to_index(
    self,
    new_collection: list[str],
    new_document_ids: Optional[Union[TypeVar("T"), List[TypeVar("T")]]] = None,
    new_document_metadatas: Optional[list[dict]] = None,
    index_name: Optional[str] = None,
    split_documents: bool = True,
    document_splitter_fn: Optional[Callable] = llama_index_sentence_splitter,
    preprocessing_fn: Optional[Union[Callable, list[Callable]]] = None,
    bsize: int = 32,
    use_faiss: bool = False,
) -> None:
    """Add documents to an existing index."""

def delete_from_index(
    self,
    document_ids: Union[TypeVar("T"), List[TypeVar("T")]],
    index_name: Optional[str] = None,
) -> None:
    """Delete documents from an index by their IDs."""

Import

from ragatouille import RAGPretrainedModel

I/O Contract

Inputs (add_to_index)

Name	Type	Required	Description
new_collection	list[str]	Yes	New documents to add to the index
new_document_ids	Optional[Union[T, List[T]]]	No	Optional IDs for new documents
new_document_metadatas	Optional[list[dict]]	No	Optional metadata for new documents
index_name	Optional[str]	No	Target index name. Uses current index if None
split_documents	bool	No	Whether to split documents (default True)
document_splitter_fn	Optional[Callable]	No	Splitter function (default: llama_index_sentence_splitter)
preprocessing_fn	Optional[Union[Callable, list[Callable]]]	No	Optional preprocessing function, or list of functions
bsize	int	No	Encoding batch size (default 32)
use_faiss	bool	No	Use FAISS for KMeans (default False)
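
Because new_collection, new_document_ids, and new_document_metadatas are parallel lists, a caller-side check before calling add_to_index() can catch mismatches early. The sketch below is an assumption about sensible preconditions, not a description of what the library itself validates; check_add_inputs is a hypothetical helper, and it assumes the IDs are passed as a list of hashable values.

```python
def check_add_inputs(new_collection, new_document_ids=None, new_document_metadatas=None):
    """Raise ValueError if the parallel input lists are inconsistent.

    Hypothetical pre-check; RAGatouille may enforce different rules.
    """
    n_docs = len(new_collection)
    if new_document_ids is not None:
        if len(new_document_ids) != n_docs:
            raise ValueError("new_document_ids must have one entry per document")
        if len(set(new_document_ids)) != len(new_document_ids):
            raise ValueError("new_document_ids must be unique")
    if new_document_metadatas is not None and len(new_document_metadatas) != n_docs:
        raise ValueError("new_document_metadatas must have one entry per document")


# Consistent inputs pass silently.
check_add_inputs(
    ["A new document to add.", "Another new document."],
    ["new_doc_1", "new_doc_2"],
    [{"source": "api"}, {"source": "crawl"}],
)

# A duplicated ID is reported before any encoding work starts.
try:
    check_add_inputs(["first", "second"], ["same_id", "same_id"])
except ValueError as err:
    print(f"rejected: {err}")
```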

Inputs (delete_from_index)

Name	Type	Required	Description
document_ids	Union[T, List[T]]	Yes	IDs of documents to remove from the index
index_name	Optional[str]	No	Target index name. Uses current index if None
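
Since document_ids accepts either a single ID or a list, and deletion works on passages rather than documents, the ID-to-passage mapping can be sketched as follows. The dictionary layout (passage ID mapped to document ID) and the helper name passages_for_documents are assumptions made for illustration; the sketch also assumes IDs are strings or integers.

```python
def passages_for_documents(pid_docid_map, document_ids):
    """Return the sorted passage IDs that belong to the given document IDs.

    Hypothetical helper: pid_docid_map is assumed to map passage ID -> document ID.
    """
    # Accept a single ID as well as a list of IDs.
    if isinstance(document_ids, (str, int)):
        document_ids = [document_ids]
    wanted = set(document_ids)
    return sorted(pid for pid, doc_id in pid_docid_map.items() if doc_id in wanted)


# One document may have been split into several passages.
pid_docid_map = {0: "doc_a", 1: "doc_a", 2: "doc_b", 3: "doc_c"}

print(passages_for_documents(pid_docid_map, "doc_a"))
print(passages_for_documents(pid_docid_map, ["doc_b", "doc_c"]))
```

Under this assumption, deleting one document removes every passage produced from it when split_documents was enabled at indexing time.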

Outputs

Name	Type	Description
add_to_index returns	None	Side-effect: index updated on disk with new documents
delete_from_index returns	None	Side-effect: documents removed from index on disk

Usage Examples

Adding Documents to an Existing Index

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Add new documents
RAG.add_to_index(
    new_collection=["A new document to add.", "Another new document."],
    new_document_ids=["new_doc_1", "new_doc_2"],
)

Deleting Documents from an Index

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Remove documents by their IDs
RAG.delete_from_index(document_ids=["doc_to_remove_1", "doc_to_remove_2"])

Adding Documents with Metadata

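from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Pass one metadata dict per new document; bsize sets the encoding batch size
RAG.add_to_index(
    new_collection=["New document with metadata."],
    new_document_ids=["meta_doc_1"],
    new_document_metadatas=[{"source": "api", "timestamp": "2024-01-15"}],
    bsize=64,
)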
