Implementation:AnswerDotAI RAGatouille RAGPretrainedModel Add To Index

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Index_Management
Last Updated 2026-02-12 12:00 GMT

Overview

Concrete tool, provided by the RAGatouille library, for dynamically adding documents to and removing documents from a PLAID index.

Description

The RAGPretrainedModel.add_to_index() and RAGPretrainedModel.delete_from_index() methods provide dynamic index management. add_to_index() processes new documents through the corpus pipeline, deduplicates them against existing documents, and delegates to ColBERT.add_to_index(), which uses PLAIDModelIndex.add() to either incrementally update or fully rebuild the index. delete_from_index() maps document IDs to passage IDs and uses PLAIDModelIndex.delete() to remove those passages via the colbert-ai IndexUpdater.

Both operations are currently marked as experimental. Each one updates all on-disk metadata files after modifying the index.
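The deduplication and ID-mapping steps described above can be illustrated with a minimal plain-Python sketch. It does not reproduce RAGatouille's internals: the function names `drop_known_passages` and `passage_ids_for_documents`, and the `pid_to_docid` dictionary shape, are hypothetical and chosen only to show the idea.

```python
from typing import Dict, List


def drop_known_passages(existing: List[str], candidates: List[str]) -> List[str]:
    """Keep only candidate passages not already present, preserving order.

    Hypothetical helper: illustrates deduplicating new text against an
    existing collection, not the library's actual routine.
    """
    seen = set(existing)
    fresh = []
    for passage in candidates:
        if passage not in seen:
            seen.add(passage)  # also drops duplicates within the new batch
            fresh.append(passage)
    return fresh


def passage_ids_for_documents(
    pid_to_docid: Dict[int, str], document_ids: List[str]
) -> List[int]:
    """Collect every passage ID whose parent document is slated for deletion.

    Assumes a passage-ID -> document-ID mapping; the real on-disk layout
    may differ.
    """
    targets = set(document_ids)
    return sorted(pid for pid, doc_id in pid_to_docid.items() if doc_id in targets)
```

Because a document may be split into several passages, deleting one document ID can expand into several passage IDs, which is why the mapping step is needed before anything is removed.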

Usage

Use these methods when you need to modify an existing index. They require an index that was either previously built (via index()) or loaded (via from_index()).

Code Reference

Source Location

  • Repository: RAGatouille
  • File: ragatouille/RAGPretrainedModel.py
  • Lines: L222-281 (add_to_index: L222-265, delete_from_index: L267-281)

Signature

def add_to_index(
    self,
    new_collection: list[str],
    new_document_ids: Optional[Union[TypeVar("T"), List[TypeVar("T")]]] = None,
    new_document_metadatas: Optional[list[dict]] = None,
    index_name: Optional[str] = None,
    split_documents: bool = True,
    document_splitter_fn: Optional[Callable] = llama_index_sentence_splitter,
    preprocessing_fn: Optional[Union[Callable, list[Callable]]] = None,
    bsize: int = 32,
    use_faiss: bool = False,
) -> None:
    """Add documents to an existing index."""

def delete_from_index(
    self,
    document_ids: Union[TypeVar("T"), List[TypeVar("T")]],
    index_name: Optional[str] = None,
) -> None:
    """Delete documents from an index by their IDs."""

Import

from ragatouille import RAGPretrainedModel

I/O Contract

Inputs (add_to_index)

Name | Type | Required | Description
--- | --- | --- | ---
new_collection | list[str] | Yes | New documents to add to the index
new_document_ids | Optional[Union[T, List[T]]] | No | Optional IDs for new documents
new_document_metadatas | Optional[list[dict]] | No | Optional metadata for new documents
index_name | Optional[str] | No | Target index name. Uses current index if None
split_documents | bool | No | Whether to split documents (default True)
document_splitter_fn | Optional[Callable] | No | Splitter function (default: llama_index_sentence_splitter)
preprocessing_fn | Optional[Union[Callable, list[Callable]]] | No | Optional preprocessing
bsize | int | No | Encoding batch size (default 32)
use_faiss | bool | No | Use FAISS for KMeans (default False)
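Before calling add_to_index(), it can help to sanity-check the arguments on the caller's side. The sketch below is a hypothetical pre-flight check, not part of RAGatouille: the name `check_add_inputs` is invented, and the rules it enforces (IDs and metadata aligned one-to-one with the documents, IDs unique and hashable) are assumptions about sensible input rather than documented library behavior.

```python
from typing import Any, List, Optional, Union


def check_add_inputs(
    new_collection: List[str],
    new_document_ids: Optional[Union[Any, List[Any]]] = None,
    new_document_metadatas: Optional[List[dict]] = None,
) -> None:
    """Raise ValueError if the arguments look inconsistent (assumed rules)."""
    if not new_collection:
        raise ValueError("new_collection must contain at least one document")

    if new_document_ids is not None:
        # The signature accepts a single ID or a list; normalise to a list.
        ids = new_document_ids if isinstance(new_document_ids, list) else [new_document_ids]
        if len(ids) != len(new_collection):
            raise ValueError("expected one document ID per document")
        if len(set(ids)) != len(ids):
            raise ValueError("document IDs must be unique")

    if new_document_metadatas is not None:
        if len(new_document_metadatas) != len(new_collection):
            raise ValueError("expected one metadata dict per document")
```

Running a check like this first surfaces a mismatched list as an immediate error, before any encoding or index modification starts.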

Inputs (delete_from_index)

Name | Type | Required | Description
--- | --- | --- | ---
document_ids | Union[T, List[T]] | Yes | IDs of documents to remove from the index
index_name | Optional[str] | No | Target index name. Uses current index if None

Outputs

Name | Type | Description
--- | --- | ---
add_to_index returns | None | Side-effect: index updated on disk with new documents
delete_from_index returns | None | Side-effect: documents removed from index on disk

Usage Examples

Adding Documents to an Existing Index

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Add new documents
RAG.add_to_index(
    new_collection=["A new document to add.", "Another new document."],
    new_document_ids=["new_doc_1", "new_doc_2"],
)

Deleting Documents from an Index

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/my_index")

# Remove documents by their IDs
RAG.delete_from_index(document_ids=["doc_to_remove_1", "doc_to_remove_2"])

Adding Documents with Metadata

# Assumes RAG was loaded with from_index(), as in the examples above
RAG.add_to_index(
    new_collection=["New document with metadata."],
    new_document_ids=["meta_doc_1"],
    new_document_metadatas=[{"source": "api", "timestamp": "2024-01-15"}],
    bsize=64,
)
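The two methods can also be combined to replace documents whose text has changed. The following is a minimal sketch under stated assumptions: `replace_documents` is a hypothetical helper, `rag` is any object exposing the add_to_index() and delete_from_index() signatures shown above, and every ID in `documents` is assumed to already exist in the index (how deletion treats unknown IDs is not covered here).

```python
from typing import Dict, Optional


def replace_documents(
    rag,
    documents: Dict[str, str],
    index_name: Optional[str] = None,
) -> None:
    """Delete the stale copies of the given IDs, then add the new text.

    `documents` maps document ID -> updated document text.
    """
    doc_ids = list(documents.keys())

    # Remove the old versions first so the IDs are free to be reused.
    rag.delete_from_index(document_ids=doc_ids, index_name=index_name)

    # Re-add the updated text under the same IDs, in the same order.
    rag.add_to_index(
        new_collection=[documents[doc_id] for doc_id in doc_ids],
        new_document_ids=doc_ids,
        index_name=index_name,
    )
```

A call such as `replace_documents(RAG, {"new_doc_1": "Revised text."})` would issue one delete followed by one add. Since both operations are marked experimental, it is worth trying a pattern like this on a copy of the index first.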

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
