Implementation:Run llama Llama index DocstoreStrategy Configuration
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, RAG, Data_Management |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
The DocstoreStrategy enum defines the deduplication behavior for the IngestionPipeline when a docstore is configured, controlling how duplicate, changed, and deleted documents are handled.
Description
DocstoreStrategy is a string enum with three members that determine the pipeline's behavior when it encounters documents that already exist in the docstore. The strategy is passed to the IngestionPipeline constructor and is evaluated during the run() method when a docstore is present.
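The three strategies represent increasing levels of synchronization between the source documents and the stored data, listed here from least to most thorough:
- DUPLICATES_ONLY: Skip documents whose content is already stored (matched by hash); insert everything else as new
- UPSERTS: Insert new documents and update changed ones (matched by document ID); never delete
- UPSERTS_AND_DELETE: Full sync; upsert as above and also delete stored documents that are missing from the input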
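The difference between the three strategies can be sketched with a toy model that uses only the standard library. This is an illustration under stated assumptions, not the library's implementation: the `sync` and `run_twice` functions, the in-memory `store` dict (document ID to content hash), and the sample batches are all hypothetical, and the enum is redefined locally so the sketch runs without llama_index installed.

```python
import hashlib
from enum import Enum
from typing import Dict, List, Tuple


class DocstoreStrategy(str, Enum):
    """Local copy of the enum so the sketch runs without llama_index."""

    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(store: Dict[str, str], batch: Dict[str, str],
         strategy: DocstoreStrategy) -> List[str]:
    """Apply `batch` (doc_id -> text) to `store` (doc_id -> hash).

    Hypothetical toy model. Returns the IDs that would be (re)processed.
    """
    to_process: List[str] = []

    if strategy is DocstoreStrategy.DUPLICATES_ONLY:
        # Hash-based: content seen before is skipped, anything else is
        # treated as new. No earlier version is looked up by ID.
        known_hashes = set(store.values())
        for doc_id, text in batch.items():
            h = content_hash(text)
            if h not in known_hashes:
                store[doc_id] = h
                known_hashes.add(h)
                to_process.append(doc_id)
        return to_process

    # ID-based upsert: process documents with a new ID or a changed hash.
    for doc_id, text in batch.items():
        h = content_hash(text)
        if store.get(doc_id) != h:
            store[doc_id] = h
            to_process.append(doc_id)

    if strategy is DocstoreStrategy.UPSERTS_AND_DELETE:
        # Anything stored but absent from this batch counts as deleted.
        for doc_id in list(store):
            if doc_id not in batch:
                del store[doc_id]

    return to_process


# Sample data: on the second run "a" is unchanged, "b" is edited, "c" is
# missing, "d" is new, and "e" is a new ID with the same content as "c".
first = {"a": "alpha", "b": "beta", "c": "gamma"}
second = {"a": "alpha", "b": "beta v2", "d": "delta", "e": "gamma"}


def run_twice(strategy: DocstoreStrategy) -> Tuple[List[str], List[str]]:
    store: Dict[str, str] = {}
    sync(store, first, strategy)
    processed = sync(store, second, strategy)
    return processed, sorted(store)


for strategy in DocstoreStrategy:
    print(strategy.value, run_twice(strategy))
```

In this model only UPSERTS_AND_DELETE removes "c" on the second run, and only DUPLICATES_ONLY skips "e", because it compares content hashes rather than document IDs.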
Usage
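Choose a DocstoreStrategy according to how closely the docstore must track the source. For most incremental ingestion workflows, UPSERTS (the default) is sufficient. Use UPSERTS_AND_DELETE when the store must exactly mirror the source; in that case, pass the complete set of current documents on every run, because anything missing from the batch is treated as deleted. Use DUPLICATES_ONLY when it is enough to skip content that has already been ingested. Note that the upsert strategies rely on a vector store also being attached to the pipeline; without one, the pipeline can only check for and skip duplicate inputs.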
Code Reference
Source Location
- Repository: llama_index
- File: llama-index-core/llama_index/core/ingestion/pipeline.py
- Lines: L185-202
Signature
class DocstoreStrategy(str, Enum):
    """Document store strategy for handling duplicates."""

    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"
Import
from llama_index.core.ingestion import DocstoreStrategy
I/O Contract
Enum Members
| Member | Value | Description |
|---|---|---|
| UPSERTS | "upserts" | Insert new documents and update changed ones; do not delete removed documents |
| DUPLICATES_ONLY | "duplicates_only" | Skip documents with identical content; insert changed documents as new entries |
| UPSERTS_AND_DELETE | "upserts_and_delete" | Insert new, update changed, and delete documents no longer present in the input batch |
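Because the enum mixes in `str`, its members compare equal to their string values and can be looked up from a plain string, which is convenient when the strategy comes from a configuration file. The following is a minimal sketch of that behavior: the enum is redefined locally so the block runs without llama_index, and `parse_strategy` is a hypothetical helper, not part of the library.

```python
from enum import Enum


class DocstoreStrategy(str, Enum):
    """Local copy of the enum shown in the Signature section."""

    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def parse_strategy(raw: str) -> DocstoreStrategy:
    """Hypothetical helper: turn a config string into an enum member."""
    try:
        # Lookup by value; whitespace and case are normalized first.
        return DocstoreStrategy(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(member.value for member in DocstoreStrategy)
        raise ValueError(
            f"unknown docstore strategy {raw!r}; expected one of: {allowed}"
        ) from None


strategy = parse_strategy("Upserts_And_Delete")
print(strategy is DocstoreStrategy.UPSERTS_AND_DELETE)
print(strategy == "upserts_and_delete")  # str mixin: equal to the raw value
print(strategy.value)
```

Validating the string up front produces a clear error listing the allowed values instead of a failure later in the pipeline.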
Usage Examples
Configuring Upsert Strategy
from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.storage.docstore import SimpleDocumentStore

pipeline = IngestionPipeline(
    transformations=[...],
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)
Full Sync with Deletions
from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.storage.docstore import SimpleDocumentStore

pipeline = IngestionPipeline(
    transformations=[...],
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS_AND_DELETE,
)

# Deletion detection compares the docstore against the input batch, so every
# run must receive the complete set of current documents.
# all_current_documents is a placeholder for that full list.
nodes = pipeline.run(documents=all_current_documents)