Implementation:Run llama Llama index DocstoreStrategy Configuration

From Leeroopedia
Knowledge Sources
Domains Data_Ingestion, RAG, Data_Management
Last Updated 2026-02-11 00:00 GMT

Overview

The DocstoreStrategy enum defines the deduplication behavior for the IngestionPipeline when a docstore is configured, controlling how duplicate, changed, and deleted documents are handled.

Description

DocstoreStrategy is a string enum with three members that determine the pipeline's behavior when it encounters documents that already exist in the docstore. The strategy is passed to the IngestionPipeline constructor and is evaluated during the run() method when a docstore is present.
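Because the members are string-valued, a strategy can be read from a plain configuration value and compared directly against its string form. The following is a minimal sketch of that behavior. It uses a local stand-in that mirrors the enum definition shown in the Signature section so that it runs without llama_index installed, and `parse_strategy` is a hypothetical helper, not part of the library.

```python
from enum import Enum


# Local stand-in mirroring the enum in the Signature section; in real code,
# import DocstoreStrategy from llama_index.core.ingestion instead.
class DocstoreStrategy(str, Enum):
    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def parse_strategy(raw: str) -> DocstoreStrategy:
    """Hypothetical helper: map a config string to an enum member.

    Raises ValueError if the string matches no member value.
    """
    return DocstoreStrategy(raw.strip().lower())


strategy = parse_strategy("Upserts_And_Delete")

# A str-based enum member compares equal to its underlying string value.
is_plain_string_equal = strategy == "upserts_and_delete"
```

Normalizing the input before lookup is a design choice of this sketch; the enum itself only accepts the exact lowercase values.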

The three strategies differ in how closely they keep the stored data synchronized with the source documents:

  • UPSERTS: Insert or update, never delete
  • DUPLICATES_ONLY: Skip exact duplicates, insert everything else as new
  • UPSERTS_AND_DELETE: Full sync including deletions

Usage

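Select a DocstoreStrategy based on your operational requirements. For most incremental ingestion workflows, UPSERTS (the default) is sufficient. Use DUPLICATES_ONLY when you only need to skip content that has already been ingested. Use UPSERTS_AND_DELETE when the store must exactly mirror the source, which means every run must receive the complete current document set.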
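The selection guidance above can be written down as a small decision function. This is only an illustrative sketch: `choose_strategy` and its parameters are hypothetical names, the enum is a local stand-in mirroring the Signature section, and the rule that update detection needs stable document IDs is an assumption of this example, not a documented library contract.

```python
from enum import Enum


# Local stand-in mirroring the enum in the Signature section.
class DocstoreStrategy(str, Enum):
    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def choose_strategy(
    must_mirror_source: bool, has_stable_ids: bool = True
) -> DocstoreStrategy:
    """Hypothetical helper encoding the guidance in the Usage section."""
    # Assumption: without stable document IDs a changed document cannot be
    # matched to its earlier version, so only exact duplicates can be skipped.
    if not has_stable_ids:
        return DocstoreStrategy.DUPLICATES_ONLY
    # The store must exactly mirror the source, including removals.
    if must_mirror_source:
        return DocstoreStrategy.UPSERTS_AND_DELETE
    # Incremental ingestion: insert and update, never delete.
    return DocstoreStrategy.UPSERTS


incremental = choose_strategy(must_mirror_source=False)
mirror = choose_strategy(must_mirror_source=True)
no_ids = choose_strategy(must_mirror_source=True, has_stable_ids=False)
```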

Code Reference

Source Location

  • Repository: llama_index
  • File: llama-index-core/llama_index/core/ingestion/pipeline.py
  • Lines: L185-202

Signature

class DocstoreStrategy(str, Enum):
    """Document store strategy for handling duplicates."""

    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"

Import

from llama_index.core.ingestion import DocstoreStrategy

I/O Contract

Enum Members

Member              Value                 Description
UPSERTS             "upserts"             Insert new documents and update changed ones; do not delete removed documents
DUPLICATES_ONLY     "duplicates_only"     Skip documents with identical content; insert changed documents as new entries
UPSERTS_AND_DELETE  "upserts_and_delete"  Insert new, update changed, and delete documents no longer present in the input batch
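To make the member descriptions concrete, the sketch below classifies each document in a batch as insert, update, skip, or delete under each strategy. It is a toy model of the behavior described in the table, not the pipeline's actual implementation: the store is assumed to be a plain dict of document ID to content hash, SHA-256 is an arbitrary choice of hash, and `content_hash` and `plan_actions` are hypothetical names.

```python
import hashlib
from enum import Enum
from typing import Dict, List


# Local stand-in mirroring the enum in the Signature section.
class DocstoreStrategy(str, Enum):
    UPSERTS = "upserts"
    DUPLICATES_ONLY = "duplicates_only"
    UPSERTS_AND_DELETE = "upserts_and_delete"


def content_hash(text: str) -> str:
    """Hash document text; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_actions(
    stored: Dict[str, str],
    batch: Dict[str, str],
    strategy: DocstoreStrategy,
) -> Dict[str, List[str]]:
    """Classify batch documents without modifying the store.

    stored maps doc ID -> content hash; batch maps doc ID -> text.
    """
    plan: Dict[str, List[str]] = {
        "insert": [], "update": [], "skip": [], "delete": []
    }
    known_hashes = set(stored.values())
    for doc_id, text in batch.items():
        digest = content_hash(text)
        if strategy is DocstoreStrategy.DUPLICATES_ONLY:
            # Only content is compared: anything unseen counts as new.
            action = "skip" if digest in known_hashes else "insert"
        elif doc_id not in stored:
            action = "insert"
        elif stored[doc_id] != digest:
            action = "update"
        else:
            action = "skip"
        plan[action].append(doc_id)
    if strategy is DocstoreStrategy.UPSERTS_AND_DELETE:
        # Stored IDs absent from the batch are treated as removed.
        plan["delete"] = [d for d in stored if d not in batch]
    return plan


# Previous run stored a, b, c; the new batch keeps a, changes b,
# drops c, and adds d.
previous = {"a": "alpha", "b": "beta", "c": "gamma"}
stored = {doc_id: content_hash(text) for doc_id, text in previous.items()}
batch = {"a": "alpha", "b": "beta v2", "d": "delta"}

plans = {s.value: plan_actions(stored, batch, s) for s in DocstoreStrategy}
```

In this model the changed document `b` is an update under both upsert strategies but an insert under DUPLICATES_ONLY, and only UPSERTS_AND_DELETE flags `c` for deletion.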

Usage Examples

Configuring Upsert Strategy

from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.storage.docstore import SimpleDocumentStore

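# Assumption to verify against your installed version: the upsert strategies
# may also need a vector_store; without one, run() may fall back to duplicates_only.
pipeline = IngestionPipeline(
    transformations=[...],  # placeholder: supply your transformation components
    docstore=SimpleDocumentStore(),  # a docstore is required for deduplication
    docstore_strategy=DocstoreStrategy.UPSERTS,  # the default strategy
)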

Full Sync with Deletions

from llama_index.core.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.core.storage.docstore import SimpleDocumentStore

pipeline = IngestionPipeline(
    transformations=[...],
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS_AND_DELETE,
)

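# Pass the complete current document set on every run: any stored document
# missing from the input batch is treated as deleted and removed.
# `all_current_documents` is a placeholder for your own list of documents.
nodes = pipeline.run(documents=all_current_documents)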
