Implementation:Deepset ai Haystack DocumentWriter

Knowledge Sources
Domains Data_Storage, ETL
Last Updated 2026-02-11 00:00 GMT

Overview

A concrete Haystack component for writing documents to a document store.

Description

The DocumentWriter component persists a list of Document objects into any Haystack-compatible DocumentStore. It delegates the actual write operation to the store's write_documents method, passing along a DuplicatePolicy that controls how existing documents with matching IDs are handled. It supports both synchronous and asynchronous execution.
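The delegation described above can be illustrated with a minimal, dependency-free sketch. ToyPolicy, ToyDocument, ToyStore, and ToyWriter are hypothetical stand-ins invented for this illustration; they are not Haystack classes, and the toy store's choice to treat NONE as FAIL is an assumption (each real store decides its own default).

```python
from dataclasses import dataclass
from enum import Enum


class ToyPolicy(Enum):
    """Hypothetical stand-in for a duplicate policy enum."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document: an ID plus text content."""
    id: str
    content: str


class ToyStore:
    """Hypothetical in-memory store keyed by document ID."""

    def __init__(self):
        self.storage = {}

    def write_documents(self, documents, policy):
        # Assumption: this toy store treats NONE as FAIL.
        if policy is ToyPolicy.NONE:
            policy = ToyPolicy.FAIL
        written = 0
        for doc in documents:
            if doc.id in self.storage and policy is not ToyPolicy.OVERWRITE:
                if policy is ToyPolicy.FAIL:
                    raise ValueError(f"Duplicate document ID: {doc.id}")
                continue  # SKIP: leave the stored document untouched
            self.storage[doc.id] = doc
            written += 1
        return written


class ToyWriter:
    """Hypothetical writer: it holds no storage logic and only forwards to the store."""

    def __init__(self, store, policy=ToyPolicy.NONE):
        self.store = store
        self.policy = policy

    def run(self, documents):
        # All duplicate handling happens inside the store's write_documents.
        count = self.store.write_documents(documents, self.policy)
        return {"documents_written": count}


docs = [ToyDocument("a", "first"), ToyDocument("b", "second")]

skip_writer = ToyWriter(ToyStore(), policy=ToyPolicy.SKIP)
first = skip_writer.run(docs)   # both IDs are new
second = skip_writer.run(docs)  # both IDs already exist and are skipped

fail_writer = ToyWriter(ToyStore(), policy=ToyPolicy.FAIL)
fail_writer.run(docs)
try:
    fail_writer.run(docs)       # the same IDs again raise under FAIL
    failed = False
except ValueError:
    failed = True
```

The point of the sketch is the division of labor: the writer contributes only the policy and the document list, while the store decides what a duplicate means and how many documents count as written.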

Usage

Use this class as the final component in indexing pipelines, after embedding or preprocessing steps. Connect the documents output of an embedder or splitter to the writer's documents input.

Code Reference

Source Location

  • Repository: haystack
  • File: haystack/components/writers/document_writer.py
  • Lines: L12-128

Signature

class DocumentWriter:
    def __init__(
        self,
        document_store: DocumentStore,
        policy: DuplicatePolicy = DuplicatePolicy.NONE,
    ):
        """
        Args:
            document_store: The target document store instance.
            policy: Duplicate handling policy (NONE, SKIP, OVERWRITE, FAIL).
        """

    def run(
        self,
        documents: list[Document],
        policy: DuplicatePolicy | None = None,
    ) -> dict[str, int]:
        """
        Args:
            documents: Documents to write.
            policy: Optional runtime override for duplicate policy.
        Returns:
            {"documents_written": int}
        """

Import

from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

I/O Contract

Inputs

Name      | Type                    | Required | Description
documents | list[Document]          | Yes      | Documents to persist to the store
policy    | DuplicatePolicy or None | No       | Runtime override for duplicate handling

Outputs

Name              | Type | Description
documents_written | int  | Number of documents successfully written

Usage Examples

Basic Document Writing

from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store)

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="Java is used in enterprise applications"),
]

result = writer.run(documents=docs)
print(f"Documents written: {result['documents_written']}")

With Skip Duplicate Policy

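from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)

# The same Document objects are written twice, so their IDs match on the second write
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="Java is used in enterprise applications"),
]

# The first write stores both documents; the second skips them as duplicates
writer.run(documents=docs)
result = writer.run(documents=docs)
print(f"Documents written on second run: {result['documents_written']}")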

Related Pages

Implements Principle
