Implementation:Deepset ai Haystack DocumentWriter
| Knowledge Sources | |
|---|---|
| Domains | Data_Storage, ETL |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Concrete tool, provided by the Haystack framework, for writing documents to a document store.
Description
The DocumentWriter component persists a list of Document objects into any Haystack-compatible DocumentStore. It delegates the actual write operation to the store's write_documents method, passing along a DuplicatePolicy that controls how existing documents with matching IDs are handled. It supports both synchronous and asynchronous execution.
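The delegation described above can be illustrated with a small standard-library sketch. It is not Haystack's code: `ToyPolicy`, `ToyStore`, and `ToyWriter` are hypothetical stand-ins, and documents are reduced to `(id, content)` tuples. It only shows the shape of the idea, namely that the writer forwards the documents and a policy to the store, and the store decides what to do with IDs it already holds.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToyPolicy(Enum):
    """Hypothetical stand-in for a duplicate policy; not a Haystack class."""
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


@dataclass
class ToyStore:
    """Hypothetical dict-backed store; not a Haystack class."""
    storage: dict = field(default_factory=dict)

    def write_documents(self, documents, policy):
        written = 0
        for doc_id, content in documents:
            if doc_id in self.storage:
                if policy is ToyPolicy.FAIL:
                    raise ValueError(f"ID {doc_id!r} already exists")
                if policy is ToyPolicy.SKIP:
                    continue  # keep the stored version, do not count it
            # New ID, or an existing ID under OVERWRITE.
            self.storage[doc_id] = content
            written += 1
        return written


@dataclass
class ToyWriter:
    """Hypothetical writer that only forwards to the store."""
    store: ToyStore
    policy: ToyPolicy = ToyPolicy.FAIL

    def run(self, documents, policy=None):
        # A policy given at call time takes precedence over the default.
        effective = policy if policy is not None else self.policy
        written = self.store.write_documents(documents, effective)
        return {"documents_written": written}


writer = ToyWriter(ToyStore(), policy=ToyPolicy.SKIP)
docs = [("a", "first"), ("b", "second")]
first = writer.run(docs)   # both IDs are new
second = writer.run(docs)  # both IDs already exist and are skipped
print(first, second)
```

Keeping the duplicate logic in the store rather than in the writer means each store backend can apply the policy in whatever way suits its storage engine.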
Usage
Use this class as the final component of an indexing pipeline, after any preprocessing or embedding steps. Connect the documents output of the splitter or embedder to the writer's documents input.
Code Reference
Source Location
- Repository: haystack
- File: haystack/components/writers/document_writer.py
- Lines: L12-128
Signature
class DocumentWriter:
    def __init__(
        self,
        document_store: DocumentStore,
        policy: DuplicatePolicy = DuplicatePolicy.NONE,
    ):
        """
        Args:
            document_store: The target document store instance.
            policy: Duplicate handling policy (NONE, SKIP, OVERWRITE, FAIL).
        """

    def run(
        self,
        documents: list[Document],
        policy: DuplicatePolicy | None = None,
    ) -> dict[str, int]:
        """
        Args:
            documents: Documents to write.
            policy: Optional runtime override for duplicate policy.
        Returns:
            {"documents_written": int}
        """
Import
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| documents | list[Document] | Yes | Documents to persist to the store |
| policy | DuplicatePolicy or None | No | Runtime override for duplicate handling |
Outputs
| Name | Type | Description |
|---|---|---|
| documents_written | int | Number of documents successfully written |
Usage Examples
Basic Document Writing
from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store)
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="Java is used in enterprise applications"),
]
result = writer.run(documents=docs)
print(f"Documents written: {result['documents_written']}")
With Skip Duplicate Policy
from haystack import Document
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack.document_stores.in_memory import InMemoryDocumentStore
doc_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="Java is used in enterprise applications"),
]
# The first write stores both documents.
writer.run(documents=docs)
# The second write finds matching IDs and skips them.
result = writer.run(documents=docs)
print(f"Documents written on second run: {result['documents_written']}")