Implementation:Deepset ai Haystack DocumentJoiner
Overview
DocumentJoiner is a Haystack component that joins multiple lists of documents into a single list using a configurable fusion strategy. It supports four modes: concatenation, weighted merge, reciprocal rank fusion, and distribution-based rank fusion. It is typically used in hybrid retrieval pipelines, where the results of several retrievers (for example, a BM25 retriever and an embedding retriever) must be combined into one list.
Code Reference
Source file: haystack/components/joiners/document_joiner.py, lines 44-161
Import:
from haystack.components.joiners import DocumentJoiner
Constructor
DocumentJoiner(
    join_mode: str | JoinMode = "concatenate",
    weights: list[float] | None = None,
    top_k: int | None = None,
    sort_by_score: bool = True
)
Parameters:
- join_mode (str | JoinMode, default "concatenate"): The strategy for joining document lists. Options:
  - "concatenate": Keeps the highest-scored document in case of duplicates.
  - "merge": Calculates a weighted sum of scores for duplicate documents.
  - "reciprocal_rank_fusion": Assigns scores based on reciprocal rank fusion.
  - "distribution_based_rank_fusion": Normalizes scores based on score distributions in each retriever.
- weights (list[float] | None, default None): Weights to assign importance to each document list. Ignored for the concatenate and distribution_based_rank_fusion modes. Weights are normalized to sum to 1.0.
- top_k (int | None, default None): Maximum number of documents to return.
- sort_by_score (bool, default True): If True, sorts documents by score in descending order. Documents with a score of None are treated as having a score of negative infinity.
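To make the idea behind "distribution_based_rank_fusion" more concrete, the following is a minimal plain-Python sketch of normalizing each retriever's scores by that retriever's own score distribution. The function name `normalize_scores` and the choice of the interval mean ± 3 standard deviations are assumptions made for illustration; they are not taken from the Haystack source.

```python
from statistics import mean, pstdev


def normalize_scores(scores: list[float]) -> list[float]:
    """Rescale scores relative to the interval mean ± 3 standard deviations.

    Hypothetical helper for illustration only, not the Haystack implementation.
    """
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    low, high = mu - 3 * sigma, mu + 3 * sigma
    width = high - low
    if width == 0:
        # All scores are identical, so there is no spread to normalize by.
        return [0.0 for _ in scores]
    return [(s - low) / width for s in scores]


# Two retrievers with very different score scales (made-up values).
bm25_scores = [12.0, 9.0, 6.0]
cosine_scores = [0.9, 0.8, 0.7]

bm25_normalized = normalize_scores(bm25_scores)
cosine_normalized = normalize_scores(cosine_scores)
print(bm25_normalized)
print(cosine_normalized)
```

After normalization, both lists live on a comparable scale, so documents from different retrievers can be ranked against each other.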
Run Method
run(
    documents: Variadic[list[Document]],
    top_k: int | None = None
) -> {"documents": list[Document]}
Parameters:
- documents (Variadic[list[Document]], required): Multiple lists of documents to be merged. Uses Haystack's Variadic type to accept a variable number of inputs.
- top_k (int | None, default None): Maximum number of documents to return. Overrides the instance's top_k if provided.
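The interaction between sort_by_score and the two top_k values can be sketched in plain Python as follows. This is only an illustration of the documented behaviour; the function `finalize` and the `(name, score)` tuples standing in for documents are hypothetical and do not come from the Haystack source.

```python
def finalize(docs, instance_top_k=None, run_top_k=None, sort_by_score=True):
    """Sort (name, score) pairs and truncate them.

    Hypothetical helper for illustration: a run-time top_k, when given,
    takes precedence over the top_k configured on the instance.
    """
    if sort_by_score:
        # A score of None is treated as negative infinity, so it sorts last.
        docs = sorted(
            docs,
            key=lambda d: d[1] if d[1] is not None else float("-inf"),
            reverse=True,
        )
    top_k = run_top_k if run_top_k is not None else instance_top_k
    return list(docs[:top_k]) if top_k is not None else list(docs)


docs = [("a", 0.2), ("b", None), ("c", 0.9)]

all_sorted = finalize(docs)
overridden = finalize(docs, instance_top_k=3, run_top_k=2)
print(all_sorted)
print(overridden)
```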
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | documents | Variadic[list[Document]] | Multiple document lists from different retrievers |
| Input | top_k | int \| None | Optional override for maximum documents to return |
| Output | documents | list[Document] | Merged and scored list of documents |
Usage Examples
Basic Concatenation
from haystack import Document
from haystack.components.joiners import DocumentJoiner
joiner = DocumentJoiner(join_mode="concatenate")
docs_a = [Document(content="Paris", score=0.9), Document(content="Berlin", score=0.7)]
docs_b = [Document(content="London", score=0.8), Document(content="Paris", score=0.6)]
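# Called directly, the joiner takes a list of document lists.
# In a pipeline context, multiple inputs connect to the joiner instead.
result = joiner.run(documents=[docs_a, docs_b])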
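The duplicate handling described for "concatenate" mode (keep the highest-scored copy) can be sketched in plain Python. In this sketch documents are represented as `(content, score)` tuples and identified by their content string; both the representation and the function name `concatenate_keep_best` are assumptions for illustration, not the Haystack implementation.

```python
def concatenate_keep_best(result_lists):
    """Merge several result lists, keeping the best score per document.

    Hypothetical helper for illustration only.
    """
    best = {}
    for docs in result_lists:
        for content, score in docs:
            # Keep a document the first time it is seen, or when a
            # later copy carries a higher score.
            if content not in best or score > best[content]:
                best[content] = score
    return best


docs_a = [("Paris", 0.9), ("Berlin", 0.7)]
docs_b = [("London", 0.8), ("Paris", 0.6)]

joined = concatenate_keep_best([docs_a, docs_b])
print(joined)
```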
Reciprocal Rank Fusion
from haystack.components.joiners import DocumentJoiner
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=10)
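For readers unfamiliar with the technique, reciprocal rank fusion scores each document by summing 1 / (k + rank) over every list in which it appears. The sketch below illustrates the general technique in plain Python; the constant k = 60 is a conventional choice assumed here, and the function name and document identifiers are hypothetical. Haystack's own implementation may use a different constant or additional scaling.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with reciprocal rank fusion.

    Generic sketch of the technique, not the Haystack implementation.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1 for the top result of each list.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["paris", "berlin", "london"]
embedding_ranking = ["london", "paris", "rome"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because only ranks are used, the method does not depend on the retrievers producing scores on comparable scales.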
Weighted Merge
from haystack.components.joiners import DocumentJoiner
# Give 70% weight to semantic search results and 30% to keyword search
joiner = DocumentJoiner(join_mode="merge", weights=[0.7, 0.3])
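The effect of the weights can be illustrated with a small plain-Python sketch: the weights are first normalized to sum to 1.0, and each document's merged score is the weighted sum of its scores across the lists. The function `weighted_merge`, the dictionary representation, and the sample scores are assumptions for illustration, not the Haystack implementation.

```python
def weighted_merge(result_lists, weights):
    """Combine {doc_id: score} mappings using normalized weights.

    Hypothetical helper for illustration only.
    """
    total = sum(weights)
    normalized = [w / total for w in weights]  # now sums to 1.0
    merged = {}
    for docs, weight in zip(result_lists, normalized):
        for doc_id, score in docs.items():
            # A document missing from a list contributes nothing for that list.
            merged[doc_id] = merged.get(doc_id, 0.0) + weight * score
    return merged


semantic = {"paris": 0.9, "berlin": 0.7}
keyword = {"paris": 0.6, "london": 0.8}

# Weights of 7 and 3 normalize to 0.7 and 0.3.
merged = weighted_merge([semantic, keyword], weights=[7, 3])
print(merged)
```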
Hybrid Retrieval Pipeline
from haystack import Pipeline
from haystack.components.joiners import DocumentJoiner
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
pipeline = Pipeline()
pipeline.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
)
pipeline.add_component("embedding_retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipeline.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))
pipeline.connect("bm25_retriever", "joiner")
pipeline.connect("text_embedder", "embedding_retriever")
pipeline.connect("embedding_retriever", "joiner")
query = "What is the capital of France?"
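result = pipeline.run(data={"query": query, "text": query, "top_k": 5})
# The joined documents are returned under the joiner's "documents" output
joined_documents = result["joiner"]["documents"]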
Related Pages
Implements Principle
- Deepset_ai_Haystack_Document_Joining - The principle behind document joining and fusion strategies
- Deepset_ai_Haystack_DocumentSplitter - Splits documents before retrieval
- Deepset_ai_Haystack_MetadataRouter - Routes documents based on metadata fields