Implementation:Deepset ai Haystack MetadataRouter
Overview
MetadataRouter is a Haystack component that routes documents or byte streams to different pipeline connections based on their metadata fields. Each rule is a filter dictionary written in the Haystack metadata filtering syntax, and each rule name becomes an output connection. Objects that match none of the rules are sent to an output named unmatched.
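The routing behaviour described above can be modelled in a few lines of plain Python. This is a minimal sketch, not Haystack's source: documents are stand-in dicts with a "meta" key, only the "==" operator is supported, and the names matches_equality and route are hypothetical. It assumes, as we understand the component, that a document matching several rules is added to every matching output, and that unmatched collects only documents that matched no rule at all.

```python
from typing import Any

# Hypothetical stand-in for a Document: a plain dict with "content" and "meta" keys.
Doc = dict[str, Any]


def matches_equality(doc: Doc, rule: dict[str, Any]) -> bool:
    """Simplified rule check: supports only the "==" operator on meta.* fields."""
    key = rule["field"].removeprefix("meta.")
    return doc["meta"].get(key) == rule["value"]


def route(docs: list[Doc], rules: dict[str, dict[str, Any]]) -> dict[str, list[Doc]]:
    # One (possibly empty) output list per rule name.
    output: dict[str, list[Doc]] = {name: [] for name in rules}
    unmatched: list[Doc] = []
    for doc in docs:
        matched_any = False
        for name, rule in rules.items():
            if matches_equality(doc, rule):
                output[name].append(doc)  # no early exit: a doc can match several rules
                matched_any = True
        if not matched_any:
            unmatched.append(doc)
    output["unmatched"] = unmatched
    return output


docs = [
    {"content": "Paris is the capital of France.", "meta": {"language": "en"}},
    {"content": "Berlin ist die Hauptstadt von Deutschland.", "meta": {"language": "de"}},
]
result = route(docs, {"en": {"field": "meta.language", "operator": "==", "value": "en"}})
print([d["content"] for d in result["en"]])         # ['Paris is the capital of France.']
print([d["content"] for d in result["unmatched"]])  # ['Berlin ist die Hauptstadt von Deutschland.']
```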
Code Reference
Source file: haystack/components/routers/metadata_router.py, lines 14-138
Import:
from haystack.components.routers import MetadataRouter
Constructor
```python
MetadataRouter(
    rules: dict[str, dict],
    output_type: type = list[Document]
)
```
Parameters:
rules (dict[str, dict], required): A dictionary whose keys are output connection names and whose values are filter expression dictionaries. Each rule must include an "operator" key at its top level. Examples:
- Simple rule (a single comparison): {"en": {"field": "meta.language", "operator": "==", "value": "en"}}
- Compound rule (a logical operator combining several conditions):
```python
{
    "recent": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"}
        ]
    }
}
```
output_type (type, default list[Document]): The type declared for every output connection, including unmatched. Set it to list[ByteStream] when routing byte streams instead of documents.
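To make the filter syntax in the rules examples concrete, here is a small recursive evaluator for rule dictionaries of that shape. It is a simplified sketch, not Haystack's filter implementation: the names evaluate and COMPARISONS are hypothetical, fields are read from a plain metadata dict, a missing field is treated as None (so ordering comparisons on it are false), and date strings are compared lexically, which only works for zero-padded YYYY-MM-DD values. Comparison nodes carry field, operator and value; logical nodes carry operator (AND, OR or NOT) and a conditions list.

```python
from typing import Any, Callable

# Hypothetical table of comparison operators; ordering checks are False when the field is missing.
COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def evaluate(node: dict[str, Any], meta: dict[str, Any]) -> bool:
    op = node["operator"]
    if op == "AND":
        return all(evaluate(c, meta) for c in node["conditions"])
    if op == "OR":
        return any(evaluate(c, meta) for c in node["conditions"])
    if op == "NOT":
        # In this sketch, NOT negates the conjunction of its conditions.
        return not all(evaluate(c, meta) for c in node["conditions"])
    # Comparison node: look up the field in the metadata (missing fields give None).
    value = meta.get(node["field"].removeprefix("meta."))
    return COMPARISONS[op](value, node["value"])


recent = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
    ],
}
print(evaluate(recent, {"created_at": "2023-06-15"}))  # True
print(evaluate(recent, {"created_at": "2024-02-01"}))  # False
print(evaluate(recent, {}))                            # False (field missing)
```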
Raises:
ValueError: If any rule does not contain an "operator" key.
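The constructor check listed under Raises amounts to a loop over the top-level rules. The helper below is a hypothetical sketch of that check, not the component's code, and its error message is illustrative only. Like the documented check, it inspects only the top level of each rule, so a plain field-to-value mapping such as {"meta.language": "en"} is rejected while a well-formed compound rule passes.

```python
from typing import Any


def validate_rules(rules: dict[str, dict[str, Any]]) -> None:
    """Hypothetical helper: reject any rule whose top level lacks an "operator" key."""
    for name, rule in rules.items():
        if "operator" not in rule:
            raise ValueError(f"Rule {name!r} is missing the 'operator' key required by the filter syntax.")


# Passes silently: a simple comparison rule.
validate_rules({"en": {"field": "meta.language", "operator": "==", "value": "en"}})

# Fails: a plain field-to-value mapping is not valid filter syntax.
try:
    validate_rules({"en": {"meta.language": "en"}})
except ValueError as err:
    print(err)  # Rule 'en' is missing the 'operator' key required by the filter syntax.
```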
Run Method
run(documents: list[Document] | list[ByteStream]) -> dict[str, list[Document] | list[ByteStream]]
Parameters:
documents (list[Document] | list[ByteStream], required): A list of Document or ByteStream objects to route based on their metadata.
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | documents | list[Document] or list[ByteStream] | Documents or byte streams to route |
| Output | <rule_name> | output_type (default list[Document]) | Objects matching the named rule; one output per key in rules |
| Output | unmatched | output_type (default list[Document]) | Objects that matched no rule |
Usage Examples
Basic Language Routing
```python
from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [
    Document(content="Paris is the capital of France.", meta={"language": "en"}),
    Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"}),
]

router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"}
})
result = router.run(documents=docs)
# result["en"]: [Document about Paris]
# result["unmatched"]: [Document about Berlin]
```
Multi-Rule Routing
```python
from haystack import Document
from haystack.components.routers import MetadataRouter

router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"},
    "fr": {"field": "meta.language", "operator": "==", "value": "fr"}
})
```
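The multi-rule example stops at construction. Because rules is an ordinary dictionary, the same three rules can also be generated from a list of language codes; the sketch below does that and then runs a simplified, "=="-only model of the routing step over bare metadata dicts to show where each input lands. The route helper is hypothetical and is not Haystack's implementation. As we understand the component, every rule name gets an output even when nothing matches it, and an input whose language is missing or not covered by any rule goes to unmatched.

```python
from typing import Any

LANGUAGES = ["en", "de", "fr"]

# One equality rule per language code; same shape as the rules passed to MetadataRouter.
rules = {
    lang: {"field": "meta.language", "operator": "==", "value": lang}
    for lang in LANGUAGES
}


def route(metas: list[dict[str, Any]], rules: dict[str, dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical, simplified '=='-only routing model operating on bare metadata dicts."""
    out: dict[str, list[dict[str, Any]]] = {name: [] for name in rules}
    out["unmatched"] = []
    for meta in metas:
        hits = [n for n, r in rules.items() if meta.get(r["field"].removeprefix("meta.")) == r["value"]]
        for name in hits:
            out[name].append(meta)
        if not hits:
            out["unmatched"].append(meta)
    return out


metas = [{"language": "en"}, {"language": "fr"}, {"language": "es"}, {}]
result = route(metas, rules)
print({name: len(items) for name, items in result.items()})
# {'en': 1, 'de': 0, 'fr': 1, 'unmatched': 2}
```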
Compound Conditions (Date Range Routing)
```python
from haystack.components.routers import MetadataRouter

router = MetadataRouter(rules={
    "q1_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"}
        ]
    },
    "q2_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"}
        ]
    }
})
```
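The two quarter rules use half-open intervals (>= start and < end), so a timestamp that falls exactly on a quarter boundary matches only one rule. The sketch below checks that with hypothetical names (in_range, QUARTERS). It parses values with datetime.fromisoformat, on the assumption that Haystack's ordering operators treat ISO 8601 strings as dates; it models the rules above and is not the component's own comparison code.

```python
from datetime import datetime

# Hypothetical mirror of the two compound rules: name -> (inclusive start, exclusive end).
QUARTERS = {
    "q1_2023": ("2023-01-01", "2023-04-01"),
    "q2_2023": ("2023-04-01", "2023-07-01"),
}


def in_range(created_at: str, start: str, end: str) -> bool:
    """Half-open interval check on parsed ISO 8601 values: start <= created_at < end."""
    value = datetime.fromisoformat(created_at)
    return datetime.fromisoformat(start) <= value < datetime.fromisoformat(end)


for created_at in ["2023-03-31T23:59:00", "2023-04-01", "2023-07-01"]:
    buckets = [name for name, (start, end) in QUARTERS.items() if in_range(created_at, start, end)]
    print(created_at, buckets or ["unmatched"])
# 2023-03-31T23:59:00 ['q1_2023']
# 2023-04-01 ['q2_2023']
# 2023-07-01 ['unmatched']
```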
Routing ByteStreams
```python
from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"}),
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream],
)
result = router.run(documents=streams)
# result["english"]: [ByteStream with "Hello world"]
# result["unmatched"]: [ByteStream with "Bonjour le monde"]
```
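The same rule dictionaries work for Document and ByteStream inputs because a rule on a meta.* field only needs the object's meta dictionary; output_type changes the type declared for the outputs, not how matching works. The sketch below illustrates that duck typing with hypothetical stand-in classes (FakeDocument, FakeByteStream) and a simplified "=="-only matches function; it does not import Haystack.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class HasMeta(Protocol):
    meta: dict[str, Any]


@dataclass
class FakeDocument:  # hypothetical stand-in for haystack.Document
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeByteStream:  # hypothetical stand-in for haystack.dataclasses.ByteStream
    data: bytes
    meta: dict[str, Any] = field(default_factory=dict)


def matches(item: HasMeta, rule: dict[str, Any]) -> bool:
    """Simplified '=='-only check that reads nothing but item.meta."""
    return item.meta.get(rule["field"].removeprefix("meta.")) == rule["value"]


rule = {"field": "meta.language", "operator": "==", "value": "en"}
docs = [FakeDocument("Hello world", {"language": "en"})]
streams = [
    FakeByteStream(b"Bonjour le monde", {"language": "fr"}),
    FakeByteStream(b"Hello again", {"language": "en"}),
]
print([matches(d, rule) for d in docs])     # [True]
print([matches(s, rule) for s in streams])  # [False, True]
```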
Pipeline with Language Classification and Routing
```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.classifiers import DocumentLanguageClassifier
from haystack.components.routers import MetadataRouter
from haystack.components.writers import DocumentWriter

docs = [
    Document(content="This is an English document"),
    Document(content="Dies ist ein deutsches Dokument"),
]

# One in-memory store per language
en_store = InMemoryDocumentStore()
de_store = InMemoryDocumentStore()

pipeline = Pipeline()
# Writes a language code into meta["language"]; requires the langdetect package
pipeline.add_component("classifier", DocumentLanguageClassifier(languages=["en", "de"]))
# Sends each document to the output named after its language
pipeline.add_component("router", MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"},
}))
pipeline.add_component("en_writer", DocumentWriter(document_store=en_store))
pipeline.add_component("de_writer", DocumentWriter(document_store=de_store))

pipeline.connect("classifier.documents", "router.documents")
pipeline.connect("router.en", "en_writer.documents")
pipeline.connect("router.de", "de_writer.documents")

pipeline.run({"classifier": {"documents": docs}})
```
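In this pipeline the classifier writes a language code into each document's metadata and the router forwards the document by that value. As we understand DocumentLanguageClassifier, text in a language outside its languages list is labelled "unmatched" in metadata, so such a document matches neither rule and leaves through the router's unmatched output, which the pipeline above does not connect. The dependency-free model below traces that flow. The keyword-based classify function is a crude hypothetical stand-in (not language detection), route is a simplified "=="-only model, and the two lists stand in for the document stores.

```python
from typing import Any

Doc = dict[str, Any]

# Hypothetical stand-in for DocumentLanguageClassifier: a crude keyword heuristic,
# NOT real language detection. Languages outside `languages` become "unmatched".
KEYWORDS = {"en": {"this", "is", "an"}, "de": {"dies", "ist", "ein"}}


def classify(docs: list[Doc], languages: list[str]) -> list[Doc]:
    for doc in docs:
        words = set(doc["content"].lower().split())
        lang = next((code for code in languages if words & KEYWORDS.get(code, set())), "unmatched")
        doc.setdefault("meta", {})["language"] = lang
    return docs


def route(docs: list[Doc], rules: dict[str, dict[str, Any]]) -> dict[str, list[Doc]]:
    """Simplified '=='-only routing model; unmatched collects docs that hit no rule."""
    out: dict[str, list[Doc]] = {name: [] for name in rules}
    out["unmatched"] = []
    for doc in docs:
        hits = [n for n, r in rules.items() if doc["meta"].get(r["field"].removeprefix("meta.")) == r["value"]]
        for name in hits:
            out[name].append(doc)
        if not hits:
            out["unmatched"].append(doc)
    return out


rules = {code: {"field": "meta.language", "operator": "==", "value": code} for code in ("en", "de")}
docs = [
    {"content": "This is an English document"},
    {"content": "Dies ist ein deutsches Dokument"},
    {"content": "Ceci est un document français"},
]
routed = route(classify(docs, ["en", "de"]), rules)
en_store, de_store = routed["en"], routed["de"]  # stand-ins for the two DocumentWriter targets
print(len(en_store), len(de_store), [d["content"] for d in routed["unmatched"]])
# 1 1 ['Ceci est un document français']
```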
Related Pages
Implements Principle
- Deepset_ai_Haystack_Metadata_Based_Routing - The principle behind metadata-based routing
- Deepset_ai_Haystack_FileTypeRouter - Routes by file MIME type rather than metadata
- Deepset_ai_Haystack_DocumentLanguageClassifier - Classifies document language for routing
- Deepset_ai_Haystack_Document_Language_Classification - Principle of language classification