Implementation:Deepset ai Haystack MetadataRouter

Overview

MetadataRouter is a Haystack component that routes documents or byte streams to different pipeline connections based on their metadata fields. Each rule maps an output connection name to a filter dictionary written in the Haystack metadata filtering syntax. Objects that match no rule are routed to the unmatched output.

Code Reference

Source file: haystack/components/routers/metadata_router.py, lines 14-138

Import:

from haystack.components.routers import MetadataRouter

Constructor

MetadataRouter(
    rules: dict[str, dict],
    output_type: type = list[Document]
)

Parameters:

rules (dict[str, dict], required): A dictionary where keys are output connection names and values are filter expression dictionaries. Each filter must include an operator key. Examples:
- Simple rule: {"en": {"field": "meta.language", "operator": "==", "value": "en"}}
- Compound rule:

{
    "recent": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"}
        ]
    }
}

output_type (type, default list[Document]): The type of output. Set to list[ByteStream] when routing byte streams instead of documents.
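
The rule dictionaries described above can be read as small expression trees: a simple rule is a leaf comparison, and a compound rule combines leaves with AND or OR. The following is a minimal sketch of that idea in plain Python, not Haystack's own filter code. The names COMPARATORS, lookup, and matches are hypothetical, records are plain dicts rather than Document objects, and ISO date strings are compared lexicographically, which only works for zero-padded YYYY-MM-DD values and may differ from how Haystack handles dates.

```python
import operator
from typing import Any

# Hypothetical helpers for illustration; this is not Haystack's implementation.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def lookup(record: dict[str, Any], dotted_field: str) -> Any:
    """Resolve a dotted path such as "meta.language" in nested dicts."""
    value: Any = record
    for part in dotted_field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(rule: dict[str, Any], record: dict[str, Any]) -> bool:
    """Evaluate a simple or compound rule against one record."""
    op = rule["operator"]
    if op in ("AND", "OR"):
        results = (matches(cond, record) for cond in rule["conditions"])
        return all(results) if op == "AND" else any(results)
    actual = lookup(record, rule["field"])
    if actual is None and op not in ("==", "!="):
        # Assumption: ordering comparisons on a missing field never match.
        return False
    return COMPARATORS[op](actual, rule["value"])


record = {"meta": {"language": "en", "created_at": "2023-06-15"}}
simple = {"field": "meta.language", "operator": "==", "value": "en"}
compound = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"},
    ],
}

is_english = matches(simple, record)
is_recent = matches(compound, record)
```

The recursion is the main design point: because a compound rule's conditions are themselves rules, nested AND/OR groups need no extra handling.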

Raises:

ValueError: If any rule does not contain an operator key.
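
To catch a malformed rule before it reaches the router, the same check can be run on the rule dictionary up front. The sketch below illustrates only the documented condition (a rule without an operator key); validate_rules is a hypothetical helper, and the error message is invented for the example rather than taken from Haystack.

```python
from typing import Any


def validate_rules(rules: dict[str, dict[str, Any]]) -> None:
    """Raise ValueError when a rule lacks the required "operator" key."""
    for name, rule in rules.items():
        if "operator" not in rule:
            raise ValueError(f"Rule {name!r} has no 'operator' key: {rule!r}")


good_rules = {"en": {"field": "meta.language", "operator": "==", "value": "en"}}
bad_rules = {"en": {"field": "meta.language", "value": "en"}}  # operator missing

validate_rules(good_rules)  # passes silently

try:
    validate_rules(bad_rules)
    rejected = False
except ValueError:
    rejected = True
```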

Run Method

run(documents: list[Document] | list[ByteStream]) -> dict[str, list[Document] | list[ByteStream]]

Parameters:

documents (list[Document] | list[ByteStream], required): A list of Document or ByteStream objects to route based on their metadata.

I/O Contract

Direction	Name	Type	Description
Input	documents	list[Document] | list[ByteStream]	Documents or byte streams to route
Output	<rule_name>	list[Document] | list[ByteStream]	Objects matching the named rule (one output per rule)
Output	unmatched	list[Document] | list[ByteStream]	Objects that matched no rule
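
The contract above implies a result dictionary with one list per rule name plus an unmatched list. The sketch below shows that shape in plain Python under two assumptions that the table does not state: every rule name appears as a key even when its list is empty, and an object is appended to every rule it matches rather than only the first. The route function and the predicate-based rules are hypothetical stand-ins for the filter dictionaries.

```python
from typing import Any, Callable

Record = dict[str, Any]


def route(
    records: list[Record],
    rules: dict[str, Callable[[Record], bool]],
) -> dict[str, list[Record]]:
    """Group records by the rules they satisfy; collect the rest as unmatched."""
    outputs: dict[str, list[Record]] = {name: [] for name in rules}
    outputs["unmatched"] = []
    for record in records:
        matched = False
        for name, predicate in rules.items():
            if predicate(record):
                outputs[name].append(record)
                matched = True  # assumption: keep checking the remaining rules
        if not matched:
            outputs["unmatched"].append(record)
    return outputs


records = [
    {"content": "Hello", "meta": {"language": "en"}},
    {"content": "Hallo", "meta": {"language": "de"}},
    {"content": "Hola", "meta": {"language": "es"}},
]
rules = {
    "en": lambda r: r["meta"].get("language") == "en",
    "de": lambda r: r["meta"].get("language") == "de",
}

result = route(records, rules)
```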

Usage Examples

Basic Language Routing

from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [
    Document(content="Paris is the capital of France.", meta={"language": "en"}),
    Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})
]

router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"}
})

result = router.run(documents=docs)
# result["en"]: [Document about Paris]
# result["unmatched"]: [Document about Berlin]

Multi-Rule Routing

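from haystack import Document
from haystack.components.routers import MetadataRouter

# One output connection per language; anything else goes to "unmatched".
router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"},
    "fr": {"field": "meta.language", "operator": "==", "value": "fr"}
})

result = router.run(documents=[Document(content="Hola mundo", meta={"language": "es"})])
# result["unmatched"]: [Document "Hola mundo"]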

Compound Conditions (Date Range Routing)

from haystack.components.routers import MetadataRouter

router = MetadataRouter(rules={
    "q1_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"}
        ]
    },
    "q2_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"}
        ]
    }
})
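
The rules above pair >= with < so that each quarter is a half-open interval: a boundary date such as 2023-04-01 belongs to exactly one quarter. The following plain-Python sketch checks that boundary behavior independently of Haystack. QUARTERS and quarter_for are hypothetical names, and the sketch assumes created_at holds ISO-formatted YYYY-MM-DD strings.

```python
from datetime import date

# Half-open quarter boundaries [start, end), mirroring the >= / < pairs above.
QUARTERS = {
    "q1_2023": (date(2023, 1, 1), date(2023, 4, 1)),
    "q2_2023": (date(2023, 4, 1), date(2023, 7, 1)),
}


def quarter_for(created_at: str) -> str:
    """Return the name of the quarter containing the date, or "unmatched"."""
    day = date.fromisoformat(created_at)
    for name, (start, end) in QUARTERS.items():
        if start <= day < end:
            return name
    return "unmatched"


boundary = quarter_for("2023-04-01")  # first day of Q2
last_q1 = quarter_for("2023-03-31")   # last day of Q1
outside = quarter_for("2023-07-01")   # first day after Q2
```

Using <= on the upper bound instead would place 2023-04-01 in both quarters, so the same object would be routed twice.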

Routing ByteStreams

from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# result["english"]: [ByteStream with "Hello world"]
# result["unmatched"]: [ByteStream with "Bonjour le monde"]

Pipeline with Language Classification and Routing

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.classifiers import DocumentLanguageClassifier
from haystack.components.routers import MetadataRouter
from haystack.components.writers import DocumentWriter

docs = [
    Document(content="This is an English document"),
    Document(content="Dies ist ein deutsches Dokument")
]

en_store = InMemoryDocumentStore()
de_store = InMemoryDocumentStore()

pipeline = Pipeline()
# The classifier stores the detected language in meta["language"], which the router rules read.
pipeline.add_component("classifier", DocumentLanguageClassifier(languages=["en", "de"]))
pipeline.add_component("router", MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"}
}))
pipeline.add_component("en_writer", DocumentWriter(document_store=en_store))
pipeline.add_component("de_writer", DocumentWriter(document_store=de_store))

# Each rule name becomes an output socket on the router.
pipeline.connect("classifier.documents", "router.documents")
pipeline.connect("router.en", "en_writer.documents")
pipeline.connect("router.de", "de_writer.documents")

# The router's "unmatched" output is left unconnected here.
pipeline.run({"classifier": {"documents": docs}})

Related Pages

Implements Principle

Principle:Deepset_ai_Haystack_Metadata_Based_Routing

Deepset_ai_Haystack_Metadata_Based_Routing - The principle behind metadata-based routing
Deepset_ai_Haystack_FileTypeRouter - Routes by file MIME type rather than metadata
Deepset_ai_Haystack_DocumentLanguageClassifier - Classifies document language for routing
Deepset_ai_Haystack_Document_Language_Classification - Principle of language classification
