Implementation:Deepset ai Haystack MetadataRouter

Overview

MetadataRouter is a Haystack component that routes documents or byte streams to different pipeline connections based on their metadata fields. Each rule is a filter dictionary in the Haystack metadata filtering syntax, keyed by the name of the output connection it feeds. Objects that match no rule are sent to the output named unmatched.
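
The routing behaviour described above can be illustrated without Haystack installed. The following is a minimal pure-Python sketch, not the library's implementation: the function name route_by_metadata, the use of plain dictionaries in place of Document objects, and the use of callables in place of filter dictionaries are all assumptions made for illustration. In this sketch an item that satisfies several rules is appended to every matching output; check the library source before relying on that behaviour.

```python
from typing import Any, Callable

Item = dict[str, Any]


def route_by_metadata(
    items: list[Item],
    rules: dict[str, Callable[[dict[str, Any]], bool]],
) -> dict[str, list[Item]]:
    """Group items under the name of every rule their metadata satisfies."""
    # One output list per rule name, plus a catch-all for everything else.
    routed: dict[str, list[Item]] = {name: [] for name in rules}
    routed["unmatched"] = []
    for item in items:
        matched = False
        for name, predicate in rules.items():
            if predicate(item["meta"]):
                routed[name].append(item)
                matched = True
        if not matched:
            routed["unmatched"].append(item)
    return routed


# Hypothetical stand-ins for Document objects.
items = [
    {"content": "Paris is the capital of France.", "meta": {"language": "en"}},
    {"content": "Berlin ist die Hauptstadt von Deutschland.", "meta": {"language": "de"}},
]
routed = route_by_metadata(items, {"en": lambda meta: meta.get("language") == "en"})
```

Here routed["en"] holds the English item and routed["unmatched"] holds the German one, mirroring the rule-name and unmatched outputs of the component.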

Code Reference

Source file: haystack/components/routers/metadata_router.py, lines 14-138

Import:

from haystack.components.routers import MetadataRouter

Constructor

MetadataRouter(
    rules: dict[str, dict],
    output_type: type = list[Document]
)

Parameters:

  • rules (dict[str, dict], required): A dictionary where keys are output connection names and values are filter expression dictionaries. Each filter must include an operator key. Examples:
    • Simple rule: {"en": {"field": "meta.language", "operator": "==", "value": "en"}}
    • Compound rule:
{
    "recent": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2024-01-01"}
        ]
    }
}
  • output_type (type, default list[Document]): The type of output. Set to list[ByteStream] when routing byte streams instead of documents.

Raises:

  • ValueError: If any rule does not contain an operator key.
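
The check behind this error can be approximated as follows. This is a sketch under stated assumptions, not the library's code: the helper name validate_rules and the error message are hypothetical, and only the presence of the operator key is tested, as the description above states.

```python
def validate_rules(rules: dict[str, dict]) -> None:
    """Raise ValueError if any rule lacks the top-level 'operator' key."""
    for name, rule in rules.items():
        if "operator" not in rule:
            # Hypothetical message; the library's wording may differ.
            raise ValueError(f"Rule '{name}' has no 'operator' key")


# A well-formed rule passes silently.
validate_rules({"en": {"field": "meta.language", "operator": "==", "value": "en"}})

# A rule without 'operator' is rejected.
try:
    validate_rules({"bad": {"field": "meta.language", "value": "en"}})
    rejected = False
except ValueError:
    rejected = True
```

Running a check like this before constructing the router surfaces malformed rules early, for example when rules are loaded from a configuration file.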

Run Method

run(documents: list[Document] | list[ByteStream]) -> dict[str, list[Document] | list[ByteStream]]

Parameters:

  • documents (list[Document] | list[ByteStream], required): A list of Document or ByteStream objects to route based on their metadata.

I/O Contract

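Direction  Name         Type                               Description
Input      documents    list[Document] | list[ByteStream]  Documents or byte streams to route
Output     <rule_name>  list[Document] | list[ByteStream]  Objects matching each named rule
Output     unmatched    list[Document] | list[ByteStream]  Objects that matched no rule

The element type of each output follows the output_type passed to the constructor.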

Usage Examples

Basic Language Routing

from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [
    Document(content="Paris is the capital of France.", meta={"language": "en"}),
    Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"})
]

router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"}
})

result = router.run(documents=docs)
# result["en"]: [Document about Paris]
# result["unmatched"]: [Document about Berlin]

Multi-Rule Routing

from haystack import Document
from haystack.components.routers import MetadataRouter

router = MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"},
    "fr": {"field": "meta.language", "operator": "==", "value": "fr"}
})
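
When several rules differ only in the value they compare against, as in the example above, the rules dictionary can be generated rather than written out by hand. The sketch below uses only plain Python; the variable names languages and rules are illustrative.

```python
# Language codes that should each get their own output connection.
languages = ["en", "de", "fr"]

# Build one equality rule per language, keyed by the language code.
rules = {
    lang: {"field": "meta.language", "operator": "==", "value": lang}
    for lang in languages
}
```

The resulting dictionary has the same shape as the hand-written one and can be passed as the rules argument of MetadataRouter.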

Compound Conditions (Date Range Routing)

from haystack.components.routers import MetadataRouter

router = MetadataRouter(rules={
    "q1_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"}
        ]
    },
    "q2_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"}
        ]
    }
})
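
To make the semantics of a compound AND rule concrete, the sketch below evaluates the two quarter rules against plain metadata dictionaries. It is an illustration only and does not call Haystack: the helper name matches is hypothetical, only the operators used on this page are supported, parsing date strings with datetime.fromisoformat is an assumption about how such values could be compared, and treating a missing field as a non-match is a choice made for this sketch.

```python
import operator
from datetime import datetime
from typing import Any

# Only the comparison operators that appear on this page.
COMPARATORS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def matches(rule: dict[str, Any], meta: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies the rule (sketch, not library code)."""
    if rule["operator"] == "AND":
        # Every nested condition must hold.
        return all(matches(cond, meta) for cond in rule["conditions"])
    field = rule["field"].removeprefix("meta.")
    if field not in meta:
        return False
    left, right = meta[field], rule["value"]
    if rule["operator"] in (">=", "<"):
        # Assumption: ordered comparisons here operate on ISO 8601 date strings.
        left, right = datetime.fromisoformat(left), datetime.fromisoformat(right)
    return COMPARATORS[rule["operator"]](left, right)


q1_2023 = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
    ],
}
q2_2023 = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-04-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-07-01"},
    ],
}

in_q1 = matches(q1_2023, {"created_at": "2023-02-15"})
boundary_in_q1 = matches(q1_2023, {"created_at": "2023-04-01"})
boundary_in_q2 = matches(q2_2023, {"created_at": "2023-04-01"})
```

Because the lower bound is inclusive and the upper bound exclusive, a date on a quarter boundary such as 2023-04-01 falls into exactly one bucket, so the two rules do not overlap.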

Routing ByteStreams

from haystack.dataclasses import ByteStream
from haystack.components.routers import MetadataRouter

streams = [
    ByteStream.from_string("Hello world", meta={"language": "en"}),
    ByteStream.from_string("Bonjour le monde", meta={"language": "fr"})
]

router = MetadataRouter(
    rules={"english": {"field": "meta.language", "operator": "==", "value": "en"}},
    output_type=list[ByteStream]
)

result = router.run(documents=streams)
# result["english"]: [ByteStream with "Hello world"]
# result["unmatched"]: [ByteStream with "Bonjour le monde"]

Pipeline with Language Classification and Routing

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.classifiers import DocumentLanguageClassifier
from haystack.components.routers import MetadataRouter
from haystack.components.writers import DocumentWriter

docs = [
    Document(content="This is an English document"),
    Document(content="Dies ist ein deutsches Dokument")
]

en_store = InMemoryDocumentStore()
de_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_component("classifier", DocumentLanguageClassifier(languages=["en", "de"]))
pipeline.add_component("router", MetadataRouter(rules={
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "de": {"field": "meta.language", "operator": "==", "value": "de"}
}))
pipeline.add_component("en_writer", DocumentWriter(document_store=en_store))
pipeline.add_component("de_writer", DocumentWriter(document_store=de_store))

pipeline.connect("classifier.documents", "router.documents")
pipeline.connect("router.en", "en_writer.documents")
pipeline.connect("router.de", "de_writer.documents")

pipeline.run({"classifier": {"documents": docs}})
