Principle:Deepset ai Haystack Metadata Based Routing

From Leeroopedia

Overview

Metadata-Based Routing is the principle of directing documents (or byte streams) to different processing branches within a pipeline according to the values of their metadata fields. Unlike file-type routing, which classifies by content format, metadata-based routing operates on the semantic attributes attached to each document. This enables conditional processing logic driven by properties such as language, date, source, category, or any user-defined metadata field.

Description

In complex document processing pipelines, different documents often require different treatment based on their attributes rather than their format. For example:

  • Documents in different languages may need different embedding models.
  • Documents from different time periods may need different processing or storage destinations.
  • Documents from different sources may need different cleaning or enrichment steps.

Metadata-Based Routing uses filter rules to evaluate each document's metadata and assign it to the appropriate output connection. The mechanism works as follows:

  • Rule definition: Each output connection is associated with a filter rule expressed as a dictionary with field, operator, and value keys. Rules follow the Haystack metadata filtering syntax.
  • Rule evaluation: For each document, every rule is evaluated against the document's metadata fields. A document can match multiple rules and be sent to multiple outputs.
  • Unmatched handling: Documents that do not match any rule are routed to a dedicated unmatched output, ensuring no documents are silently dropped.
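The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Haystack's implementation: documents are modelled as plain dictionaries, only the `==` operator is handled, and the names `get_field`, `matches`, `route_by_metadata`, and the sample rules are hypothetical.

```python
from typing import Any


def get_field(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'meta.language' against a nested dict."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing field
        value = value[key]
    return value


def matches(doc: dict[str, Any], rule: dict[str, Any]) -> bool:
    """Evaluate one rule; this sketch only understands equality."""
    if rule["operator"] != "==":
        raise ValueError(f"unsupported operator in this sketch: {rule['operator']}")
    return get_field(doc, rule["field"]) == rule["value"]


def route_by_metadata(docs, rules):
    """Return one list per rule name, plus an 'unmatched' list."""
    outputs = {name: [] for name in rules}
    outputs["unmatched"] = []
    for doc in docs:
        matched = False
        for name, rule in rules.items():  # every rule is checked, so multi-match is possible
            if matches(doc, rule):
                outputs[name].append(doc)
                matched = True
        if not matched:
            outputs["unmatched"].append(doc)  # nothing is silently dropped
    return outputs


# Hypothetical rules and documents, for illustration only.
rules = {
    "english": {"field": "meta.language", "operator": "==", "value": "en"},
    "from_wiki": {"field": "meta.source", "operator": "==", "value": "wiki"},
}
docs = [
    {"content": "Hello", "meta": {"language": "en", "source": "wiki"}},
    {"content": "Hallo", "meta": {"language": "de", "source": "wiki"}},
    {"content": "Bonjour", "meta": {"language": "fr", "source": "blog"}},
]
routed = route_by_metadata(docs, rules)
```

Here the first document satisfies both rules and therefore lands in both outputs, while the third satisfies neither and is collected under `unmatched`.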

Supported Filter Operations

The filter syntax supports three groups of operators:

  • Comparison operators: ==, !=, >, >=, <, <=
  • Collection operators: in, not in
  • Logical operators: AND, OR, NOT (for combining conditions)

Complex rules combine multiple conditions using logical operators. For example, the following rule matches documents created in the first quarter of 2023:

{
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"}
    ]
}
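To make the semantics of such a rule concrete, the following sketch evaluates nested filters recursively. It is an illustration only, not the library's evaluator: the `evaluate` and `get_field` helpers are hypothetical, dates are compared as ISO 8601 strings (which sort chronologically when they share the same format), a missing field is assumed to fail ordering comparisons, and `NOT` is assumed to negate the conjunction of its conditions.

```python
import operator
from typing import Any

# Comparison and collection operators mapped to Python callables.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}
ORDERING = {">", ">=", "<", "<="}


def get_field(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'meta.created_at' against a nested dict."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def evaluate(doc: dict[str, Any], rule: dict[str, Any]) -> bool:
    """Recursively evaluate a comparison or a logical combination."""
    op = rule["operator"]
    if op in ("AND", "OR", "NOT"):
        results = [evaluate(doc, cond) for cond in rule["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        return not all(results)  # assumption: NOT negates the AND of its conditions
    value = get_field(doc, rule["field"])
    if value is None and op in ORDERING:
        return False  # assumption: a missing field never satisfies an ordering test
    return COMPARATORS[op](value, rule["value"])


q1_2023 = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
        {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
    ],
}
docs = [
    {"content": "February report", "meta": {"created_at": "2023-02-15"}},
    {"content": "April report", "meta": {"created_at": "2023-04-01"}},
    {"content": "Undated note", "meta": {}},
]
in_q1 = [d["content"] for d in docs if evaluate(d, q1_2023)]
```

Only the February document passes both bounds: the April document fails the strict upper bound, and the undated note has no `created_at` value to compare.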

Key Properties

  • Flexible output type: Supports routing both Document and ByteStream objects.
  • Multi-match support: A single document can match multiple rules and appear in multiple outputs.
  • Comprehensive unmatched handling: Documents matching no rules are collected in the unmatched output.
  • Declarative configuration: Rules are specified as data (dictionaries) rather than code, making them serializable and configurable.
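Because rules are plain dictionaries of strings and lists, they can be stored in a configuration file and loaded back unchanged. The sketch below shows a JSON round trip using only the standard library; the output names `q1_2023` and `english` are hypothetical.

```python
import json

# Hypothetical rule set: output name -> filter dictionary.
rules = {
    "q1_2023": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.created_at", "operator": ">=", "value": "2023-01-01"},
            {"field": "meta.created_at", "operator": "<", "value": "2023-04-01"},
        ],
    },
    "english": {"field": "meta.language", "operator": "==", "value": "en"},
}

serialized = json.dumps(rules, indent=2)  # text that could be written to a config file
restored = json.loads(serialized)         # identical structure after loading
```

No custom encoder is needed, which is the practical benefit of expressing rules as data rather than as callables.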

Usage

Metadata-Based Routing is used at decision points in a pipeline where processing must diverge based on document attributes. Common use cases include:

  • Language routing: After language classification, routing documents to language-specific processors.
  • Temporal routing: Routing documents to different storage backends based on creation date.
  • Source-based routing: Processing documents differently based on their origin.

A typical language-routing pipeline:

[DocumentLanguageClassifier] --> [MetadataRouter] --en--> [EnglishEmbedder]
                                                  --de--> [GermanEmbedder]
                                                  --unmatched--> [FallbackHandler]
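The branching in this diagram can be imitated with a small dispatch table. The sketch below is plain Python rather than a Haystack pipeline: it assumes an upstream classifier has already written a `language` value into each document's metadata, and the functions `english_embedder`, `german_embedder`, and `fallback_handler` are hypothetical stand-ins for the components named in the diagram.

```python
def route_by_language(docs, languages):
    """Group documents by meta['language']; anything else goes to 'unmatched'."""
    outputs = {lang: [] for lang in languages}
    outputs["unmatched"] = []
    for doc in docs:
        lang = doc.get("meta", {}).get("language")
        key = lang if lang in languages else "unmatched"
        outputs[key].append(doc)
    return outputs


# Stand-in processors: each just tags the content with the branch it took.
def english_embedder(docs):
    return [f"en-model:{d['content']}" for d in docs]


def german_embedder(docs):
    return [f"de-model:{d['content']}" for d in docs]


def fallback_handler(docs):
    return [f"fallback:{d['content']}" for d in docs]


# One handler per router output, mirroring the arrows in the diagram.
handlers = {
    "en": english_embedder,
    "de": german_embedder,
    "unmatched": fallback_handler,
}

docs = [
    {"content": "Good morning", "meta": {"language": "en"}},
    {"content": "Guten Morgen", "meta": {"language": "de"}},
    {"content": "Bonjour", "meta": {"language": "fr"}},
]
routed = route_by_language(docs, ["en", "de"])
processed = {name: handlers[name](batch) for name, batch in routed.items()}
```

The upstream code never chooses a handler itself; the routing step alone decides which branch each document takes.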

Theoretical Basis

Metadata-Based Routing implements the Content-Based Router pattern from enterprise integration patterns (EIP). In this pattern, the message (document) is inspected at a routing node, and its content (metadata fields) determines which channel it is forwarded to. The pattern supports conditional branching, enabling different processing paths for different message types without requiring the sender to know about the routing logic.

The filter syntax represents each rule as a boolean expression: comparisons on individual fields are the atomic conditions, and the logical operators AND, OR, and NOT combine them. This provides sufficient expressiveness for most routing scenarios while keeping the rule format simple and serializable.

The distinction between file-type routing (based on format) and metadata routing (based on semantic attributes) reflects the separation between structural classification and semantic classification in information management systems.
