Principle:Datahub project Datahub Action Pipeline Configuration

From Leeroopedia


Metadata

Field              Value
Principle ID       P-DHACT-002
Title              Action Pipeline Configuration
Category           Event-Driven Automation
Status             Active
Last Updated       2026-02-10
Repository         Datahub_project_Datahub
Knowledge Sources  GitHub - datahub-project/datahub, DataHub Documentation
Domains            Event_Processing, Automation, Metadata_Management

Overview

Action pipeline configuration is a declarative, YAML-based pattern for defining event-driven automation pipelines with source, filter, transform, and action stages. Each configuration file defines the complete topology of one pipeline that consumes metadata change events and performs automated actions.

Description

Each pipeline is described by a single YAML document with the following top-level keys:

  • name (required): A unique pipeline identifier, also used as the Kafka consumer group ID. Multiple instances of the same named pipeline therefore share one consumer group and split the events between them, while differently named pipelines each consume all events independently.
  • enabled (optional, default: true): Whether the pipeline should be started.
  • source (required): The event source configuration, typically Kafka, specifying connection parameters and topic routes.
  • filter (optional): An event filter that selectively passes events based on type and content matching. Converted internally to a FilterTransformer.
  • transform (optional): A list of transformer configurations that modify events before they reach the action.
  • action (required): The action plugin configuration specifying which action to invoke and its parameters.
  • datahub (optional): DataHub connection configuration for actions that need to read or write metadata.
  • options (optional): Execution options including retry count, failure mode (THROW or CONTINUE), and failed events directory.
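
The top-level keys listed above can be sketched as a plain Python mapping with a small validator. This is a minimal illustration, not the DataHub implementation: the helper normalize_pipeline_config, the pipeline name, and the nested keys under source, filter, and action are assumptions chosen for the example, and only the default for enabled comes from the list above.

```python
import copy
from typing import Any, Dict

# Required and optional top-level keys, as listed in the description.
REQUIRED_KEYS = ("name", "source", "action")
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "enabled": True,   # documented default
    "filter": None,    # no filter: every event passes
    "transform": [],   # no transformers
    "datahub": None,   # no DataHub connection
    "options": {},     # placeholder; real defaults are not stated in this page
}


def normalize_pipeline_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check required keys and fill in optional ones."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    unknown = sorted(set(raw) - set(REQUIRED_KEYS) - set(OPTIONAL_DEFAULTS))
    if unknown:
        raise ValueError("unknown top-level keys: " + ", ".join(unknown))
    config = copy.deepcopy(OPTIONAL_DEFAULTS)
    config.update(raw)
    return config


# Illustrative configuration; the nested keys and values are assumptions.
example_config = {
    "name": "hello_world_pipeline",
    "source": {
        "type": "kafka",
        "config": {"connection": {"bootstrap": "localhost:9092"}},
    },
    "filter": {"event_type": "EntityChangeEvent_v1"},
    "action": {"type": "hello_world"},
}

normalized = normalize_pipeline_config(example_config)
```

Keeping the optional keys explicit after normalization means later stages can read config["transform"] or config["enabled"] without checking whether the key was present in the file.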

Events flow through the pipeline stages in order: source -> filter -> transform(s) -> action. Each stage is independently configurable, and the filter and transform stages are optional.
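
The stage ordering can be sketched as a short loop. This is a simplified model under stated assumptions, not the framework's code: events are plain dicts, the filter is treated as the first transformer (mirroring the FilterTransformer conversion mentioned above), a transformer returning None drops the event, retry_count is taken to mean retries after the first attempt, and an in-memory list stands in for the failed events directory.

```python
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, object]
Transformer = Callable[[Event], Optional[Event]]  # return None to drop the event


def make_type_filter(event_type: str) -> Transformer:
    """Hypothetical filter stage: pass only events of the given type."""
    def _filter(event: Event) -> Optional[Event]:
        return event if event.get("event_type") == event_type else None
    return _filter


def run_pipeline(
    source: Iterable[Event],
    transformers: List[Transformer],
    action: Callable[[Event], None],
    retry_count: int = 0,
    failure_mode: str = "CONTINUE",
) -> List[Event]:
    """Run source -> filter/transforms -> action; return events that failed."""
    failed: List[Event] = []
    for event in source:
        current: Optional[Event] = event
        for transformer in transformers:
            current = transformer(current)
            if current is None:  # filtered out, skip remaining stages
                break
        if current is None:
            continue
        for attempt in range(retry_count + 1):
            try:
                action(current)
                break
            except Exception:
                if attempt < retry_count:
                    continue  # retry the action
                if failure_mode == "THROW":
                    raise  # stop the pipeline on the final failure
                failed.append(current)  # CONTINUE: record and move on
    return failed


def add_seen_flag(event: Event) -> Event:
    """Hypothetical transformer: annotate the event before the action."""
    return {**event, "seen": True}


events = [
    {"event_type": "EntityChangeEvent_v1", "id": 1},
    {"event_type": "MetadataChangeLogEvent_v1", "id": 2},
]
handled: List[Event] = []
failed_events = run_pipeline(
    events,
    [make_type_filter("EntityChangeEvent_v1"), add_seen_flag],
    handled.append,
)
```

In this run only the first event reaches the action, because the filter drops the second before any transformer sees it.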

Usage

Use this principle when defining a new automation that should react to metadata changes in DataHub. A pipeline configuration file is the single artifact needed to define, test, and deploy a metadata automation workflow. Configuration files support environment variable expansion, enabling deployment-specific customization without modifying the pipeline definition.
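
Environment variable expansion can be illustrated with a small recursive helper applied to an already-parsed configuration. This is a sketch only: the function expand_env, the ${NAME} and ${NAME:-default} syntax it accepts, and the variable name KAFKA_BOOTSTRAP_SERVER are assumptions for the example, and the expansion rules DataHub actually applies may differ.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} and ${NAME:-default} (assumed syntax for this sketch).
_VAR_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand environment references in strings, dicts, and lists."""
    env = os.environ if env is None else env

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default")
        current = env.get(name)
        if current:  # set and non-empty
            return current
        if default is not None:  # unset or empty, fall back to the default
            return default
        if current is None:
            raise KeyError(f"environment variable {name} is not set")
        return current  # set but empty, and no default was given

    if isinstance(value, str):
        return _VAR_PATTERN.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None are left untouched


# Illustrative fragment of a source section; key names are assumptions.
source_section = {
    "type": "kafka",
    "config": {"connection": {"bootstrap": "${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}"}},
}

local = expand_env(source_section, env={})
deployed = expand_env(source_section, env={"KAFKA_BOOTSTRAP_SERVER": "broker:9092"})
```

The same definition thus resolves to a local broker when the variable is absent and to the deployment's broker when it is set, which is the customization without modification described above.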

Common configuration patterns include:

  • Notification pipelines: Source (Kafka) -> Filter (EntityChangeEvent_v1) -> Action (slack/teams)
  • Propagation pipelines: Source (Kafka) -> Filter (MetadataChangeLogEvent_v1) -> Action (tag_propagation/term_propagation/doc_propagation)
  • Custom pipelines: Source (Kafka) -> Filter -> Transform -> Action (custom plugin)
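
The filter step in these patterns matches on event type and, optionally, on event content. A minimal sketch of such matching follows. It assumes a filter given as a dict with an event_type entry and an optional event entry of expected fields, where a list of expected values means "any of these". These semantics, the function names, and the sample field names category and operation are assumptions for illustration rather than the documented filter behaviour.

```python
from typing import Any, Dict


def matches(expected: Any, actual: Any) -> bool:
    """Return True if `actual` satisfies the `expected` pattern."""
    if isinstance(expected, dict):
        # Every expected key must be present and match recursively.
        return isinstance(actual, dict) and all(
            key in actual and matches(sub, actual[key])
            for key, sub in expected.items()
        )
    if isinstance(expected, list):
        # A list of alternatives: any one of them may match.
        return any(matches(option, actual) for option in expected)
    return expected == actual


def passes_filter(filter_config: Dict[str, Any], event_type: str, event: Dict[str, Any]) -> bool:
    """Hypothetical filter check combining type and content matching."""
    wanted = filter_config.get("event_type")
    if wanted is not None:
        wanted_types = wanted if isinstance(wanted, list) else [wanted]
        if event_type not in wanted_types:
            return False
    return matches(filter_config.get("event", {}), event)


# Illustrative notification filter: tag additions or removals only.
tag_filter = {
    "event_type": "EntityChangeEvent_v1",
    "event": {"category": "TAG", "operation": ["ADD", "REMOVE"]},
}

tag_added = {"category": "TAG", "operation": "ADD", "entityType": "dataset"}
owner_added = {"category": "OWNER", "operation": "ADD", "entityType": "dataset"}
```

Fields that the filter does not mention, such as entityType here, are ignored, so a filter only needs to name the fields that distinguish relevant events.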

Theoretical Basis

Pipeline pattern with configurable stages: Events flow through a linear sequence of stages (source, filter, transforms, action), with each stage independently configurable via YAML. This is a specialization of the pipes-and-filters architectural pattern where:

  • The source acts as the data pump, producing events from Kafka topics
  • The filter acts as a content-based router, discarding irrelevant events early
  • The transforms act as message translators, modifying events before processing
  • The action acts as the message endpoint, performing the final side effect

The pipeline name doubles as the Kafka consumer group ID, so a pipeline configuration defines both the processing logic and the consumption topology. This coupling is intentional: each logical automation maps to exactly one consumer group, so running several instances of the same pipeline divides the event stream among them instead of having every instance process every event.
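
The effect of using the pipeline name as the consumer group ID can be illustrated with a toy model of partition assignment. This is a deliberate simplification: the round-robin split below stands in for Kafka's real assignment strategies, and the instance and pipeline names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_partitions(
    instances: List[Tuple[str, str]], num_partitions: int
) -> Dict[str, Dict[str, List[int]]]:
    """Toy model: map each (instance_id, pipeline_name) to topic partitions.

    Instances that share a pipeline name share a consumer group, so the
    partitions are divided among them. Each distinct name gets its own
    group and therefore sees every partition.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for instance_id, pipeline_name in instances:
        groups[pipeline_name].append(instance_id)

    assignment: Dict[str, Dict[str, List[int]]] = {}
    for pipeline_name, members in groups.items():
        members = sorted(members)
        per_member: Dict[str, List[int]] = {member: [] for member in members}
        for partition in range(num_partitions):
            # Round-robin stand-in for Kafka's partition assignor.
            per_member[members[partition % len(members)]].append(partition)
        assignment[pipeline_name] = per_member
    return assignment


# Two instances of one pipeline plus a separately named pipeline (names are hypothetical).
topology = assign_partitions(
    [
        ("worker-a", "tag_sync"),
        ("worker-b", "tag_sync"),
        ("worker-c", "slack_notify"),
    ],
    num_partitions=4,
)
```

In this model the two tag_sync instances each own half of the partitions, while slack_notify independently reads all four, matching the sharing behaviour described for the name field.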

Related

Implementation:Datahub_project_Datahub_Actions_PipelineConfig
