Principle:Datahub project Datahub Action Pipeline Configuration

Metadata

Field	Value
Principle ID	P-DHACT-002
Title	Action Pipeline Configuration
Category	Event-Driven Automation
Status	Active
Last Updated	2026-02-10
Repository	Datahub_project_Datahub
Knowledge Sources	GitHub - datahub-project/datahub, DataHub Documentation
Domains	Event_Processing, Automation, Metadata_Management

Overview

A declarative YAML-based configuration pattern for defining event-driven automation pipelines with source, filter, transform, and action stages. Each pipeline configuration file defines the complete topology of an event processing pipeline that consumes metadata change events and performs automated actions.

Description

Each pipeline is described by a single YAML document with the following top-level fields:

name (required): A unique pipeline identifier, also used as the Kafka consumer group ID. Multiple instances running under the same pipeline name therefore split event consumption between them, while pipelines with different names each consume every event independently.
enabled (optional, default: true): Whether the pipeline should be started.
source (required): The event source configuration, typically Kafka, specifying connection parameters and topic routes.
filter (optional): An event filter that selectively passes events based on type and content matching. Converted internally to a FilterTransformer.
transform (optional): A list of transformer configurations that modify events before they reach the action.
action (required): The action plugin configuration specifying which action to invoke and its parameters.
datahub (optional): DataHub connection configuration for actions that need to read or write metadata.
options (optional): Execution options including retry count, failure mode (THROW or CONTINUE), and failed events directory.

Events flow through the pipeline stages in order: source -> filter -> transform(s) -> action. Each stage is independently configurable, and the filter and transform stages are optional.
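The fields and stage order described above can be sketched as a single file. This is a minimal illustration rather than a canonical configuration: the pipeline name, the localhost addresses, the filter values, the `hello_world` action, and the commented-out transformer type are placeholder assumptions, and the key names under each `config` block should be checked against the DataHub Actions documentation for the version in use.

```yaml
# Hypothetical pipeline file; every value here is a placeholder.
name: "example_pipeline"        # also used as the Kafka consumer group ID
enabled: true                   # optional; defaults to true

source:                         # required: where events come from
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"

filter:                         # optional: match on event type and content
  event_type: "EntityChangeEvent_v1"
  event:
    category: "TAG"
    operation: "ADD"

# transform:                    # optional: list of transformers, applied in order
#   - type: "my_transformer"    # hypothetical custom transformer plugin
#     config: {}

action:                         # required: plugin invoked for each event that passes
  type: "hello_world"

datahub:                        # optional: for actions that read or write metadata
  server: "http://localhost:8080"

options:                        # optional: execution behaviour
  retry_count: 0
  failure_mode: "CONTINUE"      # or "THROW"
  failed_events_dir: "/tmp/datahub/actions"
```

Leaving out `filter` and `transform` gives the shortest valid topology: every event from the source goes straight to the action.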

Usage

Use this principle when defining a new automation that should react to metadata changes in DataHub. A pipeline configuration file is the single artifact needed to define, test, and deploy a metadata automation workflow. Configuration files support environment variable expansion, enabling deployment-specific customization without modifying the pipeline definition.
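As a sketch of environment variable expansion, the fragment below takes its connection settings from the environment instead of hard-coding them. The variable names are illustrative assumptions, and the shell-style `${VAR:-default}` fallback form is assumed to be supported by the expansion syntax in use; verify this before relying on the defaults.

```yaml
source:
  type: "kafka"
  config:
    connection:
      # Hypothetical variable names; the fallback after ":-" applies when unset.
      bootstrap: "${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}"
      schema_registry_url: "${SCHEMA_REGISTRY_URL:-http://localhost:8081}"
```

The same file can then be deployed to different environments by changing only the exported variables.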

Common configuration patterns include:

Notification pipelines: Source (Kafka) -> Filter (EntityChangeEvent_v1) -> Action (slack/teams)
Propagation pipelines: Source (Kafka) -> Filter (MetadataChangeLogEvent_v1) -> Action (tag_propagation/term_propagation/doc_propagation)
Custom pipelines: Source (Kafka) -> Filter -> Transform -> Action (custom plugin)
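The notification pattern might look like the sketch below. The pipeline name and environment variable names are hypothetical, and the keys under the Slack action's `config` are assumptions to be confirmed against that action's own documentation.

```yaml
name: "slack_notifications"           # hypothetical pipeline name
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "${KAFKA_BOOTSTRAP_SERVER}"
      schema_registry_url: "${SCHEMA_REGISTRY_URL}"
filter:
  event_type: "EntityChangeEvent_v1"  # pass only entity change events
action:
  type: "slack"
  config:
    # Assumed key names; confirm against the Slack action's documentation.
    bot_token: "${SLACK_BOT_TOKEN}"
    default_channel: "${SLACK_CHANNEL}"
```

A propagation pipeline would follow the same shape, differing mainly in the filter's `event_type` (`MetadataChangeLogEvent_v1`) and the action `type`.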

Theoretical Basis

Pipeline pattern with configurable stages: Events flow through a linear sequence of stages (source, filter, transforms, action), with each stage independently configurable via YAML. This is a specialization of the pipes-and-filters architectural pattern where:

The source acts as the data pump, producing events from Kafka topics
The filter acts as a content-based router, discarding irrelevant events early
The transforms act as message translators, modifying events before processing
The action acts as the message endpoint, performing the final side effect

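The pipeline name doubles as the Kafka consumer group ID, which means pipeline configuration inherently defines both the processing logic and the consumption topology. This coupling is intentional: when each logical automation has its own name, it has exactly one consumer group, so scaled-out instances of that automation split the event stream instead of each processing every event. Conversely, two different automations must not share a name, or Kafka would divide events between them and each would see only part of the stream.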
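The effect of the name on consumption can be illustrated with two abbreviated pipeline files, shown here as two YAML documents separated by `---` for brevity (in practice each would be its own file). Both names and the `hello_world` action are placeholders.

```yaml
# File 1 (hypothetical): joins consumer group "tag_notifier".
# Running several instances of this file splits events among them.
name: "tag_notifier"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
action:
  type: "hello_world"
---
# File 2 (hypothetical): a different name means a separate consumer group,
# so this pipeline independently receives every event as well.
name: "tag_auditor"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
action:
  type: "hello_world"
```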
