Principle:Datahub project Datahub Action Pipeline Configuration
Metadata
| Field | Value |
|---|---|
| Principle ID | P-DHACT-002 |
| Title | Action Pipeline Configuration |
| Category | Event-Driven Automation |
| Status | Active |
| Last Updated | 2026-02-10 |
| Repository | Datahub_project_Datahub |
| Knowledge Sources | GitHub - datahub-project/datahub, DataHub Documentation |
| Domains | Event_Processing, Automation, Metadata_Management |
Overview
A declarative YAML-based configuration pattern for defining event-driven automation pipelines with source, filter, transform, and action stages. Each pipeline configuration file defines the complete topology of an event processing pipeline that consumes metadata change events and performs automated actions.
Description
Action pipeline configuration defines the topology of an event processing pipeline. Each pipeline is described by a YAML document with the following structure:
- name (required): A unique pipeline identifier, also used as the Kafka consumer group ID. Multiple running instances of the same pipeline name therefore split the event stream between them, while pipelines with different names each consume every event independently.
- enabled (optional, default: true): Whether the pipeline should be started.
- source (required): The event source configuration, typically Kafka, specifying connection parameters and topic routes.
- filter (optional): An event filter that selectively passes events based on type and content matching. It is converted internally to a FilterTransformer.
- transform (optional): A list of transformer configurations that modify events before they reach the action.
- action (required): The action plugin configuration specifying which action to invoke and its parameters.
- datahub (optional): DataHub connection configuration for actions that need to read or write metadata.
- options (optional): Execution options, including the retry count, the failure mode (THROW stops the pipeline when an event fails; CONTINUE records the failure and moves on), and a directory for recording failed events.
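The field list above can be read as a small schema. A minimal sketch in Python, assuming the YAML has already been parsed into a dict: the function name normalize_pipeline_config is hypothetical, and the options key failure_mode is an assumed name rather than one stated on this page. It checks the three required fields, applies the documented default for enabled, and rejects failure modes other than THROW or CONTINUE.

```python
from typing import Any, Dict

# Hypothetical validator mirroring the documented fields; not the DataHub Actions API.
REQUIRED_FIELDS = ("name", "source", "action")
FAILURE_MODES = {"THROW", "CONTINUE"}


def normalize_pipeline_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and fill in the documented defaults."""
    missing = [field for field in REQUIRED_FIELDS if field not in raw]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    config = dict(raw)  # shallow copy so the caller's dict is not mutated
    config.setdefault("enabled", True)  # 'enabled' defaults to true
    config.setdefault("transform", [])  # no transformers unless listed

    # 'failure_mode' is an assumed key name; THROW and CONTINUE are the documented modes.
    options = dict(config.get("options") or {})
    mode = options.get("failure_mode")
    if mode is not None and mode not in FAILURE_MODES:
        raise ValueError(f"unknown failure_mode: {mode!r}")
    config["options"] = options
    return config


if __name__ == "__main__":
    cfg = normalize_pipeline_config(
        {
            "name": "hello_world_pipeline",
            "source": {"type": "kafka", "config": {}},
            "action": {"type": "hello_world"},
        }
    )
    print(cfg["enabled"], cfg["transform"])  # True []
```

Failing fast on a missing name, source, or action keeps a malformed file from ever starting a consumer.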
Events flow through the pipeline stages in order: source -> filter -> transform(s) -> action. Each stage is independently configurable, and the filter and transform stages are optional.
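The stage ordering can be sketched as a plain loop. This is an illustrative model, not the DataHub Actions runtime: run_pipeline and the convention that a stage returns None to drop an event are assumptions. Because the filter is converted to a transformer, it is modeled here as the first entry in the transformer chain.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Event = Dict[str, Any]
# Assumed convention: a stage returns the (possibly modified) event, or None to drop it.
Stage = Callable[[Event], Optional[Event]]


def run_pipeline(
    source: Iterable[Event],
    event_filter: Optional[Stage],
    transforms: List[Stage],
    action: Callable[[Event], None],
) -> int:
    """Push events through filter -> transforms -> action; return how many reached the action."""
    # The filter is modeled as the first transformer in the chain.
    stages: List[Stage] = ([event_filter] if event_filter is not None else []) + transforms
    delivered = 0
    for event in source:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:  # dropped: later stages and the action never see it
                break
        if current is not None:
            action(current)
            delivered += 1
    return delivered


if __name__ == "__main__":
    received: List[Event] = []
    count = run_pipeline(
        source=[{"type": "EntityChangeEvent_v1"}, {"type": "MetadataChangeLogEvent_v1"}],
        event_filter=lambda e: e if e["type"] == "EntityChangeEvent_v1" else None,
        transforms=[lambda e: {**e, "seen": True}],
        action=received.append,
    )
    print(count, received)  # 1 [{'type': 'EntityChangeEvent_v1', 'seen': True}]
```

Treating the filter as one more transformer keeps the loop uniform: an event either survives every stage and reaches the action, or it is dropped at the first stage that rejects it.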
Usage
Use this principle when defining a new automation that should react to metadata changes in DataHub. A pipeline configuration file is the single artifact needed to define, test, and deploy a metadata automation workflow. Configuration files support environment variable expansion, enabling deployment-specific customization without modifying the pipeline definition.
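Environment variable expansion can be sketched as a recursive pass over the parsed configuration. This is a hypothetical helper (expand_env), and the ${VAR:-default} form is an assumption modeled on shell syntax; it does not claim to reproduce DataHub's exact expansion rules.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${VAR} and ${VAR:-default}; the default form is an assumed, shell-like syntax.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand ${VAR} references in every string of a parsed config."""
    env = os.environ if env is None else env

    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, None pass through unchanged

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default  # used only when the variable is unset
        raise KeyError(f"environment variable {name} is not set")

    return _VAR.sub(substitute, value)


if __name__ == "__main__":
    raw = {"source": {"config": {"connection": {"bootstrap": "${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}"}}}}
    print(expand_env(raw, env={}))
    # {'source': {'config': {'connection': {'bootstrap': 'localhost:9092'}}}}
```

Expanding after parsing, rather than on the raw text, means a substituted value can never change the document's YAML structure.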
Common configuration patterns include:
- Notification pipelines: Source (Kafka) -> Filter (EntityChangeEvent_v1) -> Action (slack/teams)
- Propagation pipelines: Source (Kafka) -> Filter (MetadataChangeLogEvent_v1) -> Action (tag_propagation/term_propagation/doc_propagation)
- Custom pipelines: Source (Kafka) -> Filter -> Transform -> Action (custom plugin)
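As an illustration of the notification pattern, the sketch below writes a pipeline as the Python mapping a YAML loader would produce and renders its stage chain. The pipeline name, the filter fields (category, operation), the connection key, and the topology helper are illustrative assumptions; the exact source and action options belong to the DataHub Actions documentation.

```python
from typing import Any, Dict, List

# Hypothetical notification pipeline, expressed as the dict a YAML loader would return.
notification_pipeline: Dict[str, Any] = {
    "name": "tag_change_notifier",
    "source": {
        "type": "kafka",
        "config": {"connection": {"bootstrap": "${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}"}},
    },
    "filter": {
        "event_type": "EntityChangeEvent_v1",
        "event": {"category": "TAG", "operation": "ADD"},  # assumed filter fields
    },
    "action": {"type": "slack", "config": {}},  # action-specific settings omitted
}


def topology(config: Dict[str, Any]) -> str:
    """Render the stage chain of a pipeline config, skipping optional stages that are absent."""
    stages: List[str] = [f"Source ({config['source']['type']})"]
    if "filter" in config:
        stages.append(f"Filter ({config['filter'].get('event_type', 'any')})")
    for transformer in config.get("transform", []):
        stages.append(f"Transform ({transformer['type']})")
    stages.append(f"Action ({config['action']['type']})")
    return " -> ".join(stages)


if __name__ == "__main__":
    print(topology(notification_pipeline))
    # Source (kafka) -> Filter (EntityChangeEvent_v1) -> Action (slack)
```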
Theoretical Basis
Pipeline pattern with configurable stages: Events flow through a linear sequence of stages (source, filter, transforms, action), with each stage independently configurable via YAML. This is a specialization of the pipes-and-filters architectural pattern where:
- The source acts as the data pump, producing events from Kafka topics
- The filter acts as a content-based router, discarding irrelevant events early
- The transforms act as message translators, modifying events before processing
- The action acts as the message endpoint, performing the final side effect
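The filter's role as a content-based router can be sketched as a type check followed by a recursive subset match on the event body. passes_filter is hypothetical, and the matching rule (every configured key must be present with an equal value) is an assumption; the real DataHub filter may support richer semantics.

```python
from typing import Any, Dict


def _subset_match(expected: Any, actual: Any) -> bool:
    """True if every key in `expected` appears in `actual` with a matching value, recursively."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and _subset_match(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual


def passes_filter(filter_cfg: Dict[str, Any], event_type: str, event: Dict[str, Any]) -> bool:
    """Content-based routing: check the event type first, then the event body."""
    wanted_type = filter_cfg.get("event_type")
    if wanted_type is not None and wanted_type != event_type:
        return False  # cheapest check first, so irrelevant events are discarded early
    return _subset_match(filter_cfg.get("event", {}), event)


if __name__ == "__main__":
    tag_added = {"event_type": "EntityChangeEvent_v1", "event": {"category": "TAG", "operation": "ADD"}}
    print(passes_filter(tag_added, "EntityChangeEvent_v1", {"category": "TAG", "operation": "ADD"}))  # True
```

Extra fields in the event are ignored, so a filter only has to name the fields it cares about.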
The pipeline name doubles as the Kafka consumer group ID, which means pipeline configuration inherently defines both the processing logic and the consumption topology. This coupling is intentional: it ensures that each logical automation maps to exactly one consumer group, so scaled-out instances of the same pipeline divide the events between them instead of each processing every event.
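The consumer-group coupling can be illustrated with a simplified model: within a group, each partition is owned by exactly one instance, and every group independently covers all partitions. The assign helper and its round-robin assignment are assumptions for illustration; Kafka's actual partition assignors differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign(partitions: int, instances: List[Tuple[str, str]]) -> Dict[str, Dict[str, List[int]]]:
    """Map group -> instance -> owned partitions; each instance is (pipeline_name, instance_id)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pipeline_name, instance_id in instances:
        groups[pipeline_name].append(instance_id)  # pipeline name acts as the consumer group ID

    assignment: Dict[str, Dict[str, List[int]]] = {}
    for group, members in groups.items():
        owned: Dict[str, List[int]] = {member: [] for member in members}
        for partition in range(partitions):
            # Round-robin within the group: each partition goes to exactly one member.
            owned[members[partition % len(members)]].append(partition)
        assignment[group] = owned
    return assignment


if __name__ == "__main__":
    print(assign(4, [("slack_notifier", "a"), ("slack_notifier", "b"), ("tag_propagation", "x")]))
    # {'slack_notifier': {'a': [0, 2], 'b': [1, 3]}, 'tag_propagation': {'x': [0, 1, 2, 3]}}
```

The two slack_notifier instances split the partitions, while tag_propagation, as a separately named pipeline, still sees all of them.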
Related
- Implemented by: Datahub_project_Datahub_Actions_PipelineConfig
- Related principles: Datahub_project_Datahub_Event_Filtering, Datahub_project_Datahub_Actions_Deployment