Workflow: DataHub Metadata Actions Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Metadata_Management, Event_Driven |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
End-to-end process for building event-driven automation pipelines that react to real-time metadata changes in DataHub.
Description
This workflow covers the DataHub Actions Framework, which provides a pluggable pipeline for consuming real-time metadata change events from Kafka and executing automated reactions. The pipeline has six configurable components: name, source, filter, transformer, action, and error handling options. Actions can respond to entity changes (tag additions, ownership changes, deprecations) by executing custom logic, calling external APIs, or triggering downstream workflows.
Usage
Execute this workflow when you need to automate responses to metadata changes in DataHub. Examples include sending notifications when datasets are deprecated, triggering data quality checks when schema changes occur, synchronizing metadata to external systems, or enforcing governance policies in real-time.
Execution Steps
Step 1: Install Actions Framework
Install the DataHub CLI and the Actions Framework extension package. The framework extends the base CLI with event consumption and action execution capabilities.
Key considerations:
- Requires acryl-datahub version 0.8.34 or higher
- Install acryl-datahub-actions as an additional package
- Verify installation with datahub actions version
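The install and verification steps above can be sketched as follows (package names per the source; add a version pin if your deployment requires a specific release):

```shell
# Install the core DataHub CLI plus the Actions Framework extension
python3 -m pip install --upgrade acryl-datahub acryl-datahub-actions

# Confirm the actions subcommand is available
datahub actions version
```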
Step 2: Create Action Configuration
Author a YAML configuration file defining the six pipeline components: pipeline name, event source (Kafka), optional filters, optional transformers, action handler, and error handling options.
Key considerations:
- Source configures the Kafka consumer connection
- Filters narrow events by type (EntityChangeEvent_v1 or MetadataChangeLog_v1)
- Transformers modify events before they reach the action
- The action component defines what happens when events match
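A minimal configuration showing the component layout might look like this. It is a sketch, not a complete production config: the pipeline name, Kafka bootstrap/schema-registry addresses, and environment-variable defaults are assumptions for a local deployment, and hello_world is the built-in logging action mentioned in Step 4:

```yaml
# action.yaml - minimal pipeline: name, source, filter, action
name: "hello_world_pipeline"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY:-http://localhost:8081}
filter:
  event_type: "EntityChangeEvent_v1"
action:
  type: "hello_world"
```

Transformers and error handling options are omitted here; both are optional and slot in as additional top-level keys alongside filter and action.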
Step 3: Configure Event Filtering
Define filters to select specific event types and entity changes. Filters reduce noise by ensuring only relevant events trigger the action.
Key considerations:
- Filter by event type (entity change vs. raw metadata change log)
- Filter by entity type, aspect name, or change category
- Multiple filters can be combined for precise event selection
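As an illustration, a filter that restricts the pipeline to tag additions on datasets could look like the fragment below. The nested field names (entityType, category, operation) are assumptions based on the shape of EntityChangeEvent_v1; verify them against your event payloads before relying on them:

```yaml
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    entityType: "dataset"
    category: "TAG"
    operation: "ADD"
```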
Step 4: Implement Action Logic
Configure the action component that executes when matching events are received. Use built-in actions or implement custom action plugins.
Key considerations:
- Built-in actions include hello_world (event logging) and executor (executes ingestion tasks)
- Custom actions implement the Action interface
- Actions receive the full event payload for processing
- Error handling options control retry behavior and failure modes
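To illustrate what custom action logic does with an event, the sketch below processes a decoded entity-change event represented as a plain dataclass. A real plugin would instead subclass the framework's Action interface and receive the framework's event envelope; the class name and field names here are hypothetical stand-ins based on the event categories described above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntityChangeEvent:
    """Simplified stand-in for the framework's event envelope (hypothetical)."""
    entity_type: str
    entity_urn: str
    category: str   # e.g. "TAG", "OWNER", "DEPRECATION"
    operation: str  # e.g. "ADD", "REMOVE", "MODIFY"

def act(event: EntityChangeEvent) -> Optional[str]:
    """Return a notification message for events we care about, else None."""
    if event.category == "DEPRECATION" and event.operation == "ADD":
        return f"Dataset deprecated: {event.entity_urn}"
    if event.category == "TAG" and event.operation == "ADD":
        return f"Tag added to {event.entity_urn}"
    return None  # ignore everything else

# Example: a deprecation event produces a notification message
print(act(EntityChangeEvent("dataset", "urn:li:dataset:(prod.users)",
                            "DEPRECATION", "ADD")))
```

In a real plugin, the body of `act` is where you would call an external API or trigger a downstream workflow; returning early on non-matching events keeps the handler cheap when filters are broad.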
Step 5: Deploy and Run
Launch the action pipeline as a long-running process. Multiple pipelines can run simultaneously for different event types.
Key considerations:
- Run with datahub actions -c config.yml
- Multiple configs can be specified for concurrent pipelines
- Use debug mode for troubleshooting event processing
- Configure failed_events_dir for dead-letter storage
- Retry count and failure mode (continue vs. stop) are configurable
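The deployment commands above can be sketched as follows; the config file names are placeholders, and repeating -c to launch concurrent pipelines follows the behavior described in this step:

```shell
# Run a single long-lived pipeline
datahub actions -c notifications.yaml

# Run two pipelines concurrently in one process
datahub actions -c notifications.yaml -c quality_checks.yaml
```

Debug mode, the failed_events_dir dead-letter directory, and retry/failure-mode settings are configured per pipeline rather than on the command line; consult your deployment's Actions documentation for the exact keys.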