
Workflow:Datahub Metadata Actions Pipeline

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Metadata_Management, Event_Driven
Last Updated 2026-02-09 12:00 GMT

Overview

End-to-end process for building event-driven automation pipelines that react to real-time metadata changes in DataHub.

Description

This workflow covers the DataHub Actions Framework, which provides a pluggable pipeline for consuming real-time metadata change events from Kafka and executing automated reactions. The pipeline has six configurable components: name, source, filter, transformer, action, and error handling options. Actions can respond to entity changes (tag additions, ownership changes, deprecations) by executing custom logic, calling external APIs, or triggering downstream workflows.

Usage

Execute this workflow when you need to automate responses to metadata changes in DataHub. Examples include sending notifications when datasets are deprecated, triggering data quality checks when schema changes occur, synchronizing metadata to external systems, or enforcing governance policies in real-time.

Execution Steps

Step 1: Install Actions Framework

Install the DataHub CLI and the Actions Framework extension package. The framework extends the base CLI with event consumption and action execution capabilities.

Key considerations:

  • Requires acryl-datahub version 0.8.34 or higher
  • Install acryl-datahub-actions as an additional package
  • Verify installation with datahub actions version
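The install and verification steps above can be run as follows (pinning acryl-datahub to a specific version is optional as long as it satisfies the 0.8.34 minimum):

```shell
# Install the DataHub CLI together with the Actions Framework extension.
python3 -m pip install --upgrade acryl-datahub acryl-datahub-actions

# Confirm the actions subcommand is available and report its version.
datahub actions version
```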

Step 2: Create Action Configuration

Author a YAML configuration file defining the six pipeline components: pipeline name, event source (Kafka), optional filters, optional transformers, action handler, and error handling options.

Key considerations:

  • Source configures the Kafka consumer connection
  • Filters narrow events by type (EntityChangeEvent_v1 or MetadataChangeLog_v1)
  • Transformers modify events before they reach the action
  • The action component defines what happens when events match
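A minimal configuration covering the six components might look like the sketch below, which wires the built-in hello_world action to the entity change event stream. The Kafka connection values are placeholders for your deployment:

```yaml
# 1. name: unique identifier for this pipeline
name: "hello_world_pipeline"

# 2. source: the Kafka consumer connection (values are deployment-specific)
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"

# 3. filter (optional): only pass matching events to the action
filter:
  event_type: "EntityChangeEvent_v1"

# 4. transform (optional): modify events before they reach the action
# transform: []

# 5. action: what to execute when a matching event arrives
action:
  type: "hello_world"

# 6. options: error handling for the pipeline
options:
  retry_count: 3
  failure_mode: "CONTINUE"
```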

Step 3: Configure Event Filtering

Define filters to select specific event types and entity changes. Filters reduce noise by ensuring only relevant events trigger the action.

Key considerations:

  • Filter by event type (entity change vs. raw metadata change log)
  • Filter by entity type, aspect name, or change category
  • Multiple filters can be combined for precise event selection
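As a sketch of combined filtering, the fragment below narrows the pipeline to tag additions on datasets. The nested field names (category, operation, entityType) follow the EntityChangeEvent_v1 payload structure:

```yaml
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    category: "TAG"
    operation: "ADD"
    entityType: "dataset"
```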

Step 4: Implement Action Logic

Configure the action component that executes when matching events are received. Use built-in actions or implement custom action plugins.

Key considerations:

  • Built-in actions include hello_world (event logging) and executor (runs ingestion tasks)
  • Custom actions implement the Action interface
  • Actions receive the full event payload for processing
  • Error handling options control retry behavior and failure modes
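The Action interface shape (a create factory, an act method receiving the event envelope, and a close hook) can be sketched as below. To keep the snippet self-contained and runnable without DataHub installed, the base class and event envelope here are simplified local stand-ins for the real ones in the datahub_actions package, and the TagLoggerAction name and its config keys are hypothetical:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# Stand-in for the framework's event envelope: the real pipeline hands
# each action a wrapped event carrying the type and full payload.
@dataclass
class EventEnvelope:
    event_type: str
    event: dict = field(default_factory=dict)

# Simplified stand-in for the real Action base class.
class Action(ABC):
    @classmethod
    @abstractmethod
    def create(cls, config_dict: dict, ctx: object) -> "Action": ...

    @abstractmethod
    def act(self, event: EventEnvelope) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

# Hypothetical custom action: logs the URN of every tag change it sees.
class TagLoggerAction(Action):
    @classmethod
    def create(cls, config_dict: dict, ctx: object) -> "TagLoggerAction":
        return cls(config_dict.get("prefix", "[tags]"))

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.seen = 0

    def act(self, event: EventEnvelope) -> None:
        # The full event payload is available for processing.
        if event.event.get("category") == "TAG":
            self.seen += 1
            print(f"{self.prefix} {event.event.get('entityUrn')}")

    def close(self) -> None:
        print(f"{self.prefix} processed {self.seen} tag events")
```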

Step 5: Deploy and Run

Launch the action pipeline as a long-running process. Multiple pipelines can run simultaneously for different event types.

Key considerations:

  • Run with datahub actions -c config.yml
  • Multiple configs can be specified for concurrent pipelines
  • Use debug mode for troubleshooting event processing
  • Configure failed_events_dir for dead-letter storage
  • Retry count and failure mode (continue vs. stop) are configurable

Execution Diagram

GitHub URL

Workflow Repository