Principle:Datahub project Datahub Actions Deployment
Metadata
| Field | Value |
|---|---|
| Principle ID | P-DHACT-005 |
| Title | Actions Deployment |
| Category | Event-Driven Automation |
| Status | Active |
| Last Updated | 2026-02-10 |
| Repository | Datahub_project_Datahub |
| Knowledge Sources | GitHub - datahub-project/datahub, DataHub Documentation |
| Domains | Event_Processing, Automation, Metadata_Management |
Overview
Actions deployment is the process of deploying and running event-driven action pipelines as long-running consumers of DataHub metadata events. It launches one or more pipeline configurations as daemon threads consuming from Kafka, with lifecycle management, retry logic, and offset tracking.
Description
Actions deployment involves launching the datahub-actions CLI with one or more YAML configuration files. The deployment process follows these stages:
Pipeline Lifecycle
- Configuration Loading: Each YAML config file is loaded with environment variable expansion via `load_config_file()`. Disabled pipelines (`enabled: false`) are skipped. Invalid configs are logged and skipped if multiple configs are provided, or cause an error if only one config is specified.
- Pipeline Creation: `Pipeline.create(config_dict)` validates the configuration, instantiates the event source (Kafka consumer), creates the filter and transform chain, and creates the action plugin.
- Thread Management: Each pipeline is started in its own daemon thread by the `PipelineManager`. The manager maintains a registry of `PipelineSpec` objects (name, pipeline, thread) for lifecycle management.
- Event Loop: Each pipeline runs a blocking event loop that consumes events from Kafka, applies transforms, invokes the action, and acknowledges processed events back to Kafka (offset commit).
- Shutdown: On SIGINT (Ctrl-C), the signal handler calls `PipelineManager.stop_all()`, which stops each pipeline (closing sources and actions) and joins each thread.
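The configuration-loading rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the datahub-actions implementation: `load_config_text` and `select_configs` are hypothetical names, JSON stands in for YAML so that the sketch needs only the standard library, and `os.path.expandvars` stands in for whatever expansion `load_config_file()` actually performs.

```python
import json
import os
from typing import Any


def load_config_text(text: str) -> dict[str, Any]:
    """Expand $VAR / ${VAR} references, then parse (JSON stands in for YAML)."""
    return json.loads(os.path.expandvars(text))


def select_configs(texts: list[str]) -> list[dict[str, Any]]:
    """Return the configs that should be turned into pipelines."""
    selected = []
    for text in texts:
        try:
            config = load_config_text(text)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            if len(texts) == 1:
                # A single invalid config is treated as a fatal error.
                raise
            # With several configs, log the bad one and keep going.
            print(f"Skipping invalid config: {exc}")
            continue
        if not config.get("enabled", True):
            print(f"Skipping disabled pipeline: {config.get('name')}")
            continue
        selected.append(config)
    return selected
```

The only asymmetry is the single-config case: with one file there is nothing left to run after a parse failure, so the sketch re-raises instead of logging.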
Kafka Topic Consumption
The default Kafka event source subscribes to three topics:
- `MetadataChangeLog_Versioned_v1`: Versioned metadata change log events
- `MetadataChangeLog_Timeseries_v1`: Timeseries metadata change log events
- `PlatformEvent_v1`: Platform-level events (including `EntityChangeEvent`)
Execution Guarantees
- At-least-once delivery: Events are committed to Kafka after processing. If the action fails, the event may be redelivered on restart.
- Configurable retries: The `retry_count` option controls how many times a single event is retried before being sent to the dead letter queue (failed events log file).
- Failure modes: `THROW` stops the pipeline on unrecoverable failure. `CONTINUE` logs the failure and moves to the next event.
- Failed events logging: Failed events are always written to a log file (default: `/tmp/logs/datahub/actions/<pipeline_name>/failed_events.log`), regardless of failure mode.
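How these guarantees interact can be shown with a small, self-contained sketch. It is not the library's code: `process_events` and its parameters are hypothetical, events are plain strings, and two behaviours are assumptions made for illustration, namely that `retry_count` retries follow one initial attempt, and that in `CONTINUE` mode a failed event is still acknowledged once it has been written to the failed events log.

```python
import enum
import json
from pathlib import Path
from typing import Callable, Iterable


class FailureMode(enum.Enum):
    THROW = "THROW"
    CONTINUE = "CONTINUE"


def process_events(
    events: Iterable[str],
    action: Callable[[str], None],
    ack: Callable[[str], None],
    retry_count: int,
    failure_mode: FailureMode,
    failed_log: Path,
) -> None:
    for event in events:
        succeeded = False
        last_error = None
        # One initial attempt plus up to retry_count retries (assumed semantics).
        for _ in range(retry_count + 1):
            try:
                action(event)
                succeeded = True
                break
            except Exception as exc:
                last_error = exc
        if not succeeded:
            # Failed events are recorded regardless of the failure mode.
            failed_log.parent.mkdir(parents=True, exist_ok=True)
            with failed_log.open("a", encoding="utf-8") as handle:
                handle.write(json.dumps(event) + "\n")
            if failure_mode is FailureMode.THROW:
                # Stop without acknowledging, so the event can be redelivered.
                raise RuntimeError(f"Pipeline stopped on event {event!r}") from last_error
        # Acknowledge only after processing: this ordering is what makes
        # delivery at-least-once rather than at-most-once.
        ack(event)
```

Because the acknowledgement comes last, a crash between `action(event)` and `ack(event)` leads to the event being processed again after restart, so actions are safest when they are idempotent.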
Usage
Use this principle when deploying metadata automation in production or development environments. Common deployment patterns include:
- Single pipeline: `datahub-actions -c pipeline.yml` for focused automation
- Multiple pipelines: `datahub-actions -c notify.yml -c propagate.yml` for running several automations in one process
- Container deployment: Run `datahub-actions` as a long-lived container process alongside the DataHub stack
- Monitoring: Enable Prometheus metrics with `--enable-monitoring` for production observability
Theoretical Basis
Consumer group pattern: Each pipeline uses its name as a Kafka consumer group ID. This has two important consequences:
- Independent consumption: Multiple differently-named pipelines independently consume all events from the same Kafka topics. Each pipeline gets its own view of the event stream.
- Shared consumption: Multiple instances of the same-named pipeline share consumption (partitioned). This enables horizontal scaling of a single automation across multiple processes.
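The two consequences can be illustrated with a toy simulation that involves no Kafka client at all. Everything here is hypothetical: `assign_partitions` uses a simple round-robin split, whereas a real broker applies whichever assignment strategy the consumer is configured with, and `deliver` just hands each group member the events of its assigned partitions.

```python
def assign_partitions(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Split partitions round-robin across one group's members (illustrative only)."""
    assignment: dict[str, list[int]] = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


def deliver(
    events_by_partition: dict[int, list[str]],
    groups: dict[str, list[str]],
) -> dict[tuple[str, str], list[str]]:
    """Return the events each (group_id, member) pair would receive."""
    received = {}
    partitions = sorted(events_by_partition)
    for group_id, members in groups.items():
        # Every group is assigned ALL partitions; only members within
        # the same group divide them among themselves.
        for member, parts in assign_partitions(partitions, members).items():
            received[(group_id, member)] = [
                event for part in parts for event in events_by_partition[part]
            ]
    return received


topic = {0: ["e0", "e3"], 1: ["e1"], 2: ["e2"]}
groups = {
    "notify": ["proc-1"],               # one instance of the "notify" pipeline
    "propagate": ["proc-1", "proc-2"],  # two instances sharing one pipeline name
}
received = deliver(topic, groups)
```

In this run the single `notify` member receives all four events, while the two `propagate` members receive disjoint subsets that together cover the topic.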
Thread-per-pipeline model: The PipelineManager runs each pipeline in its own thread, isolating pipeline failures. A failing pipeline does not affect other running pipelines. The main thread sleeps in an infinite loop, serving only as a signal handler anchor for graceful shutdown.
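A minimal sketch of this model follows, using only the standard library. `SketchPipeline`, `SketchPipelineManager`, and `run_until_interrupted` are hypothetical stand-ins rather than the real classes; `PipelineSpec` borrows its name and its three fields from the description above, and exiting with `sys.exit(0)` from the signal handler is an assumption.

```python
import signal
import sys
import threading
import time
from dataclasses import dataclass


class SketchPipeline:
    """Stand-in pipeline: counts loop iterations until stopped."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.processed = 0
        self._stopped = threading.Event()

    def run(self) -> None:
        while not self._stopped.is_set():
            if self.fail:
                raise RuntimeError(f"{self.name} hit an unrecoverable error")
            self.processed += 1
            self._stopped.wait(0.01)  # placeholder for a blocking poll

    def stop(self) -> None:
        self._stopped.set()


@dataclass
class PipelineSpec:
    name: str
    pipeline: SketchPipeline
    thread: threading.Thread


class SketchPipelineManager:
    def __init__(self) -> None:
        self.registry: dict[str, PipelineSpec] = {}

    def start_pipeline(self, name: str, pipeline: SketchPipeline) -> None:
        thread = threading.Thread(
            target=self._run, args=(pipeline,), name=name, daemon=True
        )
        self.registry[name] = PipelineSpec(name, pipeline, thread)
        thread.start()

    @staticmethod
    def _run(pipeline: SketchPipeline) -> None:
        # An exception ends only this thread; other pipelines keep running.
        try:
            pipeline.run()
        except Exception as exc:
            print(f"Pipeline {pipeline.name} failed: {exc}")

    def stop_all(self) -> None:
        for spec in list(self.registry.values()):
            spec.pipeline.stop()
            spec.thread.join()
        self.registry.clear()


def run_until_interrupted(manager: SketchPipelineManager) -> None:
    """Main-thread anchor: sleep until SIGINT, then shut down gracefully."""

    def handle_sigint(signum, frame):
        manager.stop_all()
        sys.exit(0)

    signal.signal(signal.SIGINT, handle_sigint)
    while True:
        time.sleep(5)
```

Daemon threads never keep the process alive on their own, which is why the main thread needs the sleep loop; calling `stop_all()` from the handler gives each pipeline a chance to close its source and action before the process exits.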
Related
- Implemented by: Datahub_project_Datahub_Actions_CLI_Run