Implementation:Apache Dolphinscheduler FailoverCoordinator GlobalFailover
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Fault_Tolerance |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for reconciling cluster state on master startup: FailoverCoordinator.globalMasterFailover recovers orphaned workflows, cleanHistoryFailoverFinishedMarks removes completed failover markers, and IAlertOperator sends failover notifications.
Description
FailoverCoordinator.globalMasterFailover(GlobalMasterFailoverEvent) scans all workflow instances that have a non-finished status and a stale master host assignment, and initiates failover for each such orphaned workflow. cleanHistoryFailoverFinishedMarks() removes completed failover markers from the registry so that they do not accumulate. IAlertOperator.sendAlert(AlertSendRequest) notifies administrators about failover events through the configured alert channels, such as email or DingTalk.
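The scan described above can be illustrated with a minimal, self-contained sketch. It is not the FailoverCoordinator implementation: the WorkflowInstance class, the findOrphanedWorkflowIds helper, and the rule that instances without a host are skipped are all assumptions made for illustration. It only shows the selection rule, "not finished and owned by a master that is no longer active".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GlobalFailoverSketch {

    // Hypothetical, simplified stand-in for a workflow instance record.
    static final class WorkflowInstance {
        final int id;
        final String host;      // address of the master that last owned the instance
        final boolean finished; // true once the workflow reached a final state

        WorkflowInstance(int id, String host, boolean finished) {
            this.id = id;
            this.host = host;
            this.finished = finished;
        }
    }

    // Returns the ids of unfinished instances whose owning master is not in the active set.
    // Assumption: an instance with no host was never claimed by a master and is skipped.
    static List<Integer> findOrphanedWorkflowIds(List<WorkflowInstance> instances,
                                                 Set<String> activeMasters) {
        List<Integer> orphaned = new ArrayList<>();
        for (WorkflowInstance instance : instances) {
            boolean staleHost = instance.host != null && !activeMasters.contains(instance.host);
            if (!instance.finished && staleHost) {
                orphaned.add(instance.id);
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        List<WorkflowInstance> instances = Arrays.asList(
                new WorkflowInstance(1, "master-1:5678", false), // owner alive: left alone
                new WorkflowInstance(2, "master-2:5678", false), // owner gone, unfinished: orphaned
                new WorkflowInstance(3, "master-2:5678", true),  // owner gone but finished: ignored
                new WorkflowInstance(4, null, false));           // never claimed: ignored
        Set<String> activeMasters = new HashSet<>(Arrays.asList("master-1:5678"));

        System.out.println(findOrphanedWorkflowIds(instances, activeMasters)); // prints [2]
    }
}
```

In the real coordinator each selected instance would then have failover initiated for it; the sketch stops at selection because the recovery step depends on internals not shown on this page.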
Usage
These methods are called during master server startup and, optionally, as a periodic reconciliation check.
Code Reference
Source Location
- Repository: dolphinscheduler
- File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java, L75-107 (global failover) and L216-241 (clean marks)
- File: dolphinscheduler-extract/dolphinscheduler-extract-alert/src/main/java/org/apache/dolphinscheduler/extract/alert/IAlertOperator.java (L26-32)
Signature
@Component
public class FailoverCoordinator {
    public void globalMasterFailover(GlobalMasterFailoverEvent event);
    public void cleanHistoryFailoverFinishedMarks();
}

@RpcService
public interface IAlertOperator {
    @RpcMethod
    AlertSendResponse sendAlert(AlertSendRequest alertSendRequest);

    @RpcMethod
    AlertSendResponse sendTestAlert(AlertSendRequest alertSendRequest);
}
Import
import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;
import org.apache.dolphinscheduler.extract.alert.IAlertOperator;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| GlobalMasterFailoverEvent | Event | Yes | Startup event triggering global scan |
| AlertSendRequest | DTO | For alerts | Alert content and target group |
Outputs
| Name | Type | Description |
|---|---|---|
| Recovered workflows | Commands | RECOVER_TOLERANCE_FAULT_PROCESS commands for orphaned workflows |
| Cleaned markers | Registry | Stale failover markers removed |
| Alert notifications | RPC calls | Failover summary sent to alert channels |
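The "Cleaned markers" output can be pictured with a small in-memory sketch. The registry is modeled here as a plain map from a marker key to a "finished" flag; that model, the class name, and the cleanFinishedMarks helper are assumptions for illustration and do not reflect how the DolphinScheduler registry stores failover marks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailoverMarkCleanupSketch {

    // Removes every marker flagged as finished and returns the removed keys.
    // Assumption: the registry is modeled as marker key -> finished flag.
    static List<String> cleanFinishedMarks(Map<String, Boolean> marks) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Boolean>> it = marks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Boolean> entry = it.next();
            if (Boolean.TRUE.equals(entry.getValue())) {
                removed.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Boolean> marks = new LinkedHashMap<>();
        marks.put("master-2:5678", true);  // failover finished: eligible for cleanup
        marks.put("master-3:5678", false); // failover still in progress: kept

        System.out.println(cleanFinishedMarks(marks)); // prints [master-2:5678]
        System.out.println(marks.keySet());            // prints [master-3:5678]
    }
}
```

Markers for failovers that are still in progress stay in place, so a cleanup pass of this shape should be safe to run repeatedly.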
Usage Examples
Global Failover on Master Startup
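// In master server bootstrap:
// 1. Scan for orphaned workflows and initiate failover for each one.
failoverCoordinator.globalMasterFailover(
        GlobalMasterFailoverEvent.builder().build()
);
// 2. Remove completed failover markers from the registry.
failoverCoordinator.cleanHistoryFailoverFinishedMarks();
// Failover has now been initiated for every orphaned workflow,
// and stale registry markers have been cleaned.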
Sending Failover Alert
// Obtain an RPC client proxy for the alert server.
IAlertOperator alertOperator = Clients
        .withService(IAlertOperator.class)
        .withHost(alertServerAddress);

// Send a failover summary to the target alert group.
alertOperator.sendAlert(
        AlertSendRequest.builder()
                .title("Failover Alert")
                .content("Master master-2:5678 failed. 5 workflows recovered.")
                .alertGroupId(1)
                .build()
);
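The alert content in the example above is a hard-coded string. As a rough sketch, a caller might compose it from the failed master address and the number of recovered workflows. The buildContent helper below is hypothetical and not part of DolphinScheduler; it only reproduces the message format used in the example.

```java
public class FailoverAlertSummarySketch {

    // Hypothetical helper: formats a one-line failover summary for the alert content field.
    static String buildContent(String failedMaster, int recoveredWorkflows) {
        String noun = recoveredWorkflows == 1 ? "workflow" : "workflows";
        return "Master " + failedMaster + " failed. "
                + recoveredWorkflows + " " + noun + " recovered.";
    }

    public static void main(String[] args) {
        // prints: Master master-2:5678 failed. 5 workflows recovered.
        System.out.println(buildContent("master-2:5678", 5));
    }
}
```

The returned string could then be passed as the content of the AlertSendRequest shown above.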