Implementation:Apache Dolphinscheduler FailoverCoordinator Failover
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Fault_Tolerance |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for initiating failover recovery using FailoverCoordinator which orchestrates workflow and task failover upon master or worker node failure.
Description
FailoverCoordinator is a Spring @Component that handles both master and worker failover. failoverMaster(MasterFailoverEvent) finds all active workflow instances assigned to the failed master, calls WorkflowFailover.failoverWorkflow() for each which inserts a recovery command. failoverWorker(WorkerFailoverEvent) finds tasks on the failed worker and delegates to TaskFailover. WorkflowFailover creates a Command entity with WorkflowFailoverCommandParam carrying the last known WorkflowExecutionStatus.
Usage
Triggered automatically by cluster change listener notifications. Registered as a listener during master initialization.
Code Reference
Source Location
- Repository: dolphinscheduler
- File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java (L54-261)
- File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/WorkflowFailover.java (L37-46)
- File: dolphinscheduler-extract/dolphinscheduler-extract-master/src/main/java/org/apache/dolphinscheduler/extract/master/command/WorkflowFailoverCommandParam.java (L32-40)
Signature
@Component
public class FailoverCoordinator {
public void globalMasterFailover(GlobalMasterFailoverEvent event);
public void failoverMaster(MasterFailoverEvent event);
public void failoverWorker(WorkerFailoverEvent event);
public void cleanHistoryFailoverFinishedMarks();
private void doMasterFailover(String masterAddress, long failoverDeadline, String path);
private void doWorkerFailover(String workerAddress, long failoverDeadline, String path);
}
@Component
public class WorkflowFailover {
public void failoverWorkflow(WorkflowInstance workflowInstance);
}
@Data
public class WorkflowFailoverCommandParam implements ICommandParam {
private WorkflowExecutionStatus workflowExecutionStatus;
@Override
public CommandType getCommandType();
}
Import
import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;
import org.apache.dolphinscheduler.server.master.failover.WorkflowFailover;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| MasterFailoverEvent | Event | Yes (for master failover) | Contains failed master address and crash time |
| WorkerFailoverEvent | Event | Yes (for worker failover) | Contains failed worker address and crash time |
Outputs
| Name | Type | Description |
|---|---|---|
| Command records | DB records | RECOVER_TOLERANCE_FAULT_PROCESS commands in t_ds_command |
| Failover markers | Registry entries | Prevent duplicate failover processing |
Usage Examples
Master Failover
// Triggered by MasterClusters.onServerRemove() listener
failoverCoordinator.failoverMaster(
MasterFailoverEvent.builder()
.masterAddress("master-2:5678")
.crashTime(System.currentTimeMillis())
.build()
);
// Creates recovery commands for all workflows on crashed master-2
Worker Failover
// Triggered by WorkerClusters.onServerRemove() listener
failoverCoordinator.failoverWorker(
WorkerFailoverEvent.builder()
.workerAddress("worker-3:1234")
.crashTime(System.currentTimeMillis())
.build()
);
// Re-dispatches tasks from crashed worker-3 to healthy workers