Implementation:Apache Dolphinscheduler FailoverCoordinator WorkerFailover
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Fault_Tolerance |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for reassigning tasks from a failed worker to healthy workers using FailoverCoordinator.doWorkerFailover and TaskFailover with WorkerGroupDispatcherCoordinator re-dispatch.
Description
FailoverCoordinator.doWorkerFailover() retrieves the list of ITaskExecutionRunnable tasks assigned to the crashed worker via getFailoverTaskForWorker(), then delegates each to TaskFailover. Failed tasks are re-queued to WorkerGroupDispatcherCoordinator.dispatchTask() which selects a healthy worker from the same worker group. The failover process uses registry-based markers to prevent duplicate processing.
Usage
Called internally by failoverWorker(WorkerFailoverEvent) in the FailoverCoordinator. Not invoked directly.
Code Reference
Source Location
- Repository: dolphinscheduler
- File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java (L241-261)
- File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/task/dispatcher/WorkerGroupDispatcherCoordinator.java (L59-71)
Signature
// In FailoverCoordinator
private void doWorkerFailover(String workerAddress,
long taskFailoverDeadline,
String workerFailoverNodePath);
// Re-dispatch through coordinator
workerGroupDispatcherCoordinator.dispatchTask(
taskExecutionRunnable,
0 // immediate dispatch
);
Import
import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| workerAddress | String | Yes | Address of the failed worker node |
| taskFailoverDeadline | long | Yes | Timestamp cutoff for identifying affected tasks |
Outputs
| Name | Type | Description |
|---|---|---|
| Re-dispatched tasks | RPC calls | Tasks sent to healthy workers via dispatcher |
| Registry cleanup | Registry entries | Failover markers created/cleaned |
Usage Examples
Worker Failover Processing
// Inside FailoverCoordinator.failoverWorker()
doWorkerFailover(
"worker-3:1234", // failed worker address
System.currentTimeMillis(), // deadline
"/failover/worker/worker-3:1234" // registry marker path
);
// All tasks on worker-3 are re-dispatched to other healthy workers
// in their respective worker groups