
Implementation:Apache Dolphinscheduler FailoverCoordinator WorkerFailover

From Leeroopedia


Knowledge Sources
Domains Distributed_Systems, Fault_Tolerance
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for reassigning tasks from a failed worker to healthy workers. It uses FailoverCoordinator.doWorkerFailover and TaskFailover, with re-dispatch through WorkerGroupDispatcherCoordinator.

Description

FailoverCoordinator.doWorkerFailover() retrieves the ITaskExecutionRunnable tasks assigned to the crashed worker via getFailoverTaskForWorker(), then delegates each one to TaskFailover. Each affected task is re-queued through WorkerGroupDispatcherCoordinator.dispatchTask(), which selects a healthy worker from the same worker group. Registry-based markers prevent the same worker failover from being processed more than once.
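The flow described above can be sketched as follows. This is a minimal illustration, not the DolphinScheduler implementation: the Task class, tasksOnWorker, pickHealthyWorker, and failover are hypothetical stand-ins for ITaskExecutionRunnable, getFailoverTaskForWorker(), and the dispatcher's worker selection, and picking the first remaining worker in the group is an assumed selection policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins: the real ITaskExecutionRunnable, TaskFailover and
// WorkerGroupDispatcherCoordinator types carry far more state than this.
class WorkerFailoverSketch {

    /** Minimal task view: name, worker group, and the worker currently holding it. */
    static final class Task {
        final String name;
        final String workerGroup;
        final String workerAddress;

        Task(String name, String workerGroup, String workerAddress) {
            this.name = name;
            this.workerGroup = workerGroup;
            this.workerAddress = workerAddress;
        }
    }

    /** Collects the tasks assigned to the failed worker. */
    static List<Task> tasksOnWorker(List<Task> running, String failedWorker) {
        List<Task> affected = new ArrayList<>();
        for (Task task : running) {
            if (task.workerAddress.equals(failedWorker)) {
                affected.add(task);
            }
        }
        return affected;
    }

    /** Assumed policy: first worker in the same group that is not the failed one. */
    static Optional<String> pickHealthyWorker(Map<String, List<String>> workersByGroup,
                                              String group,
                                              String failedWorker) {
        return workersByGroup.getOrDefault(group, List.of()).stream()
                .filter(worker -> !worker.equals(failedWorker))
                .findFirst();
    }

    /** Reassigns each affected task; tasks with no healthy worker are skipped in this sketch. */
    static List<Task> failover(List<Task> running,
                               Map<String, List<String>> workersByGroup,
                               String failedWorker) {
        List<Task> redispatched = new ArrayList<>();
        for (Task task : tasksOnWorker(running, failedWorker)) {
            pickHealthyWorker(workersByGroup, task.workerGroup, failedWorker)
                    .ifPresent(worker -> redispatched.add(new Task(task.name, task.workerGroup, worker)));
        }
        return redispatched;
    }

    public static void main(String[] args) {
        List<Task> running = List.of(
                new Task("etl", "default", "worker-3:1234"),
                new Task("report", "default", "worker-1:1234"));
        Map<String, List<String>> workersByGroup =
                Map.of("default", List.of("worker-1:1234", "worker-3:1234"));
        for (Task task : failover(running, workersByGroup, "worker-3:1234")) {
            System.out.println(task.name + " -> " + task.workerAddress);
        }
    }
}
```

Only tasks whose current worker matches the failed address are touched, and each stays within its own worker group, which mirrors the same-group constraint stated in the description.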

Usage

Called internally by failoverWorker(WorkerFailoverEvent) in FailoverCoordinator. The method is private and is not invoked directly.

Code Reference

Source Location

  • Repository: dolphinscheduler
  • File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java (L241-261)
  • File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/task/dispatcher/WorkerGroupDispatcherCoordinator.java (L59-71)

Signature

// In FailoverCoordinator
private void doWorkerFailover(String workerAddress,
                              long taskFailoverDeadline,
                              String workerFailoverNodePath);

// Re-dispatch through coordinator
workerGroupDispatcherCoordinator.dispatchTask(
    taskExecutionRunnable,
    0  // immediate dispatch
);

Import

import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;

I/O Contract

Inputs

Name Type Required Description
workerAddress String Yes Address of the failed worker node
taskFailoverDeadline long Yes Timestamp cutoff for identifying affected tasks
workerFailoverNodePath String Yes Registry path of the failover marker for this worker

Outputs

Name Type Description
Re-dispatched tasks RPC calls Tasks sent to healthy workers via dispatcher
Registry cleanup Registry entries Failover markers created/cleaned
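
The role of the failover marker can be illustrated with a small sketch. It assumes the marker is claimed before the failover work runs and removed by a separate cleanup step; the actual ordering in FailoverCoordinator is not stated here. FailoverMarkerSketch, runOnce, and cleanMarker are hypothetical names, and an in-memory set stands in for the registry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the registry; not the DolphinScheduler registry client.
class FailoverMarkerSketch {

    // Marker paths that have already been claimed.
    private final Set<String> markers = ConcurrentHashMap.newKeySet();

    /**
     * Runs the failover work only if the marker path has not been claimed yet.
     * Set.add is atomic on this set, so concurrent callers cannot both win.
     * The marker stays claimed even if the work throws (a simplification).
     */
    boolean runOnce(String markerPath, Runnable failoverWork) {
        if (!markers.add(markerPath)) {
            return false; // another caller already handled this worker
        }
        failoverWork.run();
        return true;
    }

    /** Removes a marker; returns true if it was present. */
    boolean cleanMarker(String markerPath) {
        return markers.remove(markerPath);
    }

    public static void main(String[] args) {
        FailoverMarkerSketch registry = new FailoverMarkerSketch();
        String path = "/failover/worker/worker-3:1234";
        System.out.println(registry.runOnce(path, () -> {}));
        System.out.println(registry.runOnce(path, () -> {}));
    }
}
```

The second call with the same path is rejected, which is the duplicate-processing guard the marker is meant to provide.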

Usage Examples

Worker Failover Processing

// Inside FailoverCoordinator.failoverWorker()
doWorkerFailover(
    "worker-3:1234",                    // failed worker address
    System.currentTimeMillis(),          // deadline
    "/failover/worker/worker-3:1234"    // registry marker path
);
// All tasks on worker-3 are re-dispatched to other healthy workers
// in their respective worker groups

Related Pages

Implements Principle
