Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Apache Dolphinscheduler FailoverCoordinator Failover

From Leeroopedia


Knowledge Sources
Domains Distributed_Systems, Fault_Tolerance
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for initiating failover recovery using FailoverCoordinator which orchestrates workflow and task failover upon master or worker node failure.

Description

FailoverCoordinator is a Spring @Component that handles both master and worker failover. failoverMaster(MasterFailoverEvent) finds all active workflow instances assigned to the failed master, calls WorkflowFailover.failoverWorkflow() for each which inserts a recovery command. failoverWorker(WorkerFailoverEvent) finds tasks on the failed worker and delegates to TaskFailover. WorkflowFailover creates a Command entity with WorkflowFailoverCommandParam carrying the last known WorkflowExecutionStatus.

Usage

Triggered automatically by cluster change listener notifications. Registered as a listener during master initialization.

Code Reference

Source Location

  • Repository: dolphinscheduler
  • File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java (L54-261)
  • File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/WorkflowFailover.java (L37-46)
  • File: dolphinscheduler-extract/dolphinscheduler-extract-master/src/main/java/org/apache/dolphinscheduler/extract/master/command/WorkflowFailoverCommandParam.java (L32-40)

Signature

@Component
public class FailoverCoordinator {
    public void globalMasterFailover(GlobalMasterFailoverEvent event);
    public void failoverMaster(MasterFailoverEvent event);
    public void failoverWorker(WorkerFailoverEvent event);
    public void cleanHistoryFailoverFinishedMarks();
    private void doMasterFailover(String masterAddress, long failoverDeadline, String path);
    private void doWorkerFailover(String workerAddress, long failoverDeadline, String path);
}

@Component
public class WorkflowFailover {
    public void failoverWorkflow(WorkflowInstance workflowInstance);
}

@Data
public class WorkflowFailoverCommandParam implements ICommandParam {
    private WorkflowExecutionStatus workflowExecutionStatus;
    @Override
    public CommandType getCommandType();
}

Import

import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;
import org.apache.dolphinscheduler.server.master.failover.WorkflowFailover;

I/O Contract

Inputs

Name Type Required Description
MasterFailoverEvent Event Yes (for master failover) Contains failed master address and crash time
WorkerFailoverEvent Event Yes (for worker failover) Contains failed worker address and crash time

Outputs

Name Type Description
Command records DB records RECOVER_TOLERANCE_FAULT_PROCESS commands in t_ds_command
Failover markers Registry entries Prevent duplicate failover processing

Usage Examples

Master Failover

// Triggered by MasterClusters.onServerRemove() listener
failoverCoordinator.failoverMaster(
    MasterFailoverEvent.builder()
        .masterAddress("master-2:5678")
        .crashTime(System.currentTimeMillis())
        .build()
);
// Creates recovery commands for all workflows on crashed master-2

Worker Failover

// Triggered by WorkerClusters.onServerRemove() listener
failoverCoordinator.failoverWorker(
    WorkerFailoverEvent.builder()
        .workerAddress("worker-3:1234")
        .crashTime(System.currentTimeMillis())
        .build()
);
// Re-dispatches tasks from crashed worker-3 to healthy workers

Related Pages

Implements Principle

Requires Environment

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment