
Implementation:Apache Dolphinscheduler FailoverCoordinator GlobalFailover

From Leeroopedia


Knowledge Sources
Domains: Distributed_Systems, Fault_Tolerance
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool for reconciling cluster state on master startup using FailoverCoordinator.globalMasterFailover, cleanHistoryFailoverFinishedMarks, and IAlertOperator notifications.

Description

FailoverCoordinator.globalMasterFailover(GlobalMasterFailoverEvent) scans all workflow instances with non-finished status and stale master host assignments, initiating failover for each orphaned workflow. cleanHistoryFailoverFinishedMarks() removes completed failover markers from the registry to prevent accumulation. IAlertOperator.sendAlert(AlertSendRequest) sends notifications to administrators about failover events via configured alert channels (email, DingTalk, etc.).
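The selection rule described above (non-finished status plus a stale master host) can be sketched as a simple filter. This is a minimal illustration, not the DolphinScheduler code: `OrphanScanSketch`, `Status`, `WorkflowInstance`, and `findOrphanIds` are hypothetical stand-ins, and "stale" is assumed here to mean that the assigned host is missing from the set of live masters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: simplified stand-ins, not the DolphinScheduler classes. */
class OrphanScanSketch {

    /** Hypothetical status enum; only the finished/unfinished split matters here. */
    enum Status {
        RUNNING, SUCCESS, FAILURE;

        boolean isFinished() {
            return this == SUCCESS || this == FAILURE;
        }
    }

    /** Hypothetical, minimal view of a workflow instance. */
    static final class WorkflowInstance {
        final int id;
        final Status status;
        final String masterHost;

        WorkflowInstance(int id, Status status, String masterHost) {
            this.id = id;
            this.status = status;
            this.masterHost = masterHost;
        }
    }

    /**
     * Returns the ids of instances that are unfinished and whose assigned
     * master is not in the live set (the assumed meaning of "stale").
     */
    static List<Integer> findOrphanIds(List<WorkflowInstance> instances, Set<String> liveMasters) {
        List<Integer> orphans = new ArrayList<>();
        for (WorkflowInstance w : instances) {
            boolean unfinished = !w.status.isFinished();
            boolean staleHost = w.masterHost == null || !liveMasters.contains(w.masterHost);
            if (unfinished && staleHost) {
                orphans.add(w.id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<WorkflowInstance> instances = Arrays.asList(
                new WorkflowInstance(1, Status.RUNNING, "master-1:5678"),
                new WorkflowInstance(2, Status.RUNNING, "master-2:5678"),
                new WorkflowInstance(3, Status.SUCCESS, "master-2:5678"));
        Set<String> live = new HashSet<>(Arrays.asList("master-1:5678"));
        System.out.println(findOrphanIds(instances, live)); // prints [2]
    }
}
```

Only instance 2 qualifies: instance 1 still has a live owner, and instance 3 has already finished, so neither needs failover.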

Usage

Called during master server startup and optionally as a periodic reconciliation check.
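One way to combine the startup call with an optional periodic check is to run a single pass synchronously and then schedule repeats. The sketch below assumes a hypothetical wrapper class, `ReconciliationSchedulerSketch`, built only on the JDK's `ScheduledExecutorService`; the `Runnable` stands in for whatever invokes `globalMasterFailover` and `cleanHistoryFailoverFinishedMarks`, and the source does not say that DolphinScheduler wires it this way.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a hypothetical wrapper, not part of DolphinScheduler. */
class ReconciliationSchedulerSketch {

    private final Runnable failoverPass;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "failover-reconcile");
                t.setDaemon(true); // do not keep the JVM alive on shutdown
                return t;
            });

    ReconciliationSchedulerSketch(Runnable failoverPass) {
        this.failoverPass = failoverPass;
    }

    /** Runs one pass immediately (startup), then repeats after each period. */
    void start(long period, TimeUnit unit) {
        failoverPass.run();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                failoverPass.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("reconciliation pass failed: " + e);
            }
        }, period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        AtomicInteger passes = new AtomicInteger();
        // In a real master, the pass would call the failover coordinator instead.
        ReconciliationSchedulerSketch s = new ReconciliationSchedulerSketch(passes::incrementAndGet);
        s.start(1, TimeUnit.HOURS);
        System.out.println(passes.get()); // prints 1
        s.stop();
    }
}
```

The fixed-delay variant is used so that a slow pass never overlaps with the next one; whether a periodic pass is appropriate at all depends on the deployment.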

Code Reference

Source Location

  • Repository: dolphinscheduler
  • File: dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/failover/FailoverCoordinator.java (L75-107) (global failover), (L216-241) (clean marks)
  • File: dolphinscheduler-extract/dolphinscheduler-extract-alert/src/main/java/org/apache/dolphinscheduler/extract/alert/IAlertOperator.java (L26-32)

Signature

// Signatures only; method bodies are omitted.
@Component
public class FailoverCoordinator {
    public void globalMasterFailover(GlobalMasterFailoverEvent event);
    public void cleanHistoryFailoverFinishedMarks();
}

@RpcService
public interface IAlertOperator {
    @RpcMethod
    AlertSendResponse sendAlert(AlertSendRequest alertSendRequest);

    @RpcMethod
    AlertSendResponse sendTestAlert(AlertSendRequest alertSendRequest);
}

Import

import org.apache.dolphinscheduler.server.master.failover.FailoverCoordinator;
import org.apache.dolphinscheduler.extract.alert.IAlertOperator;

I/O Contract

Inputs

Name | Type | Required | Description
GlobalMasterFailoverEvent | Event | Yes | Startup event triggering global scan
AlertSendRequest | DTO | For alerts | Alert content and target group

Outputs

Name | Type | Description
Recovered workflows | Commands | RECOVER_TOLERANCE_FAULT_PROCESS commands for orphaned workflows
Cleaned markers | Registry | Stale failover markers removed
Alert notifications | RPC calls | Failover summary sent to alert channels
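The "Cleaned markers" output can be pictured as pruning a map of finished-failover marks. The following is a rough in-memory sketch under stated assumptions: `FailoverMarkCleanupSketch`, its marker paths, and the age-based retention rule are all hypothetical. The source only says that completed markers are removed from the registry; it does not specify the path layout or any retention window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: an in-memory stand-in for registry-backed failover marks. */
class FailoverMarkCleanupSketch {

    // Hypothetical layout: marker path -> time the failover was marked finished.
    private final Map<String, Instant> finishedMarks = new ConcurrentHashMap<>();

    void markFinished(String path, Instant finishedAt) {
        finishedMarks.put(path, finishedAt);
    }

    /**
     * Removes marks finished before (now - retention) and returns how many were removed.
     * The retention window is an assumption made for this sketch.
     */
    int cleanOlderThan(Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        int before = finishedMarks.size();
        finishedMarks.entrySet().removeIf(e -> e.getValue().isBefore(cutoff));
        return before - finishedMarks.size();
    }

    int size() {
        return finishedMarks.size();
    }

    public static void main(String[] args) {
        FailoverMarkCleanupSketch registry = new FailoverMarkCleanupSketch();
        Instant now = Instant.parse("2026-01-10T00:00:00Z");
        registry.markFinished("/failover-finished/wf-1", now.minus(Duration.ofDays(10)));
        registry.markFinished("/failover-finished/wf-2", now.minus(Duration.ofHours(1)));
        System.out.println(registry.cleanOlderThan(Duration.ofDays(7), now)); // prints 1
        System.out.println(registry.size()); // prints 1
    }
}
```

Passing `now` in explicitly keeps the cleanup deterministic and easy to test; a real registry client would also need to handle concurrent writers and connection failures.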

Usage Examples

Global Failover on Master Startup

// In master server bootstrap:
failoverCoordinator.globalMasterFailover(
    GlobalMasterFailoverEvent.builder().build()
);
// Failover has now been initiated for every orphaned workflow found by the scan.
failoverCoordinator.cleanHistoryFailoverFinishedMarks();
// Completed failover markers have been removed from the registry.

Sending Failover Alert

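// Obtain an RPC client proxy for the alert server.
// alertServerAddress is assumed to have been resolved elsewhere.
IAlertOperator alertOperator = Clients
    .withService(IAlertOperator.class)
    .withHost(alertServerAddress);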

alertOperator.sendAlert(
    AlertSendRequest.builder()
        .title("Failover Alert")
        .content("Master master-2:5678 failed. 5 workflows recovered.")
        .alertGroupId(1)
        .build()
);

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
