Principle: Apache DolphinScheduler Failover Process Initiation

Knowledge Sources	Apache DolphinScheduler Failover Patterns
Domains	Distributed_Systems, Fault_Tolerance
Last Updated	2026-02-10 00:00 GMT

Overview

A coordinated failover initiation process that, when a node fails, identifies the affected workflows and tasks and hands them off for recovery on healthy nodes: recovery commands for workflows, re-dispatch for tasks.

Description

The Failover Process Initiation principle defines how DolphinScheduler starts recovery after a node failure. FailoverCoordinator handles both master and worker failure events. For a master failure, it finds every workflow instance that was running on the crashed master and calls WorkflowFailover.failoverWorkflow() for each one, which inserts a Command of type CommandType.RECOVER_TOLERANCE_FAULT_PROCESS into the command table. For a worker failure, it identifies the tasks that were running on the crashed worker and delegates their recovery to TaskFailover. A failover marker kept in the registry prevents the same failure from being processed more than once.
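
The routing and duplicate-suppression role of FailoverCoordinator can be sketched as follows. This is a minimal illustration under stated assumptions, not DolphinScheduler's code: InMemoryRegistry, create_if_absent(), on_node_failure() and the /failover/<type>/<address> marker path are all hypothetical names. The only property relied on is that creating the marker is atomic, so exactly one caller wins and runs the failover.

```python
import threading
from typing import Callable


class InMemoryRegistry:
    """Hypothetical stand-in for the cluster registry.

    create_if_absent() is atomic: only the first caller for a path gets True.
    """

    def __init__(self) -> None:
        self._paths: set[str] = set()
        self._lock = threading.Lock()

    def create_if_absent(self, path: str) -> bool:
        with self._lock:
            if path in self._paths:
                return False
            self._paths.add(path)
            return True


class FailoverCoordinator:
    """Hypothetical sketch: routes node-failure events and skips duplicates."""

    def __init__(self, registry: InMemoryRegistry,
                 failover_master: Callable[[str], None],
                 failover_worker: Callable[[str], None]) -> None:
        self._registry = registry
        self._handlers = {"master": failover_master, "worker": failover_worker}

    def on_node_failure(self, node_type: str, address: str) -> bool:
        # The marker path layout is an assumption, not DolphinScheduler's real key.
        marker = f"/failover/{node_type}/{address}"
        if not self._registry.create_if_absent(marker):
            return False  # another coordinator already handled this failure
        self._handlers[node_type](address)
        return True
```

Because this sketch keys the marker only on the address, a later, separate failure of the same address would also be skipped; a production key would need to distinguish those cases.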

Usage

Failover is automatically triggered by cluster change listeners when a node failure is detected. No manual intervention is required for standard failover scenarios.
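The detection step can be pictured as a listener that compares successive views of cluster membership and reports nodes that disappeared. This is a simplified, hypothetical sketch: ClusterChangeListener, on_snapshot() and the callback signature are illustrative names. A registry such as ZooKeeper can report node removals directly instead of requiring a diff of snapshots.

```python
from typing import Callable, Iterable


class ClusterChangeListener:
    """Hypothetical listener that diffs successive membership snapshots.

    Nodes present in the previous snapshot but missing from the new one are
    treated as failed and reported to the callback.
    """

    def __init__(self, on_node_removed: Callable[[str, str], None]) -> None:
        self._on_node_removed = on_node_removed
        self._alive: dict[str, set[str]] = {"master": set(), "worker": set()}

    def on_snapshot(self, node_type: str, alive: Iterable[str]) -> list[str]:
        current = set(alive)
        # Sorted only to make the reporting order deterministic.
        removed = sorted(self._alive[node_type] - current)
        self._alive[node_type] = current
        for address in removed:
            self._on_node_removed(node_type, address)
        return removed
```

In this sketch the callback would be wired to the coordinator's failure handler, so no operator action is needed once a node drops out of the membership view.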

Theoretical Basis

The failover follows a Command-based Recovery Pattern:

Detection: a cluster change listener notifies FailoverCoordinator of the node failure
Identification: query the database for workflows/tasks that were running on the failed node
Command Creation: for master failures, insert recovery commands carrying a WorkflowFailoverCommandParam
Execution: CommandEngine picks up the recovery commands and re-processes the workflows
Idempotency: registry markers prevent the same failure from being failed over twice
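The value of the command-based pattern is that the failover path only writes rows, while a separate engine consumes them. The sketch below models that decoupling under stated assumptions: CommandTable, take_batch(), poll_once() and the generic param dict are hypothetical, and the real WorkflowFailoverCommandParam fields are not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field

RECOVER_TOLERANCE_FAULT_PROCESS = "RECOVER_TOLERANCE_FAULT_PROCESS"


@dataclass
class Command:
    command_type: str
    workflow_instance_id: int
    # Placeholder for WorkflowFailoverCommandParam; its real fields are not modeled.
    param: dict = field(default_factory=dict)


class CommandTable:
    """Hypothetical in-memory stand-in for the database command table."""

    def __init__(self) -> None:
        self._rows = deque()  # FIFO of Command rows

    def insert(self, command: Command) -> None:
        self._rows.append(command)

    def take_batch(self, limit: int) -> list[Command]:
        batch = []
        while self._rows and len(batch) < limit:
            batch.append(self._rows.popleft())
        return batch


class CommandEngine:
    """Hypothetical consumer that turns recovery commands back into running workflows."""

    def __init__(self, table: CommandTable) -> None:
        self._table = table
        self.recovered: list[int] = []

    def poll_once(self, limit: int = 10) -> int:
        batch = self._table.take_batch(limit)
        for command in batch:
            if command.command_type == RECOVER_TOLERANCE_FAULT_PROCESS:
                # A real engine would rebuild the workflow's state and resume it here.
                self.recovered.append(command.workflow_instance_id)
        return len(batch)
```

Because the failover path never executes workflows itself, a failover handler that crashes after inserting commands leaves nothing half-run; the engine simply picks the rows up on its next poll.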

failoverMaster(event):
    workflows = findWorkflowsOnMaster(event.masterAddress)
    for workflow in workflows:
        WorkflowFailover.failoverWorkflow(workflow)
        // Creates RECOVER_TOLERANCE_FAULT_PROCESS command

failoverWorker(event):
    tasks = findTasksOnWorker(event.workerAddress)
    for task in tasks:
        TaskFailover.failoverTask(task)
        // Re-dispatches to healthy worker
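A runnable rendering of the pseudocode above follows, with the database lookups replaced by in-memory filters. It is a sketch under assumptions: WorkflowInstance, TaskInstance, the RUNNING state string and the round-robin choice of healthy worker are illustrative, and the tuple appended to command_table only stands in for WorkflowFailover.failoverWorkflow().

```python
from dataclasses import dataclass

RUNNING = "RUNNING"


@dataclass
class WorkflowInstance:
    instance_id: int
    host: str   # master that owns the workflow
    state: str


@dataclass
class TaskInstance:
    instance_id: int
    host: str   # worker executing the task
    state: str


def failover_master(master_address, workflows, command_table):
    """Queue one recovery command per running workflow owned by the failed master."""
    affected = [w for w in workflows if w.host == master_address and w.state == RUNNING]
    for workflow in affected:
        # Stands in for WorkflowFailover.failoverWorkflow(workflow).
        command_table.append(("RECOVER_TOLERANCE_FAULT_PROCESS", workflow.instance_id))
    return len(affected)


def failover_worker(worker_address, tasks, healthy_workers):
    """Reassign running tasks from the failed worker to healthy workers."""
    if not healthy_workers:
        raise RuntimeError("no healthy worker available")
    affected = [t for t in tasks if t.host == worker_address and t.state == RUNNING]
    for i, task in enumerate(affected):
        # Stands in for TaskFailover; round-robin selection is an assumption.
        task.host = healthy_workers[i % len(healthy_workers)]
    return len(affected)
```

Only instances still in the running state are touched, which matches the description of failing over work that was running on the crashed node.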

