Principle:Apache Dolphinscheduler Node Failure Detection

From Leeroopedia


Knowledge Sources
Domains Distributed_Systems, Fault_Tolerance
Last Updated 2026-02-10 00:00 GMT

Overview

A heartbeat-based mechanism that detects node crashes through registry event subscriptions and triggers failover for the affected workflows and tasks.

Description

The Node Failure Detection principle relies on the service registry's session-based heartbeat mechanism to detect when nodes become unavailable. When a master or worker node stops sending heartbeats (because of a crash, a network partition, or a graceful shutdown), its session expires and the registry fires a REMOVE event. The AbstractClusterSubscribeListener receives this event, parses the node metadata from the heartbeat JSON, and invokes onServerRemove() on the appropriate cluster object (MasterClusters or WorkerClusters). Every registered IClustersChangeListener instance is then notified, which triggers the failover procedures.
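
The dispatch path described above can be illustrated with a minimal Java sketch. It is not DolphinScheduler's actual code: every type below (Event, ServerInfo, ChangeListener, Cluster, SubscribeListener) is a hypothetical stand-in, and the heartbeat payload is assumed to be a bare address string rather than JSON so that the example needs no parsing library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: all types are hypothetical stand-ins, not DolphinScheduler's API.
class RemoveEventSketch {

    enum EventType { ADD, REMOVE }

    // Simplified registry event. Assumption: the payload is just the node
    // address, whereas the real heartbeat payload is JSON.
    static final class Event {
        final EventType type;
        final String data;
        Event(EventType type, String data) { this.type = type; this.data = data; }
    }

    static final class ServerInfo {
        final String address;
        ServerInfo(String address) { this.address = address; }
    }

    // Stand-in for the change-listener role (failover logic would live here).
    interface ChangeListener {
        void onServerRemove(ServerInfo server);
    }

    // Plays the role of a cluster object such as MasterClusters or WorkerClusters.
    static final class Cluster {
        private final Map<String, ServerInfo> servers = new LinkedHashMap<>();
        private final List<ChangeListener> listeners = new ArrayList<>();

        void addServer(ServerInfo server) { servers.put(server.address, server); }
        void registerListener(ChangeListener listener) { listeners.add(listener); }
        int size() { return servers.size(); }

        // Remove the node from the local view, then notify every listener.
        void onServerRemove(ServerInfo server) {
            servers.remove(server.address);
            for (ChangeListener listener : listeners) {
                listener.onServerRemove(server);
            }
        }
    }

    // Plays the role of the subscribe listener that receives registry events.
    static final class SubscribeListener {
        private final Cluster cluster;
        SubscribeListener(Cluster cluster) { this.cluster = cluster; }

        void onEvent(Event event) {
            if (event.type != EventType.REMOVE) {
                return; // other event types are out of scope for this sketch
            }
            cluster.onServerRemove(parseServerFromHeartbeat(event.data));
        }

        // Placeholder parser: trims the payload and treats it as the address.
        private ServerInfo parseServerFromHeartbeat(String data) {
            return new ServerInfo(data.trim());
        }
    }
}
```

In this sketch, the cluster view is updated before the listeners run, so a listener that inspects the cluster during failover no longer sees the removed node. Whether the real implementation orders these steps the same way is an assumption.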

Usage

Failure detection is automatic and requires no application-level configuration beyond the registry's session timeout. The registry client subscription is set up during cluster initialization.

Theoretical Basis

The detection follows the Unreliable Failure Detector model: the detector can only suspect a node, because a slow or partitioned node is indistinguishable from a crashed one.

  • Heartbeat: Nodes periodically register heartbeats with the registry.
  • Session Timeout: The registry detects the absence of heartbeats after a configurable timeout.
  • REMOVE Event: The registry fires this event when the node's session expires.
  • Listener Notification: All registered listeners are notified of the failure.
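
// Detection flow (pseudocode)
notify(event):
    if event.type == REMOVE:
        // Recover the node's metadata from its last heartbeat payload
        metadata = parseServerFromHeartbeat(event.data)
        // Update the cluster view (MasterClusters or WorkerClusters)
        onServerRemove(metadata)
        // Notify every registered IClustersChangeListener
        for listener in changeListeners:
            listener.onServerRemove(metadata)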
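
The registry side of the model (heartbeat, session timeout, REMOVE) can be sketched in Java as follows. This is a toy in-memory model, not any real registry's behaviour: the class SessionTimeoutSketch and its methods heartbeat, sweep, and isAlive are hypothetical names, and the clock is passed in explicitly so that the example stays deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model of session-based failure detection; all names are hypothetical.
class SessionTimeoutSketch {
    private final long sessionTimeoutMillis;
    // Node id -> timestamp of the most recent heartbeat.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    SessionTimeoutSketch(long sessionTimeoutMillis) {
        this.sessionTimeoutMillis = sessionTimeoutMillis;
    }

    // Record a heartbeat; this creates or refreshes the node's session.
    void heartbeat(String nodeId, long nowMillis) {
        lastHeartbeat.put(nodeId, nowMillis);
    }

    boolean isAlive(String nodeId) {
        return lastHeartbeat.containsKey(nodeId);
    }

    // Expire every session whose last heartbeat is older than the timeout.
    // The returned ids correspond to the REMOVE events in this sketch.
    List<String> sweep(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > sessionTimeoutMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }
}
```

The sketch also shows why the detector is called unreliable: a node whose heartbeats are merely delayed past the timeout is swept exactly like a crashed one. A shorter timeout detects failures sooner at the cost of more false suspicions.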
