Principle:Apache Spark Cluster Lifecycle Shutdown

From Leeroopedia


Field | Value
Domains | Deployment, Distributed_Systems
Type | Principle
Related | Implementation:Apache_Spark_Stop_All

Overview

An ordered shutdown pattern that gracefully terminates distributed daemon processes in reverse-dependency order, optionally blocking until termination is confirmed.

Description

Shutting down a distributed cluster requires stopping processes in the correct order to prevent orphaned processes and data loss. The shutdown pattern enforces:

  • Reverse-dependency ordering -- workers must stop before the master to ensure graceful deregistration
  • Graceful decommissioning -- allows in-flight tasks to complete before termination
  • Signal-based control -- different signals trigger different shutdown behaviors (SIGTERM for immediate stop, SIGPWR for graceful decommission)
  • Blocking confirmation -- optional wait flag blocks until all processes have fully terminated

The two shutdown strategies serve different operational needs:

  • Immediate shutdown (SIGTERM) -- stops daemons promptly, suitable for maintenance windows
  • Graceful decommission (SIGPWR) -- allows running tasks to finish, suitable for rolling upgrades and capacity reduction

Usage

Use this pattern to stop a running standalone cluster. Choose the strategy that matches the operational requirement:

  • Full cluster shutdown -- use stop-all.sh to stop all workers then the master
  • Rolling maintenance -- use decommission-worker.sh on individual workers to drain tasks before stopping
  • Emergency stop -- use stop-all.sh for immediate termination of all daemons; running tasks are not drained first
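
As a rough illustration of the choices above, the following Python sketch maps each scenario to the script the list names. The scenario labels, the `shutdown_command` helper, and the example `/opt/spark` install path are assumptions made for this example; only the script names come from the text, and the `sbin` subdirectory is where a standard Spark distribution keeps them. The helper only builds the command and does not run it.

```python
import os

# Scenario labels are hypothetical; the script names are the ones listed above.
SCRIPTS = {
    "full": "stop-all.sh",                # stop all workers, then the master
    "rolling": "decommission-worker.sh",  # drain one worker before it stops
    "emergency": "stop-all.sh",           # immediate termination of all daemons
}


def shutdown_command(scenario, spark_home):
    """Return the command (as an argv list) for the given shutdown scenario."""
    if scenario not in SCRIPTS:
        raise ValueError(f"unknown shutdown scenario: {scenario!r}")
    return [os.path.join(spark_home, "sbin", SCRIPTS[scenario])]
```

A caller could pass the result to `subprocess.run(cmd, check=True)` on the appropriate host: the master host for `stop-all.sh`, and the worker's own host for `decommission-worker.sh`.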

Theoretical Basis

The shutdown follows reverse-dependency ordering:

stop(workers) -> wait(workers_down) -> stop(master)
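
The three steps above can be sketched in Python with placeholder processes. This is a minimal illustration, not Spark's implementation: the daemons here are sleeping child processes, and `start_daemon`, `shutdown_cluster`, and the `worker-N` labels are hypothetical names introduced for the example.

```python
import subprocess
import sys

# Stand-in daemon: a child process that sleeps until it is signalled.
DAEMON_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def start_daemon():
    """Launch one placeholder daemon (hypothetical stand-in for a Spark daemon)."""
    return subprocess.Popen(DAEMON_CMD)


def shutdown_cluster(master, workers, timeout=10.0):
    """Stop workers, wait until all are down, then stop the master.

    Returns labels in the order the daemons were confirmed stopped.
    """
    stopped = []
    # Step 1: stop(workers). Popen.terminate() sends SIGTERM on POSIX.
    for worker in workers:
        worker.terminate()
    # Step 2: wait(workers_down). Block until every worker has exited.
    for index, worker in enumerate(workers):
        worker.wait(timeout=timeout)
        stopped.append(f"worker-{index}")
    # Step 3: stop(master). Runs only after all workers are confirmed down.
    master.terminate()
    master.wait(timeout=timeout)
    stopped.append("master")
    return stopped


def demo():
    """Start one master and two workers, then shut them down in order."""
    master = start_daemon()
    workers = [start_daemon() for _ in range(2)]
    order = shutdown_cluster(master, workers)
    return order, master.returncode, [w.returncode for w in workers]
```

Waiting on every worker before touching the master is what enforces the ordering; sending all signals at once would let the master exit while workers are still deregistering.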

For graceful decommissioning of individual workers:

send(SIGPWR) -> wait(tasks_complete) -> exit

Strategy | Signal / Flag | Behavior | Use Case
Immediate | SIGTERM | Stop daemon promptly | Maintenance windows
Graceful | SIGPWR | Drain tasks then stop | Rolling upgrades
Blocking | --block-until-exit flag | Wait for full termination | Scripted orchestration
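
The graceful flow send(SIGPWR) -> wait(tasks_complete) -> exit can be illustrated with a toy worker that reacts to the signal inside a single Python process. This is a sketch under stated assumptions, not Spark's worker logic: `DrainingWorker`, its methods, and the task names are hypothetical, the code assumes a POSIX system and substitutes SIGUSR1 where SIGPWR is unavailable (SIGPWR is Linux-specific), and the final exit step is omitted so the snippet can run inside a larger program.

```python
import os
import signal
import time


class DrainingWorker:
    """Toy worker that stops taking work and drains once told to decommission."""

    def __init__(self, tasks):
        self.pending = list(tasks)
        self.completed = []
        self.decommissioning = False

    def on_decommission(self, signum, frame):
        # Keep the handler tiny: it only records that the signal arrived.
        self.decommissioning = True

    def accept(self, task):
        """Queue a new task unless decommissioning has started."""
        if self.decommissioning:
            return False
        self.pending.append(task)
        return True

    def drain(self):
        """wait(tasks_complete): finish every task that is still pending."""
        while self.pending:
            self.completed.append(self.pending.pop(0))
        return self.completed


def demo():
    # SIGPWR exists on Linux; SIGUSR1 is a stand-in elsewhere (assumption).
    sig = getattr(signal, "SIGPWR", signal.SIGUSR1)
    worker = DrainingWorker(["task-1", "task-2"])
    previous = signal.signal(sig, worker.on_decommission)
    try:
        os.kill(os.getpid(), sig)  # send(SIGPWR) to this process
        deadline = time.monotonic() + 2.0
        while not worker.decommissioning and time.monotonic() < deadline:
            time.sleep(0.01)  # give the interpreter a chance to run the handler
        accepted = worker.accept("task-3")  # refused: worker is draining
        done = worker.drain()
    finally:
        signal.signal(sig, previous)  # restore the earlier handler
    return worker.decommissioning, accepted, done
```

The point of the sketch is the split in behavior after the signal: new work is refused while work already in flight is allowed to finish, which is what distinguishes decommissioning from an immediate SIGTERM stop.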
