Principle:Apache Spark Cluster Lifecycle Shutdown
| Field | Value |
|---|---|
| Domains | Deployment, Distributed_Systems |
| Type | Principle |
| Related | Implementation:Apache_Spark_Stop_All |
Overview
An ordered shutdown pattern that gracefully terminates distributed daemon processes in reverse-dependency order, optionally blocking until termination is confirmed.
Description
Shutting down a distributed cluster requires stopping processes in the correct order to prevent orphaned processes and data loss. The shutdown pattern enforces:
- Reverse-dependency ordering -- workers must stop before the master to ensure graceful deregistration
- Graceful decommissioning -- allows in-flight tasks to complete before termination
- Signal-based control -- different signals trigger different shutdown behaviors (SIGTERM for immediate stop, SIGPWR for graceful decommission)
- Blocking confirmation -- an optional flag (--block-until-exit on decommission-worker.sh) makes the script wait until the targeted process has fully terminated
The two shutdown strategies serve different operational needs:
- Immediate shutdown (SIGTERM) -- stops daemons promptly, suitable for maintenance windows
- Graceful decommission (SIGPWR) -- allows running tasks to finish, suitable for rolling upgrades and capacity reduction
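The difference between the two signals can be illustrated with a toy model. The sketch below is not Spark's Worker implementation; `ToyWorker`, its state names, and the `abandoned` list are hypothetical and exist only to show how one process can map SIGTERM and SIGPWR to different shutdown paths. SIGPWR is looked up defensively because it is not defined on every platform.

```python
import signal

# SIGPWR is Linux-specific; fall back to None where it is not defined.
SIGPWR = getattr(signal, "SIGPWR", None)


class ToyWorker:
    """Hypothetical stand-in for a worker daemon (not Spark's Worker class)."""

    def __init__(self, running_tasks):
        self.running_tasks = list(running_tasks)
        self.abandoned = []
        self.state = "RUNNING"

    def handle(self, signum, frame=None):
        """Signal handler: choose a shutdown path based on the signal."""
        if signum == signal.SIGTERM:
            # Immediate stop: in-flight tasks are dropped, not drained.
            self.abandoned, self.running_tasks = self.running_tasks, []
            self.state = "STOPPED"
        elif SIGPWR is not None and signum == SIGPWR:
            # Graceful decommission: keep current tasks, stop once they finish.
            self.state = "STOPPED" if not self.running_tasks else "DECOMMISSIONING"

    def finish_task(self, task):
        """Mark a task as complete; a draining worker stops when none remain."""
        self.running_tasks.remove(task)
        if self.state == "DECOMMISSIONING" and not self.running_tasks:
            self.state = "STOPPED"

    def install(self):
        """Register the handler with the OS (must run in the main thread)."""
        signal.signal(signal.SIGTERM, self.handle)
        if SIGPWR is not None:
            signal.signal(SIGPWR, self.handle)
```

In this model a worker that receives SIGTERM ends with its tasks in `abandoned`, while one that receives SIGPWR stays in `DECOMMISSIONING` until `finish_task` has been called for every running task.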
Usage
Use this pattern to stop a running Spark standalone cluster. Choose the strategy that matches the operational requirement:
- Full cluster shutdown -- use stop-all.sh to stop all workers then the master
- Rolling maintenance -- use decommission-worker.sh on individual workers to drain tasks before stopping
- Emergency stop -- also use stop-all.sh; it stops all daemons immediately without draining in-flight tasks
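The scenario-to-script mapping above could be wrapped in a small helper. This is a minimal sketch, assuming the scripts live under `$SPARK_HOME/sbin`; the scenario keys and the function names `shutdown_command` and `run_shutdown` are hypothetical and chosen only for illustration.

```python
import os
import subprocess

# Scenario names are illustrative; the script names come from the list above.
SCRIPTS = {
    "full": "stop-all.sh",
    "rolling": "decommission-worker.sh",
    "emergency": "stop-all.sh",
}


def shutdown_command(scenario, spark_home):
    """Return the argv list for a scenario without executing anything."""
    try:
        script = SCRIPTS[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None
    return [os.path.join(spark_home, "sbin", script)]


def run_shutdown(scenario, spark_home=None):
    """Run the script for a scenario; raises CalledProcessError on failure."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    return subprocess.run(shutdown_command(scenario, spark_home), check=True)
```

Separating command construction from execution keeps the mapping easy to inspect or log before anything is stopped. Note that decommission-worker.sh acts on the workers of the host it runs on, so the "rolling" case would be invoked on each worker machine in turn.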
Theoretical Basis
The shutdown follows reverse-dependency ordering:
stop(workers) -> wait(workers_down) -> stop(master)
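The flow above can be sketched with in-memory stand-ins for the daemons. `ToyDaemon`, `wait_until_down`, and `shutdown_cluster` are hypothetical names, and the polling loop with a timeout is one assumed way to implement the wait step, not a description of how the Spark scripts do it.

```python
import time


class ToyDaemon:
    """Hypothetical daemon that reports down a few polls after stop()."""

    def __init__(self, name, polls_until_down=2):
        self.name = name
        self._polls_until_down = polls_until_down
        self._polls_left = None  # None means stop() has not been requested

    def stop(self, log):
        log.append(f"stop {self.name}")
        self._polls_left = self._polls_until_down

    def is_down(self):
        if self._polls_left is None:
            return False
        if self._polls_left > 0:
            self._polls_left -= 1
            return False
        return True


def wait_until_down(daemons, timeout=5.0, interval=0.01):
    """Block until every daemon reports down, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    pending = list(daemons)
    while pending:
        pending = [d for d in pending if not d.is_down()]
        if pending:
            if time.monotonic() > deadline:
                names = ", ".join(d.name for d in pending)
                raise TimeoutError(f"still running: {names}")
            time.sleep(interval)


def shutdown_cluster(master, workers, log):
    """stop(workers) -> wait(workers_down) -> stop(master)."""
    for worker in workers:
        worker.stop(log)
    wait_until_down(workers)
    log.append("workers down")
    master.stop(log)
    wait_until_down([master])
    log.append("master down")
```

The master is only asked to stop after every worker has been confirmed down, which is the property the reverse-dependency ordering is meant to guarantee.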
For graceful decommissioning of individual workers:
send(SIGPWR) -> wait(tasks_complete) -> exit
| Strategy | Trigger | Behavior | Use Case |
|---|---|---|---|
| Immediate | SIGTERM | Stop daemon promptly | Maintenance windows |
| Graceful | SIGPWR | Drain tasks then stop | Rolling upgrades, capacity reduction |
| Blocking | --block-until-exit flag (decommission-worker.sh) | Wait for full termination | Scripted orchestration |