Principle:Apache Spark Worker Fleet Management
| Field | Value |
|---|---|
| Domains | Deployment, Distributed_Systems |
| Type | Principle |
| Related | Implementation:Apache_Spark_Start_Workers |
Overview
A distributed daemon coordination pattern that starts worker processes across multiple machines in a cluster using SSH-based remote execution and host inventory files.
Description
After the master is running, worker daemons must be started on all designated machines. Fleet management automates this process by:
- Reading a host inventory file (conf/workers) to determine which machines should run workers
- SSH-ing to each host from the master node to execute the worker startup script
- Supporting multiple worker instances per machine for NUMA-aware deployments
- Centralizing cluster operations on the master node for simplified management
The fan-out execution pattern allows a single command on the master to bring up the entire worker fleet. Each worker independently registers with the master using the spark:// URL, forming the distributed compute layer.
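Concretely, in a stock Spark distribution this looks like the following. conf/workers, sbin/start-workers.sh, and sbin/start-worker.sh are the real file and script names; the hostnames and master address are illustrative:

```shell
# conf/workers -- the host inventory, one worker machine per line:
#   worker-01.example.com
#   worker-02.example.com

# On the master, fan out to every host listed in conf/workers:
sbin/start-workers.sh

# Or start a single worker by running this on that machine directly,
# pointing it at the master's spark:// registration URL:
sbin/start-worker.sh spark://master-host:7077
```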
Key properties of this pattern:
- Inventory-driven -- the conf/workers file is the single source of truth for cluster membership
- Parallel execution -- SSH commands are issued to all hosts concurrently, so startup time is bounded by the slowest host rather than by the host count
- Instance multiplicity -- SPARK_WORKER_INSTANCES allows running multiple workers per physical host to exploit NUMA topology
- Centralized control -- all fleet operations are initiated from the master node
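For the multi-instance case, the knobs live in conf/spark-env.sh. The values below are an illustrative sketch, not a recommendation; they should be sized so that all instances together do not oversubscribe the machine:

```shell
# conf/spark-env.sh -- example sizing for a two-socket NUMA machine
SPARK_WORKER_INSTANCES=2    # one worker JVM per NUMA node (illustrative)
SPARK_WORKER_CORES=8        # cores allotted to EACH instance
SPARK_WORKER_MEMORY=24g     # memory allotted to EACH instance
```

Note that SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are per instance, so doubling the instance count halves what each one should be given.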
Usage
Use after starting the master to bring up all worker nodes. This pattern also supports:
- Individual worker startup -- starting a worker on a single machine for testing or incremental scaling
- Multi-instance deployment -- running multiple worker JVMs on high-resource machines
- Rolling restarts -- restarting workers one at a time for zero-downtime maintenance
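The stock scripts restart the whole fleet at once; a rolling restart can be sketched as a serial loop over the inventory. stop-worker.sh and start-worker.sh are the real per-host scripts, while the SSH_CMD indirection and the health-check comment are assumptions for illustration:

```shell
# Rolling-restart sketch: one host at a time, assuming passwordless SSH
# and an identical Spark install path on every worker.
SSH_CMD="${SSH_CMD:-ssh}"   # remote channel; swappable for dry-runs

rolling_restart() {
  inventory="$1"; master_url="$2"
  while read -r host; do
    # skip blank lines and comments in the inventory
    case "$host" in ''|'#'*) continue ;; esac
    "$SSH_CMD" "$host" "sbin/stop-worker.sh"
    "$SSH_CMD" "$host" "sbin/start-worker.sh $master_url"
    # In a real rollout, poll the master's web UI / REST API here until
    # this worker shows ALIVE again before touching the next host.
  done < "$inventory"
}
```

Because the loop is serial, at most one worker is down at any moment, which is what makes the maintenance zero-downtime from the cluster's point of view.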
Theoretical Basis
The pattern follows a fan-out execution model:
read_hosts(inventory) -> for each host: ssh(host, start_worker(master_url))
For multiple instances per host:
for i in 1..INSTANCES: start_worker_instance(i)
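The two loops above can be combined into a runnable sketch. RUN_CMD abstracts the remote channel (ssh on a real cluster), and INSTANCES stands in for SPARK_WORKER_INSTANCES -- in real Spark the inner loop runs on the worker side inside start-worker.sh rather than over SSH:

```shell
RUN_CMD="${RUN_CMD:-ssh}"     # remote execution channel; 'echo' dry-runs it
INSTANCES="${INSTANCES:-1}"   # stand-in for SPARK_WORKER_INSTANCES

start_fleet() {
  inventory="$1"; master_url="$2"
  while read -r host; do
    # skip blank lines and comments in the inventory
    case "$host" in ''|'#'*) continue ;; esac
    i=1
    while [ "$i" -le "$INSTANCES" ]; do
      # fan-out: launch each remote start in the background
      "$RUN_CMD" "$host" "sbin/start-worker.sh $master_url" &
      i=$((i + 1))
    done
  done < "$inventory"
  wait   # fan-in: block until every remote invocation has returned
}
```

The backgrounded commands plus the final `wait` are what make the fan-out parallel: all hosts are contacted at once and the script returns when the last one finishes.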
| Component | Role |
|---|---|
| conf/workers | Host inventory listing all worker machines |
| SSH | Remote execution channel from master to workers |
| master_url | spark://host:port address workers use to register |
| SPARK_WORKER_INSTANCES | Controls per-host worker multiplicity |