Implementation:Apache Spark Start Workers
| Field | Value |
|---|---|
| Source | Apache Spark repository |
| Domains | Deployment |
| Type | API Doc |
| Related | Principle:Apache_Spark_Worker_Fleet_Management |
Overview
Shell scripts that start Spark worker daemons on all configured cluster machines via SSH.
Description
Three scripts work together to start the worker fleet:
- sbin/start-workers.sh -- entry point that reads the master URL and delegates to sbin/workers.sh
- sbin/workers.sh -- iterates over hosts in conf/workers and SSHs to each one to execute the worker startup command
- sbin/start-worker.sh -- runs on each worker machine to start the local Worker JVM process, supporting multiple instances per machine via SPARK_WORKER_INSTANCES
Workers automatically register with the master upon startup, making their resources (CPU cores, memory) available for application execution.
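The fan-out performed by sbin/workers.sh can be sketched as a simple read-and-SSH loop. This is a hypothetical, simplified version for illustration only (the real script also handles SPARK_SSH_FOREGROUND, hostname expansion, and error reporting); here a demo host list is created and the SSH command is echoed rather than executed:

```shell
#!/usr/bin/env bash
# Simplified sketch of the workers.sh fan-out loop (not the actual script).
# A demo host list is written to a temp file, and the per-host SSH command
# is echoed instead of run, so the sketch is safe to execute anywhere.

workers_file="$(mktemp)"
printf 'worker1\n# comment lines are skipped\n\nworker2\n' > "$workers_file"

SPARK_SSH_OPTS="${SPARK_SSH_OPTS:--o StrictHostKeyChecking=no}"
cmd='sbin/start-worker.sh spark://master:7077'
started=""

while IFS= read -r host; do
  # Skip blank lines and comment lines, as the real script does.
  case "$host" in ''|\#*) continue ;; esac
  # The real script runs, per host and in the background:
  #   ssh $SPARK_SSH_OPTS "$host" $cmd &
  echo "would run on $host: $cmd"
  started="$started$host "
done < "$workers_file"

rm -f "$workers_file"
```

Running the sketch prints one "would run on ..." line per non-comment host, mirroring how the real script launches one remote startup command per entry in conf/workers.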
Usage
Run on the master node after start-master.sh. Workers register with the master automatically. No additional configuration is needed if conf/workers and SSH keys are properly set up.
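The host list itself is plain text, one worker per line; blank lines and lines starting with '#' are ignored. An illustrative conf/workers (hostnames are placeholders) might look like:

```
# conf/workers -- one worker host per line
worker1.example.com
worker2.example.com
```

Passwordless SSH from the master to every listed host is assumed; installing the master's public key on each worker (for example with ssh-copy-id) is the usual way to set this up.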
Code Reference
Source: sbin/start-workers.sh (L1-47), sbin/start-worker.sh (L1-93), sbin/workers.sh (L1-121)
Signatures:
# Start all workers (run on master node, no arguments needed)
sbin/start-workers.sh
# Start a single worker on the local machine
sbin/start-worker.sh <master-url>
Key environment variables:
| Variable | Default | Purpose |
|---|---|---|
| SPARK_WORKER_INSTANCES | 1 | Number of worker JVMs to start per machine |
| SPARK_WORKER_PORT | (random) | RPC port for the worker process |
| SPARK_WORKER_WEBUI_PORT | 8081 | HTTP port for the worker monitoring Web UI |
| SPARK_SSH_OPTS | -o StrictHostKeyChecking=no | SSH options for remote execution |
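These variables are typically set in conf/spark-env.sh. A sketch with illustrative values, not a recommended configuration; note that with multiple instances per machine the startup script offsets the RPC and web UI ports for each additional instance:

```shell
# conf/spark-env.sh -- example worker settings (values are illustrative)
export SPARK_WORKER_INSTANCES=2      # two worker JVMs per machine
export SPARK_WORKER_PORT=7078        # fixed RPC port instead of a random one
export SPARK_WORKER_WEBUI_PORT=8081  # worker web UI port (the default)
export SPARK_SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5"
```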
I/O
| Direction | Description |
|---|---|
| Inputs | Master URL (built from SPARK_MASTER_HOST and SPARK_MASTER_PORT, defaulting to the local hostname and port 7077), conf/workers host list, SSH keys |
| Outputs | Running Worker JVM processes on all hosts, Web UIs at http://<worker>:8081 |
Examples
Start all workers from the master node:
./sbin/start-workers.sh
Start a single worker on the local machine:
./sbin/start-worker.sh spark://master:7077
Start multiple worker instances per machine:
SPARK_WORKER_INSTANCES=2 ./sbin/start-worker.sh spark://master:7077