
Principle:Apache Spark Worker Fleet Management

From Leeroopedia


Field Value
Domains Deployment, Distributed_Systems
Type Principle
Related Implementation:Apache_Spark_Start_Workers

Overview

A distributed daemon coordination pattern that starts worker processes across multiple machines in a cluster using SSH-based remote execution and host inventory files.

Description

After the master is running, worker daemons must be started on all designated machines. Fleet management automates this process by:

  • Reading a host inventory file (conf/workers) to determine which machines should run workers
  • SSH-ing to each host from the master node to execute the worker startup script
  • Supporting multiple worker instances per machine for NUMA-aware deployments
  • Centralizing cluster operations on the master node for simplified management

The fan-out execution pattern allows a single command on the master to bring up the entire worker fleet. Each worker independently registers with the master using the spark:// URL, forming the distributed compute layer.
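The fan-out described above can be sketched in POSIX shell. This is a minimal illustration, not Spark's actual `start-workers.sh` implementation; the `start_fleet` helper name is hypothetical, and `$SPARK_HOME/sbin/start-worker.sh` is assumed to exist on every host:

```shell
# start_fleet INVENTORY MASTER_URL
# Read a host inventory and start one worker per host over SSH,
# backgrounding each connection so the fleet comes up in parallel.
start_fleet() {
    inventory=$1
    master_url=$2
    while read -r host; do
        # skip blank lines and comments in the inventory
        case $host in ''|'#'*) continue ;; esac
        ssh "$host" "$SPARK_HOME/sbin/start-worker.sh $master_url" &
    done < "$inventory"
    wait    # block until every remote startup command has returned
}
```

Usage from the master node would look like `start_fleet conf/workers spark://master-host:7077`; the backgrounded `ssh` calls give the parallel startup described above, while `wait` keeps the command synchronous from the operator's point of view.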

Key properties of this pattern:

  • Inventory-driven -- the conf/workers file is the single source of truth for cluster membership
  • Parallel execution -- SSH commands are issued to all workers concurrently for fast startup
  • Instance multiplicity -- SPARK_WORKER_INSTANCES allows running multiple workers per physical host to exploit NUMA topology
  • Centralized control -- all fleet operations are initiated from the master node
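Instance multiplicity and worker sizing are conventionally set in `conf/spark-env.sh`. The fragment below is an illustrative sketch; the specific values are placeholders, not recommendations:

```shell
# conf/spark-env.sh -- example values only, tune for your hardware
SPARK_WORKER_INSTANCES=2      # e.g. one worker JVM per NUMA node
SPARK_WORKER_CORES=16         # cores each worker offers to executors
SPARK_WORKER_MEMORY=64g       # memory each worker offers to executors
```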

Usage

Use after starting the master to bring up all worker nodes. This pattern also supports:

  • Individual worker startup -- starting a worker on a single machine for testing or incremental scaling
  • Multi-instance deployment -- running multiple worker JVMs on high-resource machines
  • Rolling restarts -- restarting workers one at a time for zero-downtime maintenance
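A rolling restart can be sketched as a sequential (non-backgrounded) pass over the same inventory. The `rolling_restart` helper below is hypothetical; it assumes Spark's standard `sbin/stop-worker.sh` and `sbin/start-worker.sh` scripts, and the settle delay is an arbitrary placeholder:

```shell
# rolling_restart INVENTORY MASTER_URL [SETTLE_SECONDS]
# Restart workers one host at a time so the rest of the fleet
# keeps serving executors while each worker cycles.
rolling_restart() {
    inventory=$1
    master_url=$2
    settle=${3:-5}   # placeholder: seconds to let the worker re-register
    while read -r host; do
        case $host in ''|'#'*) continue ;; esac
        # no '&' here: sequential execution is what makes it "rolling"
        ssh "$host" "$SPARK_HOME/sbin/stop-worker.sh && $SPARK_HOME/sbin/start-worker.sh $master_url"
        sleep "$settle"
    done < "$inventory"
}
```

The deliberate contrast with the parallel startup case is the absence of backgrounding: at most one worker is down at any moment, which is what preserves capacity during maintenance.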

Theoretical Basis

The pattern follows a fan-out execution model:

read_hosts(inventory) -> for each host: ssh(host, start_worker(master_url))

For multiple instances per host:

for i in 1..INSTANCES: start_worker_instance(i)

Component               Role
conf/workers            Host inventory listing all worker machines
SSH                     Remote execution channel from master to workers
master_url              spark://host:port address workers use to register
SPARK_WORKER_INSTANCES  Controls per-host worker multiplicity
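The per-host instance loop can be sketched as follows. The `spawn_instances` helper and its port arithmetic are hypothetical illustrations of the multiplicity idea, not Spark's actual script internals; each would-be command is simply echoed:

```shell
# spawn_instances N MASTER_URL
# Illustrate SPARK_WORKER_INSTANCES-style multiplicity: emit one
# start command per instance, giving each its own web UI port so
# the worker JVMs sharing a host do not collide.
spawn_instances() {
    n=$1
    master_url=$2
    i=1
    while [ "$i" -le "$n" ]; do
        echo "start-worker.sh --webui-port $((8080 + i)) $master_url  # instance $i"
        i=$((i + 1))
    done
}
```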

Related

Implementation:Apache_Spark_Start_Workers