
Heuristic:Apache Spark Hardware Provisioning Tips

From Leeroopedia





Knowledge Sources
Domains Infrastructure, Hardware_Provisioning
Last Updated 2026-02-08 22:00 GMT

Overview

Hardware provisioning heuristics: provision 4-8 disks per node without RAID, allocate at most 75% of RAM to Spark, use a 10 Gigabit or faster network, and 8-16 CPU cores per machine.

Description

Spark's distributed computing model has specific hardware requirements that differ from traditional database or web server deployments. The key insights relate to I/O parallelism (RAID hurts shuffle performance), memory partitioning (the OS and buffer cache need headroom), network bandwidth (often the real bottleneck once data is in memory), and CPU core density (Spark scales well with minimal thread contention).

Usage

Use these heuristics when provisioning new cluster hardware, sizing cloud instances for Spark workloads, or diagnosing performance bottlenecks in existing deployments. These guidelines apply to both standalone and Kubernetes deployments.

The Insight (Rule of Thumb)

  • Disks: Use 4-8 disks per node, mounted separately (NOT in RAID). Mount with `noatime` option. Set `spark.local.dir` to a comma-separated list of all mount points.
  • Memory: Allocate at most 75% of machine RAM to Spark executors. Leave the rest for the OS buffer cache and other processes.
  • JVM Limit: Never allocate more than 200GB to a single JVM, because garbage collection becomes inefficient in very large heaps. Launch multiple executors per node instead (standalone mode does this automatically).
  • Network: Use 10 Gigabit or higher network. Network is often the bottleneck for distributed reduce operations (groupBy, reduceBy, SQL joins) once data is in memory.
  • CPU: 8-16 cores per machine minimum. Spark scales well due to minimal inter-thread sharing.
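As a concrete sketch, the heuristics above might translate into a `spark-defaults.conf` fragment like the following. The property names (`spark.local.dir`, `spark.executor.memory`, `spark.executor.cores`) are real Spark settings; the mount points and sizes are illustrative assumptions, not values from the source:

```
# spark-defaults.conf (illustrative values)
spark.local.dir        /mnt/spark1,/mnt/spark2,/mnt/spark3,/mnt/spark4
spark.executor.memory  48g
spark.executor.cores   8
```

Listing every mount point in `spark.local.dir` lets Spark round-robin shuffle and spill files across all disks, which is what makes the independent-disk layout pay off.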

Reasoning

Disks: RAID (especially RAID 5/6) reduces I/O parallelism for Spark's shuffle operations because the RAID controller serializes writes and adds parity computation overhead. Spark's shuffle already distributes data across multiple partitions, so independent disks provide better aggregate throughput. The `noatime` mount option eliminates unnecessary inode access time updates on every read, which can be significant for the small random reads in shuffle fetch.
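For illustration, independently mounted shuffle disks with `noatime` might look like this in `/etc/fstab` (device names, filesystem, and mount points are assumptions for the sketch):

```
# /etc/fstab — one entry per shuffle disk, no RAID, noatime to skip access-time writes
/dev/nvme1n1  /mnt/spark1  ext4  defaults,noatime  0 2
/dev/nvme2n1  /mnt/spark2  ext4  defaults,noatime  0 2
/dev/nvme3n1  /mnt/spark3  ext4  defaults,noatime  0 2
/dev/nvme4n1  /mnt/spark4  ext4  defaults,noatime  0 2
```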

Memory: The 75% rule exists because the OS buffer cache provides free read caching for shuffle files and input data. Starving the OS of memory forces it to evict cached pages, causing additional disk I/O that Spark's memory manager cannot compensate for.
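The 75% rule and the 200GB JVM cap can be sketched together as a small sizing helper (a hypothetical function for back-of-envelope planning, not part of Spark):

```python
import math

def size_executors(total_ram_gb, spark_fraction=0.75, max_jvm_gb=200):
    """Return (executor_count, memory_per_executor_gb) for one node.

    Keeps 25% of RAM for the OS buffer cache and other processes, then
    splits the remaining Spark budget so no single JVM exceeds the cap.
    """
    budget = total_ram_gb * spark_fraction
    executors = max(1, math.ceil(budget / max_jvm_gb))
    return executors, budget / executors

# A 512 GB node: 384 GB Spark budget -> 2 executors of 192 GB each
print(size_executors(512))  # (2, 192.0)
```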

Network: When dataset size exceeds available RAM and shuffle operations are required, network bandwidth becomes the primary bottleneck. A 10 Gigabit network delivers roughly 1GB/s of usable throughput (1.25 GB/s theoretical), which means a 100GB shuffle takes ~100 seconds in transit alone.
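That back-of-envelope estimate can be written out explicitly. The ~1 GB/s effective rate matches the source's figure; the 80% efficiency factor is an assumption used to get from the theoretical 1.25 GB/s to that number:

```python
def shuffle_transit_seconds(shuffle_gb, link_gbps=10, efficiency=0.8):
    """Lower bound on wire time for a shuffle of `shuffle_gb` gigabytes.

    10 Gb/s is 1.25 GB/s theoretical; ~80% efficiency yields roughly
    1 GB/s. Real shuffles also pay serialization and disk I/O costs,
    so this is transit time only.
    """
    effective_gb_per_sec = link_gbps / 8 * efficiency
    return shuffle_gb / effective_gb_per_sec

print(shuffle_transit_seconds(100))  # ~100 seconds for a 100 GB shuffle
```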

CPU: Spark's in-memory execution model means CPU typically becomes the bottleneck only once data is already cached in RAM. The recommendation of 8-16 cores per machine reflects the typical compute-to-memory ratio in modern workloads.
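Combining the core and memory heuristics gives a rough per-node layout. The helper below is hypothetical (not a Spark API); it applies the 75% RAM rule and the 200GB JVM cap, then divides the node's cores evenly across the resulting executors:

```python
import math

def node_layout(total_cores, total_ram_gb, spark_fraction=0.75, max_jvm_gb=200):
    """Divide one node's cores and Spark RAM budget across executors."""
    ram_budget = total_ram_gb * spark_fraction
    executors = max(1, math.ceil(ram_budget / max_jvm_gb))
    return {
        "executors": executors,
        "cores_per_executor": total_cores // executors,
        "memory_per_executor_gb": ram_budget / executors,
    }

# A 16-core, 256 GB node: one 192 GB executor using all 16 cores
print(node_layout(16, 256))
```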

Code Evidence

From `docs/hardware-provisioning.md:42-50`:

Recommendation: 4-8 disks per node, WITHOUT RAID
Mount separately, not as RAID array
Use noatime option in Linux to reduce unnecessary writes
Set spark.local.dir to comma-separated list of mount points

From `docs/hardware-provisioning.md:52-68`:

Recommendation: 8 GB to hundreds of GB per machine
Allocate at most 75% of memory to Spark (rest for OS/buffer cache)

JVM behavior above 200 GB RAM:
  - Launch multiple executors per node instead of single large executor
  - Standalone mode: worker launches multiple executors automatically

From `docs/hardware-provisioning.md:70-76`:

Recommendation: 10 Gigabit or higher network
Critical for distributed reduce apps (group-by, reduce-by, SQL joins)
Check shuffle data volume in monitoring UI at http://<driver>:4040

From `docs/hardware-provisioning.md:78-83`:

Recommendation: At least 8-16 cores per machine
Spark scales well due to minimal thread sharing
More cores needed if CPU-bound after data is in memory
