
Environment:Apache Spark JDK Build Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Build_System
Last Updated: 2026-02-08 22:00 GMT

Overview

Linux/macOS build environment requiring JDK 17+ (minimum 17.0.11), Maven 3.9.12, and Scala 2.13.18 for compiling Apache Spark from source.

Description

This environment defines the toolchain required to build Apache Spark from source using the Maven-based build system. The build/mvn wrapper script automatically detects and downloads the correct Maven version, but requires a properly configured JDK installation. Spark 4.x mandates Java 17 as the minimum compiler version (up from Java 8 in Spark 3.x). The build compiles Scala 2.13 code exclusively, as Scala 2.12 support was removed in Spark 4.0.0. Compilation requires significant JVM heap space (4GB) configured via MAVEN_OPTS.

Usage

Use this environment for any Building and Testing workflow, including compiling Spark from source, running the test suite, and creating binary distributions. It is the mandatory prerequisite for running the Build_Mvn, Mvn_Clean_Package, Run_Tests, Make_Distribution, and Python_Run_Tests implementations.
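The workflows named above map onto a handful of commands; a hedged cheat-sheet, assuming you are at the root of a Spark source checkout (the module name and flags are illustrative, not the only valid choices):

```shell
# Compile everything, skipping tests (the Build_Mvn / Mvn_Clean_Package path):
./build/mvn -DskipTests clean package

# Run the test suite, optionally scoped to one module (Run_Tests):
./build/mvn test -pl core

# Produce a binary distribution tarball (Make_Distribution):
./dev/make-distribution.sh --tgz

# Run the PySpark tests (Python_Run_Tests):
python/run-tests
```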

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or macOS | Windows via WSL2; macOS uses `/usr/libexec/java_home` for JDK detection |
| Hardware | CPU with sufficient cores | 8-16 cores recommended for parallel compilation |
| Memory | Minimum 4 GB free RAM | `MAVEN_OPTS` defaults to `-Xmx4g` for compilation |
| Disk | 20 GB+ free space | Source + compiled artifacts + Maven cache |

Dependencies

System Packages

  • JDK 17 (minimum 17.0.11) or JDK 21
  • Maven 3.9.12 (auto-downloaded by build/mvn if missing)
  • Scala 2.13.18 (managed by Maven, no separate install needed)
  • `git` (for source checkout)
  • `tar`, `gzip` (for Maven auto-download)
  • `curl` or `wget` (for Maven auto-download)
  • `shasum` (optional, for checksum verification)
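The package list above can be checked up front; a minimal preflight sketch (the `preflight` function name is ours, not part of the Spark build):

```shell
# Minimal preflight sketch: confirm the tools that build/mvn and a source
# checkout rely on are reachable on PATH before starting a build.
preflight() {
  for tool in git tar gzip; do
    command -v "$tool" >/dev/null || { echo "missing: $tool"; return 1; }
  done
  # Either downloader is enough for the Maven auto-download step.
  command -v curl >/dev/null || command -v wget >/dev/null \
    || { echo "missing: curl/wget"; return 1; }
  echo "preflight ok"
}
preflight
```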

Key Library Versions (from pom.xml)

  • Hadoop 3.4.2
  • Kafka 3.9.1
  • Hive 2.3.10
  • Apache Arrow 18.3.0
  • Protobuf 4.33.5
  • Jackson 2.21.0

Credentials

No credentials required for building from source. For running tests that access external services, see individual test configurations.

Quick Install

# Install JDK 17 (Ubuntu/Debian)
sudo apt-get install openjdk-17-jdk

# Maven is auto-downloaded by build/mvn wrapper
# Just ensure JAVA_HOME is set:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# Build Spark
./build/mvn -DskipTests clean package

Code Evidence

Java and Maven version requirements from `pom.xml:120-123`:

<java.version>17</java.version>
<java.minimum.version>17.0.11</java.minimum.version>
<maven.compiler.release>${java.version}</maven.compiler.release>
<maven.version>3.9.12</maven.version>

Scala version pinned at `pom.xml:175-176`:

<scala.version>2.13.18</scala.version>
<scala.binary.version>2.13</scala.binary.version>

JAVA_HOME auto-detection from `build/mvn:23-32`:

if [ -z "${JAVA_HOME}" -a "$(command -v javac)" ]; then
  if [ "$(uname -s)" = "Darwin" ]; then
    export JAVA_HOME="$(/usr/libexec/java_home)"
  else
    export JAVA_HOME="$(dirname $(dirname $(realpath $(command -v javac))))"
  fi
fi
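The Linux branch's two-`dirname` hop can be seen in isolation; a sketch against a throwaway directory layout (the `/tmp/fake-jdk` path is invented for illustration):

```shell
# A JDK install places javac at <JAVA_HOME>/bin/javac, so stepping two
# directories up from the resolved binary recovers JAVA_HOME.
mkdir -p /tmp/fake-jdk/bin
touch /tmp/fake-jdk/bin/javac
JAVAC="/tmp/fake-jdk/bin/javac"
DETECTED="$(dirname "$(dirname "$(realpath "$JAVAC")")")"
echo "$DETECTED"
```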

Maven version enforcement from `build/mvn:116-126`:

local MVN_VERSION=`grep "<maven.version>" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}'`
MVN_BIN="${_DIR}/apache-maven-${MVN_VERSION}/bin/mvn"
# ...
if [ $(version $MVN_DETECTED_VERSION) -ne $(version $MVN_VERSION) ]; then
  # Auto-download correct Maven version
fi
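The `version` helper referenced above normalizes dotted versions into comparable integers; a minimal sketch of that idea (the real helper in build/mvn may differ in detail):

```shell
# Pad each of the three version components to three digits so that plain
# integer comparison orders versions correctly (3.9.12 -> 003009012).
version() { echo "$@" | awk -F. '{ printf("%03d%03d%03d\n", $1, $2, $3); }'; }

if [ "$(version 3.9.9)" -ne "$(version 3.9.12)" ]; then
  echo "Maven version mismatch, would trigger auto-download"
fi
```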

Compilation JVM options from `build/mvn:39`:

_COMPILE_JVM_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m"
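These defaults can be overridden by exporting `MAVEN_OPTS` before invoking the wrapper; a config fragment, with the `-Xmx8g` figure being an illustrative choice rather than a Spark recommendation:

```shell
# Raise the compile heap beyond the wrapper's 4g default.
export MAVEN_OPTS="-Xss128m -Xmx8g -XX:ReservedCodeCacheSize=128m"
./build/mvn -DskipTests clean package
```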

Java version check for Spark 4.0+ from `dev/create-release/release-build.sh:622-626`:

elif [[ $JAVA_VERSION < "17.0." ]] && [[ $SPARK_VERSION > "3.5.99" ]]; then
  echo "Java version $JAVA_VERSION is less than required 17 for 4.0+"
  echo "Please set JAVA_HOME correctly."
  exit 1
fi
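Note the guard compares version strings lexicographically with bash's `<` and `>`; a sketch showing why an old JDK trips it (the version values here are invented inputs, not read from a real JVM):

```shell
#!/bin/bash
# Bash [[ < ]] compares strings byte-by-byte, so "11.0.2" sorts before
# "17.0." and "4.0.0" sorts after "3.5.99", tripping the guard.
JAVA_VERSION="11.0.2"
SPARK_VERSION="4.0.0"
if [[ $JAVA_VERSION < "17.0." ]] && [[ $SPARK_VERSION > "3.5.99" ]]; then
  MSG="Java version $JAVA_VERSION is less than required 17 for 4.0+"
  echo "$MSG"
fi
```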

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `JAVA_HOME is not set` | JDK not installed or `JAVA_HOME` not configured | Install JDK 17+ and set `export JAVA_HOME=/path/to/jdk` |
| `Java version X is less than required 17 for 4.0+` | Using JDK < 17 with Spark 4.x | Upgrade to JDK 17 or 21 |
| `Cannot download with cURL or wget` | Neither download tool is installed | Install curl: `sudo apt-get install curl` |
| `Bad checksum` | Corrupted Maven download | Delete `build/apache-maven-*` and retry |
| Maven `OutOfMemoryError` during compilation | Insufficient heap for compilation | Set `export MAVEN_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m"` |

Compatibility Notes

  • macOS: Uses `/usr/libexec/java_home` for JDK detection. Naively resolving from `/usr/bin/javac` (a stub that dispatches to the selected JDK) would incorrectly yield `/usr` as JAVA_HOME on some macOS versions, hence the dedicated tool.
  • Linux: Resolves JDK path via `realpath $(command -v javac)` going two directories up.
  • Scala 2.12: Removed in Spark 4.0.0. Only Scala 2.13 is supported.
  • JDK 21: Fully supported as of Spark 4.x; Docker images default to Zulu OpenJDK 21.
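On macOS, the detection tool can also pin a specific major version explicitly; a hedged fragment (the `-v 17` selector assumes a JDK 17 is actually installed):

```shell
# macOS only: ask java_home for a specific major version, then verify.
export JAVA_HOME="$(/usr/libexec/java_home -v 17)"
"$JAVA_HOME/bin/java" -version
```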
