Environment: Apache Spark JDK Build Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Build_System |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Linux/macOS build environment requiring JDK 17+ (minimum 17.0.11), Maven 3.9.12, and Scala 2.13.18 for compiling Apache Spark from source.
Description
This environment defines the toolchain required to build Apache Spark from source using the Maven-based build system. The build/mvn wrapper script automatically detects and downloads the correct Maven version, but requires a properly configured JDK installation. Spark 4.x mandates Java 17 as the minimum compiler version (up from Java 8 in Spark 3.x). The build compiles Scala 2.13 code exclusively, as Scala 2.12 support was removed in Spark 4.0.0. Compilation requires significant JVM heap space (4GB) configured via MAVEN_OPTS.
Usage
Use this environment for any Building and Testing workflow, including compiling Spark from source, running the test suite, and creating binary distributions. It is the mandatory prerequisite for running the Build_Mvn, Mvn_Clean_Package, Run_Tests, Make_Distribution, and Python_Run_Tests implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or macOS | Windows via WSL2; macOS uses /usr/libexec/java_home for JDK detection |
| Hardware | CPU with sufficient cores | 8-16 cores recommended for parallel compilation |
| Memory | Minimum 4GB free RAM | MAVEN_OPTS defaults to -Xmx4g for compilation |
| Disk | 20GB+ free space | Source + compiled artifacts + Maven cache |
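The hardware requirements in the table can be sanity-checked from the shell before starting a build. This is a minimal sketch assuming Linux coreutils (`nproc`, `/proc/meminfo`, `df`); the thresholds are the ones stated above.

```shell
# Sketch: probe the host against the requirements table (Linux assumed).
cores=$(nproc)                                                        # 8-16 recommended
mem_gb=$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)    # total RAM; table asks for 4GB free
disk_gb=$(df -Pk . | awk 'NR==2 {print int($4/1024/1024)}')           # free space; table asks for 20GB+
echo "cores=$cores mem_gb=$mem_gb free_disk_gb=$disk_gb"
[ "$cores" -ge 8 ] || echo "note: fewer than 8 cores; parallel compilation will be slower"
```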
Dependencies
System Packages
- JDK 17 (minimum 17.0.11) or JDK 21
- Maven 3.9.12 (auto-downloaded by build/mvn if missing)
- Scala 2.13.18 (managed by Maven, no separate install needed)
- `git` (for source checkout)
- `tar`, `gzip` (for Maven auto-download)
- `curl` or `wget` (for Maven auto-download)
- `shasum` (optional, for checksum verification)
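Presence of these packages can be verified up front. A minimal sketch (tool names come directly from the list above; either `curl` or `wget` suffices for the Maven auto-download):

```shell
# Sketch: check that the system packages listed above are on PATH.
missing=""
for tool in git tar gzip; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
# Either curl or wget is enough; shasum is optional (checksum verification only).
command -v curl >/dev/null 2>&1 || command -v wget >/dev/null 2>&1 || missing="$missing curl-or-wget"
[ -z "$missing" ] && echo "all required tools found" || echo "missing:$missing"
```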
Key Library Versions (from pom.xml)
- Hadoop 3.4.2
- Kafka 3.9.1
- Hive 2.3.10
- Apache Arrow 18.3.0
- Protobuf 4.33.5
- Jackson 2.21.0
Credentials
No credentials required for building from source. For running tests that access external services, see individual test configurations.
Quick Install
# Install JDK 17 (Ubuntu/Debian)
sudo apt-get install openjdk-17-jdk
# Maven is auto-downloaded by build/mvn wrapper
# Just ensure JAVA_HOME is set:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
# Build Spark
./build/mvn -DskipTests clean package
Code Evidence
Java and Maven version requirements from `pom.xml:120-123`:
<java.version>17</java.version>
<java.minimum.version>17.0.11</java.minimum.version>
<maven.compiler.release>${java.version}</maven.compiler.release>
<maven.version>3.9.12</maven.version>
Scala version pinned at `pom.xml:175-176`:
<scala.version>2.13.18</scala.version>
<scala.binary.version>2.13</scala.binary.version>
JAVA_HOME auto-detection from `build/mvn:23-32`:
if [ -z "${JAVA_HOME}" -a "$(command -v javac)" ]; then
if [ "$(uname -s)" = "Darwin" ]; then
export JAVA_HOME="$(/usr/libexec/java_home)"
else
export JAVA_HOME="$(dirname $(dirname $(realpath $(command -v javac))))"
fi
fi
Maven version enforcement from `build/mvn:116-126`:
local MVN_VERSION=`grep "<maven.version>" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}'`
MVN_BIN="${_DIR}/apache-maven-${MVN_VERSION}/bin/mvn"
# ...
if [ $(version $MVN_DETECTED_VERSION) -ne $(version $MVN_VERSION) ]; then
# Auto-download correct Maven version
fi
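The `version` helper used in this comparison is defined elsewhere in build/mvn and is not quoted here. A common idiom for such helpers, zero-padding each component so dotted versions compare as plain integers, looks like the following sketch (an assumption, not the verbatim script):

```shell
# Sketch: normalize a dotted version so it can be compared with -ne / -lt.
version() {
  echo "$1" | awk -F. '{ printf("%d%03d%03d\n", $1, $2, $3); }'
}
v1=$(version "3.9.12")
v2=$(version "3.8.1")
[ "$v1" -ne "$v2" ] && echo "versions differ: $v1 vs $v2"
```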
Compilation JVM options from `build/mvn:39`:
_COMPILE_JVM_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m"
Java version check for Spark 4.0+ from `dev/create-release/release-build.sh:622-626`:
elif [[ $JAVA_VERSION < "17.0." ]] && [[ $SPARK_VERSION > "3.5.99" ]]; then
echo "Java version $JAVA_VERSION is less than required 17 for 4.0+"
echo "Please set JAVA_HOME correctly."
exit 1
fi
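A standalone equivalent of this guard, parsing the major version numerically instead of relying on the lexicographic string comparison used in release-build.sh, might look like this sketch (the hard-coded version string is illustrative; real scripts would parse `java -version` output, whose format varies by vendor):

```shell
# Sketch: numeric major-version check mirroring the 4.0+ guard above.
JAVA_VERSION="17.0.11"        # illustrative; in practice parsed from `java -version 2>&1`
major=${JAVA_VERSION%%.*}     # strip everything after the first dot
if [ "$major" -lt 17 ]; then
  echo "Java version $JAVA_VERSION is less than required 17 for 4.0+"
else
  echo "Java $JAVA_VERSION satisfies the 4.0+ requirement"
fi
```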
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `JAVA_HOME is not set` | JDK not installed or JAVA_HOME not configured | Install JDK 17+ and set `export JAVA_HOME=/path/to/jdk` |
| `Java version X is less than required 17 for 4.0+` | Using JDK < 17 with Spark 4.x | Upgrade to JDK 17 or 21 |
| `Cannot download with cURL or wget` | Neither download tool is installed | Install curl: `sudo apt-get install curl` |
| `Bad checksum` | Corrupted Maven download | Delete `build/apache-maven-*` and retry |
| Maven OutOfMemoryError during compilation | Insufficient heap for compilation | Set `export MAVEN_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m"` |
Compatibility Notes
- macOS: Uses `/usr/libexec/java_home` for JDK detection. Naively resolving two directories up from `/usr/bin/javac` would incorrectly yield `/usr` on some macOS versions.
- Linux: Resolves JDK path via `realpath $(command -v javac)` going two directories up.
- Scala 2.12: Removed in Spark 4.0.0. Only Scala 2.13 is supported.
- JDK 21: Fully supported as of Spark 4.x; Docker images default to Zulu OpenJDK 21.
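The Linux resolution described above can be reproduced on a synthetic directory layout; everything under the temporary directory in this sketch is illustrative, not a real JDK path:

```shell
# Sketch: mimic build/mvn's Linux JAVA_HOME derivation on a fake JDK layout.
base="$(mktemp -d)"
jdk="$base/jvm/java-17-openjdk"
mkdir -p "$jdk/bin" "$base/usr/bin"
touch "$jdk/bin/javac"
ln -s "$jdk/bin/javac" "$base/usr/bin/javac"   # stand-in for a symlinked /usr/bin/javac
# realpath follows the symlink; two dirname calls climb from bin/javac up to the JDK root.
resolved="$(dirname "$(dirname "$(realpath "$base/usr/bin/javac")")")"
echo "JAVA_HOME would be: $resolved"
```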