Heuristic:Apache Spark Build Fallback Strategies
| Knowledge Sources | |
|---|---|
| Domains | Build_System, Debugging, Infrastructure |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Build system defensive patterns: download tool fallback (curl then wget), Maven mirror fallback to archive.apache.org, SHA512 checksum verification when possible, and platform-specific JAVA_HOME detection.
Description
The Spark build system uses several layers of defensive scripting to cope with the variety of developer environments. The `build/mvn` wrapper script chains fallback strategies for downloading Maven, detects JAVA_HOME differently on macOS and Linux, downloads the required Maven version automatically when the system version does not match, and verifies checksums when the `shasum` utility is available. Together these patterns help builds succeed across a wide range of environments without manual intervention.
Usage
Use these patterns as a reference when debugging build failures related to Maven downloads, setting up new build environments, or working out why builds behave differently on macOS and Linux. They also serve as a template for implementing robust download logic in CI/CD pipelines.
The Insight (Rule of Thumb)
- Download Fallback: Try curl first, then wget. If both fail, exit with a clear error message explaining what to install manually.
- Mirror Fallback: If the primary Apache mirror doesn't have the file (common for older Maven versions), fall back to archive.apache.org.
- Checksum Verification: Verify SHA512 checksums when shasum is available. Skip gracefully (with a warning) when it's not. Never skip the download itself because the verification tool is missing.
- JAVA_HOME Detection: On macOS, use `/usr/libexec/java_home` (handles multiple JDK installations correctly). On Linux, resolve the realpath of `javac` and go up two directories.
- Maven Version Pinning: Read the required version from pom.xml, compare with the system Maven, and auto-download the correct version if they differ.
- Compilation JVM Options: Always set `-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m` for compilation to handle deep recursion and large codebases.
- SBT Interactive Prompt: When running SBT in automated environments, pipe the output of `echo "q"` into SBT's standard input so that a build failure answers SBT's quit/retry prompt instead of blocking on it.
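The compilation JVM options rule can be sketched as a default that applies only when the caller has not supplied options of their own. Treating a pre-set `MAVEN_OPTS` as an override is an assumption of this sketch, and `effective_maven_opts` is a hypothetical helper name; only the option values come from the rule above.

```shell
# Option values are the ones listed in the rule above.
COMPILE_JVM_OPTS="-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m"

# Hypothetical helper: print the caller's options if non-empty,
# otherwise the compilation defaults.
effective_maven_opts() {
  printf '%s\n' "${1:-$COMPILE_JVM_OPTS}"
}

# Typical use before invoking Maven:
#   export MAVEN_OPTS="$(effective_maven_opts "${MAVEN_OPTS:-}")"
```

Keeping the defaults in one variable makes it easy to see which values a build used when diagnosing stack-overflow or out-of-memory failures during compilation.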
Reasoning
Download Fallback: Different Linux distributions ship with different default HTTP clients, and minimal Docker images often have neither curl nor wget. Trying curl first (it is more commonly available and has better error handling) and wget second covers the common cases. When neither tool is present, the script exits with an explicit error rather than continuing without Maven.
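A minimal sketch of the ordered fallback, assuming the hypothetical helper names `first_available` and `fetch`. It downloads to a temporary `.part` file and moves it into place only on success, because a plain `curl ... > file` redirect creates the output file even when curl fails, which makes a later "does the file exist" check unreliable. The curl and wget flags are standard options chosen for illustration, not necessarily the ones `build/mvn` passes.

```shell
# Print the first command from the argument list that exists on PATH.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: download URL ($1) to DEST ($2) with curl, else wget.
# The body runs in a subshell so its variables and exits stay local.
fetch() (
  url=$1
  dest=$2
  tool=$(first_available curl wget) || {
    echo "ERROR: neither curl nor wget is installed; download ${url} manually." >&2
    exit 2
  }
  tmp="${dest}.part"
  case "$tool" in
    curl) curl --fail --silent --show-error --location --output "$tmp" "$url" ;;
    wget) wget --quiet --output-document="$tmp" "$url" ;;
  esac || { rm -f "$tmp"; exit 1; }
  mv "$tmp" "$dest"
)
```

A caller would run something like `fetch "$remote_tarball" "$local_tarball" || exit $?`, so the distinct exit codes (2 for a missing tool, 1 for a failed transfer) propagate to the build log.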
Mirror Fallback: Apache's mirror system only serves the latest versions of projects. Older Maven versions (which may still be specified in pom.xml during version transitions) are only available from the archive server. Without this fallback, the Maven download, and with it the build, would break whenever the pinned version is no longer served by the primary mirror.
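The probe-then-fall-back idea can be sketched as two small functions. `url_exists` and `choose_mirror` are hypothetical names, and the sketch leaves out the mirror query string that the real script clears on fallback.

```shell
# Return 0 if a HEAD request for the URL succeeds (requires curl).
url_exists() {
  curl --location --silent --head --fail --output /dev/null "$1"
}

# Print the base URL to download from: the primary mirror if it serves
# the file, otherwise the archive. Arguments: primary, archive, file path.
choose_mirror() (
  primary=$1
  archive=$2
  path=$3
  if url_exists "${primary}/${path}"; then
    printf '%s\n' "$primary"
  else
    echo "Falling back to ${archive}" >&2
    printf '%s\n' "$archive"
  fi
)

# Example call (variables are placeholders supplied by the caller):
#   base=$(choose_mirror "$APACHE_MIRROR" "https://archive.apache.org/dist" "$FILE_PATH")
```

Probing with a HEAD request keeps the check cheap, since no tarball bytes are transferred before the mirror is chosen.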
macOS JAVA_HOME: On some macOS versions, `/usr/bin/javac` is a real file (not a symlink), so naive path resolution via `realpath` yields `/usr` as JAVA_HOME, which is incorrect. The `/usr/libexec/java_home` utility is Apple's official mechanism for finding the active JDK and handles all edge cases correctly.
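The failure mode is easy to reproduce with the "go up two directories" step on its own. In this sketch `java_home_from_javac` is a hypothetical helper and `/usr/lib/jvm/jdk-17` is an illustrative install location.

```shell
# Derive a JDK home from a fully resolved javac path by stripping "/bin/javac".
java_home_from_javac() {
  dirname "$(dirname "$1")"
}

java_home_from_javac /usr/lib/jvm/jdk-17/bin/javac   # prints /usr/lib/jvm/jdk-17
java_home_from_javac /usr/bin/javac                  # prints /usr, which is not a JDK home
```

The second call shows why the path arithmetic is only safe when `javac` resolves into a real JDK directory, and why macOS needs a different lookup.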
Maven Version Pinning: Different Maven versions can produce different build results (dependency resolution, plugin behavior). Pinning to the exact version from pom.xml ensures reproducible builds across all developer machines and CI environments.
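A rough sketch of the comparison step. It assumes the required version is stored in a `<maven.version>` property in pom.xml and that the first line of `mvn --version` has the form `Apache Maven X.Y.Z ...`; all three function names are hypothetical.

```shell
# Print the first <maven.version> value found in the given pom.xml.
required_maven_version() {
  grep '<maven.version>' "$1" | head -n 1 | awk -F '[<>]' '{print $3}'
}

# Print the version reported by the mvn on PATH, or nothing if mvn is absent.
installed_maven_version() {
  command -v mvn >/dev/null 2>&1 || return 0
  mvn --version 2>/dev/null | head -n 1 | awk '{print $3}'
}

# Succeed (exit 0) when the required ($1) and installed ($2) versions differ,
# meaning the pinned version has to be downloaded.
needs_pinned_maven() {
  [ "$1" != "$2" ]
}

# Example wiring:
#   if needs_pinned_maven "$(required_maven_version pom.xml)" "$(installed_maven_version)"; then
#     echo "System Maven does not match; a pinned download is needed." >&2
#   fi
```

An exact string comparison is deliberate here: an empty installed version (no `mvn` on PATH) and any other mismatch both lead to the same download path.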
Code Evidence
Download fallback chain from `build/mvn:64-108`:
if [ ! -f "${local_tarball}" -a "$(command -v curl)" ]; then
  curl ${curl_opts} "${remote_tarball}" > "${local_tarball}"
fi
# if the file still doesn't exist, lets try `wget` and cross our fingers
if [ ! -f "${local_tarball}" -a "$(command -v wget)" ]; then
  wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}"
fi
if [ ! -f "${local_tarball}" ]; then
  echo -n "ERROR: Cannot download with cURL or wget; please install manually"
  exit 2
fi
if [ "$(command -v shasum)" ]; then
  if [ -f "${local_checksum}" ]; then
    shasum -a 512 -c "${local_checksum}"
  fi
else
  echo "Skipping checksum because shasum is not installed."
fi
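The verify-when-possible rule can also be sketched as a standalone helper. This version additionally tries `sha512sum` when `shasum` is missing, which is an extension for illustration and not something the excerpt does; `sha512_of` and `verify_sha512` are hypothetical names.

```shell
# Print the SHA-512 hex digest of a file, or return 127 if no tool is found.
sha512_of() {
  if command -v shasum >/dev/null 2>&1; then
    shasum -a 512 "$1" | awk '{print $1}'
  elif command -v sha512sum >/dev/null 2>&1; then
    sha512sum "$1" | awk '{print $1}'
  else
    return 127
  fi
}

# Compare a file ($1) against an expected digest ($2).
# Returns 0 on match or when no tool is available (with a warning),
# and non-zero on mismatch.
verify_sha512() (
  file=$1
  expected=$2
  if ! actual=$(sha512_of "$file"); then
    echo "Skipping checksum because no SHA-512 tool is installed." >&2
    exit 0
  fi
  [ "$actual" = "$expected" ]
)
```

A missing tool yields a warning and success, while a digest mismatch yields failure, so the caller can abort on corruption without making the verification tool a hard build dependency.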
Mirror fallback from `build/mvn:132-139`:
if ! curl -L --output /dev/null --silent --head --fail "${APACHE_MIRROR}/${FILE_PATH}${MIRROR_URL_QUERY}" ; then
  echo "Falling back to archive.apache.org to download Maven"
  APACHE_MIRROR="https://archive.apache.org/dist"
  MIRROR_URL_QUERY=""
fi
Platform-specific JAVA_HOME detection from `build/mvn:23-32`:
if [ -z "${JAVA_HOME}" -a "$(command -v javac)" ]; then
  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS: /usr/bin/javac may be a real file, not a symlink
    export JAVA_HOME="$(/usr/libexec/java_home)"
  else
    export JAVA_HOME="$(dirname $(dirname $(realpath $(command -v javac))))"
  fi
fi
SBT interactive prompt workaround from `dev/run-tests.py:158-161`:
# NOTE: echo "q" is needed because sbt on encountering a build file
# with failure prompts the user for input either q, r, etc to quit or retry.
# This echo is there to make it not block.
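The same idea can be sketched in shell, even though `dev/run-tests.py` itself is Python. `prompting_tool` is a hypothetical stand-in for a build tool that asks a quit/retry question on failure, and `run_noninteractive` is a hypothetical wrapper; neither is part of Spark.

```shell
# Hypothetical stand-in for a tool that prompts on failure and reads one line.
prompting_tool() {
  printf '(r)etry, (q)uit? ' >&2
  read -r answer || answer=""
  echo "answer=${answer}"
}

# Run any command with "q" already waiting on its standard input,
# so a prompt is answered immediately instead of blocking.
run_noninteractive() {
  echo "q" | "$@"
}

run_noninteractive prompting_tool   # prints answer=q
```

If the wrapped command never prompts, the piped `q` is simply left unread, so the wrapper is harmless on successful builds.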