Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Heuristic:Apache Spark Build Fallback Strategies

From Leeroopedia



Knowledge Sources
Domains Build_System, Debugging, Infrastructure
Last Updated 2026-02-08 22:00 GMT

Overview

Build system defensive patterns: download tool fallback (curl then wget), Maven mirror fallback to archive.apache.org, SHA512 checksum verification when possible, and platform-specific JAVA_HOME detection.

Description

The Spark build system employs multiple layers of defensive coding to handle the diversity of developer environments. The build/mvn wrapper script implements a chain of fallback strategies for downloading Maven, detects JAVA_HOME differently on macOS vs Linux, auto-downloads the correct Maven version when the system version doesn't match, and performs checksum verification when the `shasum` utility is available. These patterns ensure that builds succeed across a wide range of environments without manual intervention.

Usage

Use these patterns as reference when debugging build failures related to Maven download issues, setting up new build environments, or understanding why builds behave differently on macOS vs Linux. Also valuable as a template for implementing robust download logic in CI/CD pipelines.

The Insight (Rule of Thumb)

  • Download Fallback: Try curl first, then wget. If both fail, exit with a clear error message explaining what to install manually.
  • Mirror Fallback: If the primary Apache mirror doesn't have the file (common for older Maven versions), fall back to archive.apache.org.
  • Checksum Verification: Verify SHA512 checksums when shasum is available. Skip gracefully (with a warning) when it's not. Never skip the download itself because the verification tool is missing.
  • JAVA_HOME Detection: On macOS, use `/usr/libexec/java_home` (handles multiple JDK installations correctly). On Linux, resolve the realpath of `javac` and go up two directories.
  • Maven Version Pinning: Read the required version from pom.xml, compare with the system Maven, and auto-download the correct version if they differ.
  • Compilation JVM Options: Always set `-Xss128m -Xmx4g -XX:ReservedCodeCacheSize=128m` for compilation to handle deep recursion and large codebases.
  • SBT Interactive Prompt: When running SBT in automated environments, pipe `echo "q"` to prevent interactive prompts from blocking on build failures.

Reasoning

Download Fallback: Different Linux distributions ship with different default HTTP clients. Minimal Docker images often have neither curl nor wget. The ordered fallback ensures maximum compatibility while preferring curl (which is more commonly available and has better error handling).

Mirror Fallback: Apache's mirror system only serves the latest versions of projects. Older Maven versions (which may still be specified in pom.xml during version transitions) are only available from the archive server. Without this fallback, builds would fail silently on version transitions.

macOS JAVA_HOME: On some macOS versions, `/usr/bin/javac` is a real file (not a symlink), so naive path resolution via `realpath` yields `/usr` as JAVA_HOME, which is incorrect. The `/usr/libexec/java_home` utility is Apple's official mechanism for finding the active JDK and handles all edge cases correctly.

Maven Version Pinning: Different Maven versions can produce different build results (dependency resolution, plugin behavior). Pinning to the exact version from pom.xml ensures reproducible builds across all developer machines and CI environments.

Code Evidence

Download fallback chain from `build/mvn:64-108`:

if [ ! -f "${local_tarball}" -a "$(command -v curl)" ]; then
  curl ${curl_opts} "${remote_tarball}" > "${local_tarball}"
fi
# if the file still doesn't exist, lets try `wget` and cross our fingers
if [ ! -f "${local_tarball}" -a "$(command -v wget)" ]; then
  wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}"
fi
if [ ! -f "${local_tarball}" ]; then
  echo -n "ERROR: Cannot download with cURL or wget; please install manually"
  exit 2
fi
if [ "$(command -v shasum)" ]; then
  if [ -f "${local_checksum}" ]; then
    shasum -a 512 -c "${local_checksum}"
  fi
else
  echo "Skipping checksum because shasum is not installed."
fi

Mirror fallback from `build/mvn:132-139`:

if ! curl -L --output /dev/null --silent --head --fail "${APACHE_MIRROR}/${FILE_PATH}${MIRROR_URL_QUERY}" ; then
  echo "Falling back to archive.apache.org to download Maven"
  APACHE_MIRROR="https://archive.apache.org/dist"
  MIRROR_URL_QUERY=""
fi

Platform-specific JAVA_HOME detection from `build/mvn:23-32`:

if [ -z "${JAVA_HOME}" -a "$(command -v javac)" ]; then
  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS: /usr/bin/javac may be a real file, not a symlink
    export JAVA_HOME="$(/usr/libexec/java_home)"
  else
    export JAVA_HOME="$(dirname $(dirname $(realpath $(command -v javac))))"
  fi
fi

SBT interactive prompt workaround from `dev/run-tests.py:158-161`:

# NOTE: echo "q" is needed because sbt on encountering a build file
# with failure prompts the user for input either q, r, etc to quit or retry.
# This echo is there to make it not block.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment