
Implementation:Apache Spark Release Build Package

From Leeroopedia


Knowledge Sources

  • Domains: Release_Engineering
  • Type: API Doc
  • Last Updated: 2026-02-08 12:00 GMT

Overview

Release build script that creates signed source and binary distribution tarballs for Apache Spark release candidates.

Description

release-build.sh package creates source tarballs, binary distributions for each variant (via make_binary_release), PySpark pip packages, and SparkR CRAN packages. Each artifact is signed with GPG and checksummed with SHA-512. The make_binary_release function accepts a distribution name, Maven flags, build package flags (withpip, withr, withconnect), and Scala version.

The package build process operates in the following sequence:

  1. Source tarball: Creates a clean source tarball from the tagged git checkout, excluding unnecessary files (e.g., .git, build artifacts).
  2. Binary distributions: Iterates through the build matrix, calling make_binary_release for each variant. Each invocation compiles Spark with specific Maven profiles and build flags.
  3. PySpark package: Builds the PySpark pip-installable package for PyPI distribution.
  4. SparkR package: Builds the SparkR CRAN-compatible package.
  5. Signing and checksumming: Each produced artifact is signed with the release manager's GPG key (producing .asc files) and checksummed with SHA-512 (producing .sha512 files).
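
Step 5 above can be sketched with standard tooling. This is a hedged illustration using sha512sum and gpg; the script's exact invocations (and checksum tool) may differ:

```shell
# Create a stand-in artifact so the commands below have something to act on
# (the real artifact comes out of the Maven build).
tar -czf spark-3.5.0-bin-hadoop3.tgz -T /dev/null

# SHA-512 checksum, written alongside the artifact (the .sha512 file).
sha512sum spark-3.5.0-bin-hadoop3.tgz > spark-3.5.0-bin-hadoop3.tgz.sha512

# Detached, ASCII-armored GPG signature (the .asc file). This needs the
# release manager's key in the keyring, so it is shown commented out:
# gpg --armor --detach-sign --output spark-3.5.0-bin-hadoop3.tgz.asc spark-3.5.0-bin-hadoop3.tgz

# Round-trip check: verify the checksum that was just written.
sha512sum -c spark-3.5.0-bin-hadoop3.tgz.sha512
```

Consumers of a published release run the same `sha512sum -c` and a `gpg --verify` against the Apache KEYS file to validate downloads.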

The make_binary_release function is the core of the build matrix expansion. It receives four parameters that fully specify a build variant and produces a self-contained binary distribution tarball.
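
The matrix expansion can be sketched as a shell function plus one call per variant. This is a hypothetical stand-in: the real make_binary_release runs the Maven build, which is elided here, and the variable names are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

RELEASE_VERSION="3.5.0"   # normally exported by the release tooling

# Stand-in for make_binary_release: shows parameter handling and the
# artifact naming convention only; the real function compiles Spark
# with the given Maven flags and packages the distribution.
make_binary_release() {
  local name="$1" flags="$2" build_package="$3" scala_version="$4"
  echo "would build spark-${RELEASE_VERSION}-bin-${name}.tgz" \
       "(maven: ${flags}, packages: ${build_package}, scala: ${scala_version})"
}

# One call per build-matrix variant (this variant list is illustrative).
make_binary_release "hadoop3" "-Phadoop-3" "withpip,withr,withconnect" "2.13"
```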

Usage

Run as part of the release process after the release candidate is tagged. It is typically invoked automatically by do-release-docker.sh, but can be called directly for debugging.

Code Reference

Source Location

  • Repository: apache/spark
  • File: dev/create-release/release-build.sh (lines 648-829)

Signature

# Main entry point
dev/create-release/release-build.sh package

# Core function for building a single binary variant
make_binary_release(NAME, FLAGS, BUILD_PACKAGE, SCALA_VERSION)

make_binary_release Parameters

  • NAME (string): Distribution variant name. Example: hadoop3
  • FLAGS (string): Maven profile flags. Example: -Phadoop-3
  • BUILD_PACKAGE (string): Comma-separated build package flags. Example: withpip,withr,withconnect
  • SCALA_VERSION (string): Scala version for the build. Example: 2.13

Build Package Flags

  • withpip: Include the PySpark pip package in the distribution
  • withr: Include the SparkR CRAN package in the distribution
  • withconnect: Include Spark Connect support in the distribution
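
A hedged sketch of how a comma-separated BUILD_PACKAGE string can be tested in bash (illustrative only; the script's actual parsing may differ):

```shell
BUILD_PACKAGE="withpip,withr,withconnect"

# Substring matching is sufficient here because no flag name is a
# substring of another flag name.
if [[ "$BUILD_PACKAGE" == *"withpip"* ]];     then echo "building pip package";   fi
if [[ "$BUILD_PACKAGE" == *"withr"* ]];       then echo "building R package";     fi
if [[ "$BUILD_PACKAGE" == *"withconnect"* ]]; then echo "enabling Spark Connect"; fi
```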

I/O Contract

Inputs

  • Tagged Spark source (git checkout, required): Source code checked out at the release tag
  • Maven profiles (build configuration, required): Profile flags for each build variant
  • GPG key (GPG keyring, required): Release manager's key for artifact signing
  • RELEASE_VERSION (environment variable, required): Version string for naming artifacts
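
The required RELEASE_VERSION input can be guarded with a standard bash parameter check. A minimal sketch, not taken from the script itself:

```shell
RELEASE_VERSION="3.5.0"   # normally exported by do-release-docker.sh

# Fail fast with a clear message if the required variable is unset or empty.
: "${RELEASE_VERSION:?RELEASE_VERSION must be set}"
echo "packaging Spark ${RELEASE_VERSION}"
```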

Outputs

  • Source tarball (spark-<ver>.tgz): Complete source code archive
  • Binary tarballs (spark-<ver>-bin-<variant>.tgz): One pre-built distribution per build matrix variant
  • GPG signatures (.asc files): GPG signature for each artifact
  • SHA-512 checksums (.sha512 files): Checksum for each artifact
  • PySpark package (.tar.gz / .whl): pip-installable PySpark package
  • SparkR package (.tar.gz): CRAN-compatible SparkR package

Usage Examples

Full Package Build

# Build all distribution variants (runs with env vars set by do-release-docker.sh)
dev/create-release/release-build.sh package

What Happens Internally

# The script internally calls make_binary_release for each variant, for example:
make_binary_release "hadoop3" "-Phadoop-3" "withpip,withr,withconnect" "2.13"

# Each call produces:
#   spark-3.5.0-bin-hadoop3.tgz
#   spark-3.5.0-bin-hadoop3.tgz.asc
#   spark-3.5.0-bin-hadoop3.tgz.sha512
