Implementation:Apache Spark Release Build Package
| Knowledge Sources | |
|---|---|
| Domains | Release_Engineering |
| Type | API Doc |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Release build script that creates signed source and binary distribution tarballs for Apache Spark release candidates.
Description
release-build.sh package creates source tarballs, binary distributions for each variant (via make_binary_release), PySpark pip packages, and SparkR CRAN packages. Each artifact is signed with GPG and checksummed with SHA-512. The make_binary_release function accepts a distribution name, Maven flags, build package flags (withpip, withr, withconnect), and Scala version.
The package build process operates in the following sequence:
- Source tarball: Creates a clean source tarball from the tagged git checkout, excluding unnecessary files (e.g., .git, build artifacts).
- Binary distributions: Iterates through the build matrix, calling make_binary_release for each variant. Each invocation compiles Spark with specific Maven profiles and build flags.
- PySpark package: Builds the PySpark pip-installable package for PyPI distribution.
- SparkR package: Builds the SparkR CRAN-compatible package.
- Signing and checksumming: Each produced artifact is signed with the release manager's GPG key (producing .asc files) and checksummed with SHA-512 (producing .sha512 files).
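The signing and checksumming step can be sketched roughly as follows. This is a minimal illustration, not the code from release-build.sh: the function names (checksum_artifact, sign_artifact, verify_checksum) and the GPG_KEY variable are hypothetical, and the sketch assumes GNU coreutils sha512sum and a gpg binary on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not taken from release-build.sh.

# Write <artifact>.sha512 next to the artifact.
checksum_artifact() {
  local artifact="$1"
  local dir base
  dir="$(dirname "$artifact")"
  base="$(basename "$artifact")"
  # Run from the artifact's directory so the checksum file records a bare file name.
  (cd "$dir" && sha512sum "$base" > "$base.sha512")
}

# Write <artifact>.asc, a detached ASCII-armored signature.
# Assumption: GPG_KEY names a secret key available in the local keyring.
sign_artifact() {
  local artifact="$1"
  gpg --batch --yes --local-user "$GPG_KEY" --armor \
      --output "$artifact.asc" --detach-sign "$artifact"
}

# Re-check an artifact against the checksum file written by checksum_artifact.
verify_checksum() {
  local artifact="$1"
  local dir base
  dir="$(dirname "$artifact")"
  base="$(basename "$artifact")"
  (cd "$dir" && sha512sum --check --quiet "$base.sha512")
}
```

Recording the bare file name in the .sha512 file keeps the checksum verifiable after the artifact is moved to another directory, which is why the helpers change into the artifact's directory first.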
The make_binary_release function is the core of the build matrix expansion. It receives four parameters that fully specify a build variant and produces a self-contained binary distribution tarball.
Usage
Run as part of the release process after tagging. Typically invoked automatically by do-release-docker.sh but can be called directly for debugging.
Code Reference
Source Location
- Repository: apache/spark
- File: dev/create-release/release-build.sh (lines 648-829)
Signature
# Main entry point
dev/create-release/release-build.sh package
# Core function for building a single binary variant
make_binary_release(NAME, FLAGS, BUILD_PACKAGE, SCALA_VERSION)
make_binary_release Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| NAME | string | Distribution variant name | hadoop3 |
| FLAGS | string | Maven profile flags | -Phadoop-3 |
| BUILD_PACKAGE | string | Comma-separated build package flags | withpip,withr,withconnect |
| SCALA_VERSION | string | Scala version for the build | 2.13 |
Build Package Flags
| Flag | Description |
|---|---|
| withpip | Include PySpark pip package in the distribution |
| withr | Include SparkR CRAN package in the distribution |
| withconnect | Include Spark Connect support in the distribution |
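One way to interpret the comma-separated BUILD_PACKAGE value is sketched below. The helpers has_build_flag and describe_variant are hypothetical and only illustrate the flag format from the table above; how release-build.sh itself tests for these flags may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of release-build.sh.

# Succeed if the comma-separated list in $1 contains the exact flag in $2.
has_build_flag() {
  local list="$1" wanted="$2" flag
  local IFS=','            # split the unquoted list on commas only
  for flag in $list; do
    [ "$flag" = "$wanted" ] && return 0
  done
  return 1
}

# Print which optional packages a variant would include.
describe_variant() {
  local name="$1" build_package="$2" extras="" flag
  for flag in withpip withr withconnect; do
    if has_build_flag "$build_package" "$flag"; then
      extras="$extras $flag"
    fi
  done
  echo "$name:${extras:- (no extra packages)}"
}

# Usage:
#   describe_variant hadoop3 "withpip,withr,withconnect"
#   prints: hadoop3: withpip withr withconnect
```

Splitting on commas and comparing whole tokens avoids false matches that a substring test could produce if one flag name were ever a prefix of another.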
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Tagged Spark source | git checkout | Yes | Source code checked out at the release tag |
| Maven profiles | build configuration | Yes | Profile flags for each build variant |
| GPG key | GPG keyring | Yes | Release manager's key for artifact signing |
| RELEASE_VERSION | environment variable | Yes | Version string for naming artifacts |
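When calling the script directly for debugging, a pre-flight check on the required inputs can catch a missing variable before a long build starts. The require_env helper below is a hypothetical sketch, not something release-build.sh is known to provide; RELEASE_VERSION is the only variable name taken from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of release-build.sh.

# Return non-zero, reporting on stderr, if any named environment variable
# is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed workflow):
#   require_env RELEASE_VERSION && dev/create-release/release-build.sh package
```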
Outputs
| Name | Type | Description |
|---|---|---|
| Source tarball | spark-<ver>.tgz | Complete source code archive |
| Binary tarballs | spark-<ver>-bin-<variant>.tgz | Pre-built distribution per build matrix variant |
| GPG signatures | .asc files | GPG signature for each artifact |
| SHA-512 checksums | .sha512 files | Checksum for each artifact |
| PySpark package | .tar.gz / .whl | Pip-installable PySpark package |
| SparkR package | .tar.gz | CRAN-compatible SparkR package |
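The naming patterns in the table can be turned into a checklist of expected tarball files. The expected_artifacts function below is a hypothetical sketch based only on the spark-<ver>.tgz and spark-<ver>-bin-<variant>.tgz patterns above; it does not cover the PySpark or SparkR packages, whose exact file names are not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of release-build.sh.

# Print the source and binary tarball names for a version and a list of
# variants, each followed by its .asc signature and .sha512 checksum file.
expected_artifacts() {
  local version="$1"
  shift
  local names=("spark-$version.tgz")
  local variant name
  for variant in "$@"; do
    names+=("spark-$version-bin-$variant.tgz")
  done
  for name in "${names[@]}"; do
    printf '%s\n%s.asc\n%s.sha512\n' "$name" "$name" "$name"
  done
}

# Usage: list any expected file that is missing from the current directory.
#   expected_artifacts "$RELEASE_VERSION" hadoop3 | while read -r f; do
#     [ -f "$f" ] || echo "missing: $f"
#   done
```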
Usage Examples
Full Package Build
# Build all distribution variants (runs with env vars set by do-release-docker.sh)
dev/create-release/release-build.sh package
What Happens Internally
# The script internally calls make_binary_release for each variant, for example:
make_binary_release "hadoop3" "-Phadoop-3" "withpip,withr,withconnect" "2.13"
# Each call produces a tarball, its signature, and its checksum (shown here for RELEASE_VERSION=3.5.0):
# spark-3.5.0-bin-hadoop3.tgz
# spark-3.5.0-bin-hadoop3.tgz.asc
# spark-3.5.0-bin-hadoop3.tgz.sha512