# Workflow: Apache Spark Release Process
| Knowledge Sources | |
|---|---|
| Domains | Release_Engineering, CI_CD, Build_Systems |
| Last Updated | 2026-02-08 22:00 GMT |
## Overview
End-to-end process for creating an official Apache Spark release candidate, from tagging the source tree through building binary distributions to publishing artifacts.
## Description
This workflow covers the complete Apache Spark release process, which is orchestrated through a Dockerized build environment for reproducibility. The process includes creating a release tag from a specified branch, building source and binary distributions with multiple Hadoop profiles, generating documentation, publishing Maven artifacts to a staging repository, and producing the final release artifacts. The release scripts support both full releases and dry-run modes for testing. A contributor list generator and LLMs.txt generator are also included as part of the release tooling.
## Usage
Execute this workflow when preparing a new Apache Spark release candidate for community voting. This is performed by the designated release manager and typically follows the Apache Software Foundation release process. It can also be used in dry-run mode to validate the build before an official release attempt.
## Execution Steps
### Step 1: Release Environment Setup
Set up the Dockerized release environment using dev/create-release/do-release-docker.sh. This script builds a "spark-rm" Docker image containing all required build tools and dependencies, ensuring a reproducible and isolated build environment. The environment requires a configured working directory for output artifacts.
Key considerations:
- The spark-rm Docker image is rebuilt as needed on each invocation
- A working directory must be specified with the -d flag
- The -n flag enables dry-run mode for testing without uploads
- Individual steps (tag, build, docs, publish, finalize) can be run separately with -s
- A custom JDK path can be specified with -j; the default is OpenJDK 17
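Assuming the flags described above (-d, -n, -s, -j; spellings taken from this document, not re-verified against the script), a dry-run invocation could be sketched as follows. The sketch only assembles and prints the command, since actually running it requires Docker and a Spark checkout:

```shell
# Sketch only: assembles the do-release-docker.sh command line described above.
# The JDK path and step name are illustrative assumptions, not verified output
# of the actual script.
WORKDIR=./spark-rm-work
mkdir -p "$WORKDIR"

cmd=(dev/create-release/do-release-docker.sh
     -d "$WORKDIR"                      # working directory for output artifacts
     -n                                 # dry-run: no remote uploads
     -s build                           # run only the build step
     -j /usr/lib/jvm/java-17-openjdk)   # custom JDK path (illustrative)

# Print rather than execute.
printf '%s ' "${cmd[@]}"; echo
```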
### Step 2: Release Tagging
Create a git release tag from the specified branch using dev/create-release/release-tag.sh. This step updates version numbers in POM files and other configuration, commits the version changes, and creates a signed git tag for the release candidate.
Key considerations:
- The tag follows the pattern vX.Y.Z-rcN
- Version numbers are updated across all POM files
- The tag is pushed to the ASF git repository
- Shared utility functions from release-util.sh handle configuration
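The tag naming convention can be checked mechanically. The helper below is hypothetical (it is not part of release-tag.sh); it only encodes the vX.Y.Z-rcN pattern described above:

```shell
# Hypothetical validator for release-candidate tag names; not part of the
# Spark release scripts, just the vX.Y.Z-rcN convention made explicit.
is_valid_rc_tag() {
  printf '%s' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+-rc[0-9]+$'
}

is_valid_rc_tag "v4.0.0-rc1" && echo "valid"   # matches the convention
is_valid_rc_tag "4.0.0" || echo "invalid"      # missing the leading v and rcN
```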
### Step 3: Binary Distribution Build
Build the source and binary distributions using dev/create-release/release-build.sh. This is the largest and most complex step, producing multiple distribution artifacts: source tarball, binary tarballs for different Hadoop versions, PySpark pip package, and SparkR CRAN package.
Key considerations:
- Builds are executed with Maven using multiple Hadoop profiles
- Binary distributions are created via dev/make-distribution.sh
- The PySpark pip package is built and included
- Source distributions include the complete source tree with ASF licensing
- GPG signatures and SHA checksums are generated for all artifacts
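A minimal sketch of the checksum part of this step (GPG signing omitted; the artifact name is a placeholder, and the real scripts may invoke a different checksum tool):

```shell
# Sketch: generate and verify a SHA-512 checksum file next to an artifact,
# analogous to what the release build does for each tarball. The artifact
# here is a placeholder file, not a real distribution.
artifact=spark-x.y.z-bin-placeholder.tgz
echo "placeholder distribution contents" > "$artifact"

sha512sum "$artifact" > "$artifact.sha512"   # write checksum file
sha512sum -c "$artifact.sha512"              # verify against the file on disk
```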
### Step 4: Documentation Generation
Build the complete Spark documentation site, including API documentation for Scala, Java, Python, and R. The documentation is generated using Jekyll with custom plugins for code examples, API doc cross-references, and version-specific content.
Key considerations:
- Jekyll-based site generation with custom Ruby plugins
- API docs generated for all four supported languages
- SQL documentation generated from Spark SQL function metadata
- Error documentation auto-generated from error class definitions
- Documentation is versioned per release
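The per-release versioning of the generated site can be sketched as below. The directory layout and version value are illustrative assumptions; the real build (Jekyll plus the per-language API doc generators) is driven by the release scripts:

```shell
# Sketch of versioned documentation output. In a real build, something like
#   (cd docs && bundle exec jekyll build)
# produces the site first; here we only model the one-directory-per-release layout.
SPARK_VERSION=4.0.0                    # illustrative version
docs_out="site/docs/${SPARK_VERSION}"  # assumed layout, one directory per release
mkdir -p "$docs_out"
echo "generated site would be copied to: $docs_out"
```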
### Step 5: Artifact Publishing
Publish Maven artifacts to the ASF staging repository and upload binary distributions to the ASF SVN distribution area. This step makes the release candidate available for community voting and verification.
Key considerations:
- Maven artifacts are published to a staging repository
- Binary distributions are uploaded to SVN
- The contributor list is generated from commits between the previous release and the current tag
- The finalize step closes the staging repository
- Dry-run mode skips all remote uploads
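The dry-run behavior can be illustrated with a small guard function. This is a hypothetical sketch of the pattern, not the actual script code, and the SVN URL is a placeholder:

```shell
# Hypothetical dry-run guard: remote actions are printed instead of executed
# when dry-run mode (the -n flag) is enabled.
DRY_RUN=1

run_remote() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "[dry-run] skipping: $*"
  else
    "$@"
  fi
}

# Placeholder upload command; the real target is the ASF SVN distribution area.
run_remote svn import spark-bin https://dist.example.org/dev/spark -m "upload RC"
```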
### Step 6: Release Finalization
Close the Maven staging repository and prepare the release vote email. The contributor list generator (generate-contributors.py) produces a list of all contributors between releases. The release artifacts are now ready for community voting.
Key considerations:
- The staging repository is closed to prevent further modifications
- Contributor list covers all commits between the previous and current release
- The LLMs.txt generator creates a machine-readable project summary
- Community voting follows ASF governance procedures
- After a successful vote, artifacts are promoted to the release area
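The core idea behind the contributor list (generate-contributors.py does considerably more, such as credit attribution) is listing unique commit authors between two tags. A self-contained sketch using a throwaway repository:

```shell
# Self-contained sketch: unique commit authors between two tags.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m "work before the previous release"
git tag v1.0.0
git -c user.name=Bob -c user.email=bob@example.com \
    commit -q --allow-empty -m "work in the new release"
git tag v1.1.0

# Authors of commits reachable from v1.1.0 but not from v1.0.0:
git log v1.0.0..v1.1.0 --format='%an' | sort -u   # prints: Bob
```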