Implementation:Apache Spark Release Build Finalize
| Knowledge Sources | |
|---|---|
| Domains | Release_Engineering |
| Type | API Doc |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Release build script sub-command and Python utilities that finalize an Apache Spark release after a successful vote.
Description
release-build.sh finalize promotes the Nexus staging repository, uploads PySpark to PyPI via twine, moves artifacts from dev to release SVN, and deploys documentation. generate-contributors.py creates contributor lists by diffing commits between release tags and mapping authors to GitHub usernames. generate-llms-txt.py creates an LLM-friendly documentation index.
The finalization process consists of the following operations:
- Nexus promotion: The staging repository that received a passing vote is promoted (released) via the Nexus REST API, which triggers synchronization to Maven Central.
- PyPI upload: PySpark packages are uploaded to PyPI using `twine`, making them available via `pip install pyspark`.
- SVN migration: Binary distribution artifacts are moved from the Apache `dev` SVN area to the `release` SVN area, making them available on Apache mirrors.
- Documentation deployment: The generated documentation site is deployed to the official URL.
- Final git tag: The release candidate tag is re-tagged as the final release tag (e.g., `v3.5.0-rc1` becomes `v3.5.0`).
- RC cleanup: Previous release candidate directories are removed from SVN staging.
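The re-tagging step above implies a simple naming rule. The following is a minimal sketch of that rule only, assuming RC tags always end in `-rc<N>`; the helper name `final_tag_from_rc` is hypothetical and is not taken from `release-build.sh`.

```python
import re

# Assumed RC tag shape: "<final tag>-rc<N>", e.g. "v3.5.0-rc1".
_RC_SUFFIX = re.compile(r"-rc\d+$")


def final_tag_from_rc(rc_tag):
    """Return the final release tag name for a release-candidate tag name."""
    if not _RC_SUFFIX.search(rc_tag):
        raise ValueError(f"not a release-candidate tag: {rc_tag!r}")
    # Drop the trailing "-rc<N>" suffix; the rest of the tag is unchanged.
    return _RC_SUFFIX.sub("", rc_tag)


if __name__ == "__main__":
    print(final_tag_from_rc("v3.5.0-rc1"))
```

The function only computes the tag name; creating and pushing the tag is left to the release script.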
Contributor Generation
The generate-contributors.py script produces attribution lists by:
- Extracting all commits between `PREVIOUS_RELEASE_TAG` and `RELEASE_TAG` using git log.
- Parsing commit metadata to extract author names and pull request numbers.
- Using the GitHub API (via optional `GITHUB_OAUTH_KEY`) to map git author names to GitHub usernames.
- Producing a formatted contributor list suitable for release notes.
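The first, second, and last steps above can be sketched without network access. This is an illustration rather than the logic of `generate-contributors.py`: the function names, the tab-separated log format, and the assumption that pull request numbers appear as `#<digits>` in the commit subject are all hypothetical.

```python
import re
from collections import Counter

# Assumption: PR numbers show up in the commit subject as "#12345".
PR_PATTERN = re.compile(r"#(\d+)")


def git_log_args(previous_tag, release_tag):
    """Build a git command listing commits in release_tag that are not in previous_tag."""
    # %an = author name, %x09 = tab, %s = subject line.
    return ["git", "log", "--format=%an%x09%s", f"{previous_tag}..{release_tag}"]


def summarize(log_lines):
    """Count commits per author and collect PR numbers found in the subjects."""
    counts = Counter()
    prs = set()
    for line in log_lines:
        author, _, subject = line.partition("\t")
        if not author:
            continue  # skip blank or malformed lines
        counts[author] += 1
        prs.update(int(n) for n in PR_PATTERN.findall(subject))
    return counts, sorted(prs)


def format_contributors(counts):
    """Render one bullet per author, sorted case-insensitively by name."""
    return "\n".join(f"* {name}" for name in sorted(counts, key=str.lower))


if __name__ == "__main__":
    # Made-up sample lines in the format produced by git_log_args().
    sample = [
        "Alice\t[SPARK-1] Fix bug (#101)",
        "Bob\t[SPARK-2] Add docs (#102)",
        "Alice\t[SPARK-3] Refactor (#103)",
    ]
    counts, prs = summarize(sample)
    print(format_contributors(counts))
    print(prs)
```

The output of `git_log_args` can be passed to `subprocess.run(..., capture_output=True, text=True)` inside a Spark checkout, and the resulting `stdout.splitlines()` fed to `summarize`. Mapping author names to GitHub usernames is omitted here because it needs the GitHub API.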
The releaseutils.py module provides the underlying data model and API interaction:
| Component | Description |
|---|---|
| `Commit` class | Data class with fields: hash, github_username, title, pr_number |
| `get_commits(tag)` | Extracts commits between two tags from git history |
| `get_github_name(author)` | Maps a git author name to a GitHub username via the GitHub API |
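To make the data model concrete, here is a hypothetical stand-in for the `Commit` class, using only the four field names listed in the table above. The field types, the `group_by_username` helper, and the `(unknown)` fallback label are assumptions for illustration; the real class in `releaseutils.py` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Illustrative record; field names follow the table, types are assumed."""
    hash: str
    github_username: Optional[str]  # None when the author could not be resolved
    title: str
    pr_number: Optional[int]  # None when no PR number was found


def group_by_username(commits):
    """Group commits by GitHub username, bucketing unresolved authors together."""
    groups = {}
    for commit in commits:
        key = commit.github_username or "(unknown)"
        groups.setdefault(key, []).append(commit)
    return groups


if __name__ == "__main__":
    # Made-up sample data.
    commits = [
        Commit("a1b2c3", "alice", "Fix bug", 101),
        Commit("d4e5f6", None, "Add docs", None),
        Commit("0718aa", "alice", "Refactor", 103),
    ]
    for user, items in group_by_username(commits).items():
        print(user, [c.title for c in items])
```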
Usage
Run after the RC vote passes, with PMC confirmation. This is the final step that makes the release publicly available.
Code Reference
Source Location
- Repository: apache/spark
- Files:
  - `dev/create-release/release-build.sh` (lines 101-574)
  - `dev/create-release/generate-contributors.py` (lines 1-190)
  - `dev/create-release/generate-llms-txt.py` (lines 1-199)
  - `dev/create-release/releaseutils.py` (lines 1-138)
Signature
# Finalize the release
dev/create-release/release-build.sh finalize
# Generate contributor list
python3 dev/create-release/generate-contributors.py
# Generate LLM documentation index
python3 dev/create-release/generate-llms-txt.py
generate-contributors.py Environment Variables
| Variable | Required | Description |
|---|---|---|
| `RELEASE_TAG` | Yes | Tag for the current release (e.g., `v3.5.0`) |
| `PREVIOUS_RELEASE_TAG` | Yes | Tag for the previous release (e.g., `v3.4.0`) |
| `GITHUB_OAUTH_KEY` | No | GitHub OAuth token for API rate limit increase |
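A caller may want to validate these variables before invoking the script. The sketch below shows one way to do that, assuming only the three variable names in the table; the function name `read_contributor_env` and the returned dictionary keys are hypothetical and not part of `generate-contributors.py`.

```python
import os

REQUIRED = ("RELEASE_TAG", "PREVIOUS_RELEASE_TAG")


def read_contributor_env(env=None):
    """Validate the contributor-generation settings from an environment mapping."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return {
        "release_tag": env["RELEASE_TAG"],
        "previous_release_tag": env["PREVIOUS_RELEASE_TAG"],
        # Optional: None when no token is supplied.
        "github_oauth_key": env.get("GITHUB_OAUTH_KEY") or None,
    }


if __name__ == "__main__":
    print(read_contributor_env({"RELEASE_TAG": "v3.5.0", "PREVIOUS_RELEASE_TAG": "v3.4.0"}))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test.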
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Passed RC vote | community consensus | Yes | PMC confirmation that the RC vote passed |
| Built artifacts | release artifacts | Yes | All artifacts from the package and publish steps |
| PyPI credentials | environment variables | Yes | Credentials for uploading PySpark to PyPI via twine |
| GitHub OAuth key | environment variable | No | Optional token for GitHub API access (contributor generation) |
| Nexus staging repo ID | string | Yes | ID of the staging repository to promote |
Outputs
| Name | Type | Description |
|---|---|---|
| Final git tag | git tag | Release tag (e.g., `v3.5.0`) in the Apache repository |
| PySpark on PyPI | PyPI package | PySpark package published and installable via `pip install pyspark` |
| Published docs | website | Documentation site deployed to the official Apache Spark URL |
| Maven Central artifacts | Maven repository | JARs, POMs available on Maven Central after Nexus sync |
| `contributors.txt` | text file | Formatted list of contributors between releases |
| `llms.txt` | text file | LLM-friendly documentation index |
Usage Examples
Finalize Release
# Finalize the release after successful vote
dev/create-release/release-build.sh finalize
Generate Contributors
# Generate contributor list between two releases
RELEASE_TAG=v3.5.0 \
PREVIOUS_RELEASE_TAG=v3.4.0 \
GITHUB_OAUTH_KEY=ghp_xxxxxxxxxxxx \
python3 dev/create-release/generate-contributors.py
Generate LLM Documentation Index
# Generate llms.txt for the release documentation
python3 dev/create-release/generate-llms-txt.py