Principle:Apache Spark Release Finalization
| Domains | Release_Engineering, Community |
|---|---|
| Last Updated | 2026-02-08 12:00 GMT |
Overview
A post-vote release finalization process that promotes staged artifacts to release repositories, publishes packages, and generates community attribution reports.
Description
After a successful community vote, the release must be finalized: staging artifacts are promoted to release repositories, PySpark is published to PyPI, documentation is deployed, old release candidates are cleaned up, and contributor lists are generated from git history. The contributor generation tool uses the GitHub API to map git authors to GitHub usernames, creating proper attribution for release notes.
The finalization process encompasses multiple distinct operations:
- Nexus promotion: The staging repository that passed the vote is promoted (released) to the Apache release repository, which synchronizes to Maven Central.
- PyPI publishing: PySpark packages are uploaded to PyPI via `twine`, making them available through `pip install pyspark`.
- SVN migration: Binary artifacts are moved from the `dev` SVN area to the `release` SVN area.
- Documentation deployment: The generated documentation site is deployed to the official Apache Spark documentation URL.
- RC cleanup: Previous release candidate artifacts are removed from staging areas to avoid confusion.
- Contributor attribution: A Python script generates a list of all contributors between the previous release and the current release by analyzing git commit history and mapping authors to GitHub usernames via the GitHub API.
- LLM documentation index: A machine-readable `llms.txt` file is generated to provide an LLM-friendly documentation index.
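The first half of the contributor attribution step, collecting authors from git history, can be sketched as follows. This is a minimal illustration and not the actual Spark script: the function names `parse_authors` and `contributors_between` are hypothetical, and the sketch assumes the previous and current releases are identified by git tags. It relies only on the standard `git log --format=%an <old>..<new>` invocation.

```python
import subprocess
from collections import Counter


def parse_authors(log_output: str) -> Counter:
    """Count commits per author from `git log --format=%an` output.

    Blank and whitespace-only lines are ignored.
    """
    names = (line.strip() for line in log_output.splitlines())
    return Counter(name for name in names if name)


def contributors_between(previous_tag: str, current_tag: str,
                         repo_dir: str = ".") -> list[str]:
    """Return author names for commits reachable from current_tag
    but not from previous_tag, sorted case-insensitively.

    Assumption: both arguments are tags (or other revisions) that
    exist in the repository at repo_dir.
    """
    result = subprocess.run(
        ["git", "log", "--format=%an", f"{previous_tag}..{current_tag}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return sorted(parse_authors(result.stdout), key=str.casefold)
```

Keeping the parsing separate from the `git` call means the counting logic can be checked against a fixed string without needing a repository checkout.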
Usage
Use this process after the RC vote passes to make the release official. It is the final step in the release process and requires PMC (Project Management Committee) confirmation.
Theoretical Basis
The finalization follows a promotion pipeline model:
promote(nexus_staging -> release) -> publish(pypi) -> deploy(docs) -> cleanup(old_rcs) -> generate(contributors) -> generate(llms_txt)
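One way to read this model is as an ordered list of steps that halts at the first failure, so that a later step never runs on top of a broken earlier one. The sketch below illustrates that idea only; `run_pipeline`, `placeholder`, and the step names are hypothetical and do not correspond to the real release scripts. The `dry_run` flag is an added assumption, included because promotion is hard to undo.

```python
from typing import Callable, Sequence

# A step is a (name, action) pair; the action takes no arguments.
Step = tuple[str, Callable[[], None]]


def run_pipeline(steps: Sequence[Step], dry_run: bool = False) -> list[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed. In dry-run mode no
    action is executed and the planned order is returned instead.
    """
    completed: list[str] = []
    for name, action in steps:
        if not dry_run:
            try:
                action()
            except Exception as exc:
                # Report which steps already ran, since they may be irreversible.
                raise RuntimeError(
                    f"step {name!r} failed after {completed}"
                ) from exc
        completed.append(name)
    return completed


def placeholder() -> None:
    """Hypothetical stand-in for a real release action."""


# Step order mirrors the pipeline expression above.
FINALIZATION_STEPS: list[Step] = [
    ("promote_nexus_staging", placeholder),
    ("publish_pypi", placeholder),
    ("deploy_docs", placeholder),
    ("cleanup_old_rcs", placeholder),
    ("generate_contributors", placeholder),
    ("generate_llms_txt", placeholder),
]
```

Calling `run_pipeline(FINALIZATION_STEPS, dry_run=True)` returns the planned step names in order without executing anything, which is a cheap check before running the irreversible promotion step.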
The key properties of the finalization process are:
- Irreversibility: Unlike staging, promotion to release repositories is effectively permanent. This is why the vote-based verification step must precede finalization.
- Multi-channel distribution: Artifacts are distributed through multiple channels (Maven Central, PyPI, Apache mirrors, documentation site) to reach all user communities.
- Attribution: The contributor generation process ensures that all contributors receive proper credit in release notes, supporting open-source community health.
- Automation: Despite the complexity of multi-channel distribution, the process is automated through scripts, reducing the chance of human error during the critical finalization step.
- Cleanup: Removing old release candidates prevents confusion and reduces storage costs in staging repositories.
The contributor generation tool bridges the gap between git commit metadata and GitHub identity, providing rich attribution that includes GitHub usernames and pull request numbers. This supports the Apache Software Foundation's emphasis on community recognition and transparent governance.
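The bridging step can be illustrated with a small sketch that combines a git author name, a resolved GitHub username, and pull request numbers into one attribution line. Everything here is an assumption made for illustration: the names `extract_pr_numbers` and `attribution_line` are hypothetical, the `#1234` reference convention in commit messages is assumed, the output format is invented, and `resolve_username` is an injected callable standing in for whatever GitHub API lookup the real tool performs.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed convention: pull requests appear in commit messages as "#1234".
PR_REFERENCE = re.compile(r"#(\d+)")


def extract_pr_numbers(commit_message: str) -> list[int]:
    """Return PR numbers found in a message, in first-seen order, deduplicated."""
    seen: dict[int, None] = {}
    for match in PR_REFERENCE.finditer(commit_message):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)


def attribution_line(
    git_author: str,
    commit_messages: Iterable[str],
    resolve_username: Callable[[str], Optional[str]],
) -> str:
    """Format one release-notes line for a contributor.

    resolve_username maps a git author name to a GitHub username, or
    returns None when no match is found; the git name is then used alone.
    """
    username = resolve_username(git_author)
    who = f"{git_author} (@{username})" if username else git_author
    prs = sorted({n for msg in commit_messages for n in extract_pr_numbers(msg)})
    if not prs:
        return who
    return f"{who}: " + ", ".join(f"#{n}" for n in prs)
```

Passing the resolver in as a parameter keeps network access out of the formatting logic, so a plain dictionary lookup such as `{"Alice Example": "alice"}.get` can stand in for the API during testing.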