Implementation:Apache Spark Release Build Finalize

From Leeroopedia


Knowledge Sources
Domains: Release_Engineering
Type: API Doc
Last Updated: 2026-02-08 12:00 GMT

Overview

Release build script sub-command and Python utilities that finalize an Apache Spark release after a successful vote.

Description

release-build.sh finalize promotes the Nexus staging repository, uploads PySpark to PyPI via twine, moves artifacts from dev to release SVN, and deploys documentation. generate-contributors.py creates contributor lists by diffing commits between release tags and mapping authors to GitHub usernames. generate-llms-txt.py creates an LLM-friendly documentation index.

The finalization process consists of the following operations:

  1. Nexus promotion: The staging repository that received a passing vote is promoted (released) via the Nexus REST API, which triggers synchronization to Maven Central.
  2. PyPI upload: PySpark packages are uploaded to PyPI using twine, making them available via pip install pyspark.
  3. SVN migration: Binary distribution artifacts are moved from the Apache dev SVN area to the release SVN area, making them available on Apache mirrors.
  4. Documentation deployment: The generated documentation site is deployed to the official Apache Spark documentation URL.
  5. Final git tag: The release candidate tag is re-tagged as the final release tag (e.g., v3.5.0-rc1 becomes v3.5.0).
  6. RC cleanup: Previous release candidate directories are removed from SVN staging.
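
As an illustration of step 5, the mapping from a release candidate tag to the final tag can be sketched in Python. This is a minimal sketch, not code from release-build.sh: the helper name final_tag_for and the regular expression are hypothetical, and it assumes tags of the plain form vX.Y.Z-rcN as in the example above.

```python
import re

# Hypothetical pattern: assumes RC tags look exactly like "v3.5.0-rc1".
_RC_TAG = re.compile(r"^(v\d+\.\d+\.\d+)-rc(\d+)$")


def final_tag_for(rc_tag: str) -> str:
    """Return the final release tag that corresponds to an RC tag."""
    match = _RC_TAG.match(rc_tag)
    if match is None:
        # Refuse anything that is not an RC tag instead of guessing.
        raise ValueError(f"not a release-candidate tag: {rc_tag!r}")
    return match.group(1)


if __name__ == "__main__":
    print(final_tag_for("v3.5.0-rc1"))  # prints v3.5.0
```

Rejecting unexpected input rather than passing it through keeps a mistyped tag from being promoted by accident.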

Contributor Generation

The generate-contributors.py script produces attribution lists by:

  • Extracting all commits between PREVIOUS_RELEASE_TAG and RELEASE_TAG using git log.
  • Parsing commit metadata to extract author names and pull request numbers.
  • Using the GitHub API (via optional GITHUB_OAUTH_KEY) to map git author names to GitHub usernames.
  • Producing a formatted contributor list suitable for release notes.
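
The steps above can be sketched as follows. This is an illustrative approximation, not the code in generate-contributors.py: the RawCommit type, the function names, and the output format are hypothetical, and the regular expression assumes that merged commit messages contain a line such as "Closes #1234 from someuser/branch".

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class RawCommit(NamedTuple):
    """Hypothetical record for one parsed `git log` entry."""
    hash: str
    author: str
    message: str


# Assumption: the merge message records the PR number and the GitHub user.
_CLOSES = re.compile(r"Closes #(\d+) from ([^/\s]+)/")


def pr_info(message: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (pr_number, github_user), or (None, None) if not found."""
    match = _CLOSES.search(message)
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2)


def commits_only_in_release(release: List[RawCommit],
                            previous: List[RawCommit]) -> List[RawCommit]:
    """Keep commits of the new release that the previous release lacks."""
    previous_hashes = {commit.hash for commit in previous}
    return [commit for commit in release if commit.hash not in previous_hashes]


def contributor_names(commits: List[RawCommit]) -> List[str]:
    """Return the distinct author names in alphabetical order."""
    return sorted({commit.author for commit in commits})
```

Comparing by commit hash is the simplest way to diff two histories; it assumes the commits were not rewritten (for example by cherry-picks that change the hash) between the two tags.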

The releaseutils.py module provides the underlying data model and API interaction:

Component                 Description
Commit class              Data class with fields: hash, github_username, title, pr_number
get_commits(tag)          Extracts the commits for the given tag from git history (called once per tag so that two releases can be diffed)
get_github_name(author)   Maps a git author name to a GitHub username via the GitHub API
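
For orientation, a data class with the four fields listed above might look like the sketch below. It is not the definition from releaseutils.py: the field types, the Optional markers, and the describe method are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Illustrative stand-in for the Commit class; types are assumed."""
    hash: str
    github_username: Optional[str]  # assumed to be None when unmapped
    title: str
    pr_number: Optional[int]        # assumed to be None when no PR is found

    def describe(self) -> str:
        """Hypothetical helper that renders one line for release notes."""
        pr = f" (#{self.pr_number})" if self.pr_number is not None else ""
        who = f"@{self.github_username}" if self.github_username else "unknown author"
        return f"{self.hash[:7]} {self.title}{pr} by {who}"
```

A frozen data class is used here so that commit records can be stored in sets and compared safely while diffing two tags.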

Usage

Run this step only after the RC vote has passed and the PMC has confirmed the result. It is the final step of the release process and makes the release publicly available.

Code Reference

Source Location

  • Repository: apache/spark
  • Files:
    • dev/create-release/release-build.sh (lines 101-574)
    • dev/create-release/generate-contributors.py (lines 1-190)
    • dev/create-release/generate-llms-txt.py (lines 1-199)
    • dev/create-release/releaseutils.py (lines 1-138)

Signature

# Finalize the release
dev/create-release/release-build.sh finalize

# Generate contributor list
python3 dev/create-release/generate-contributors.py

# Generate LLM documentation index
python3 dev/create-release/generate-llms-txt.py

generate-contributors.py Environment Variables

Variable              Required  Description
RELEASE_TAG           Yes       Tag for the current release (e.g., v3.5.0)
PREVIOUS_RELEASE_TAG  Yes       Tag for the previous release (e.g., v3.4.0)
GITHUB_OAUTH_KEY      No        GitHub OAuth token for API rate limit increase
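
The required and optional variables in the table above could be checked before any work starts. The following is a minimal sketch and not the script's actual startup code: load_settings, REQUIRED, and the returned dictionary keys are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Variables the table above marks as required.
REQUIRED = ("RELEASE_TAG", "PREVIOUS_RELEASE_TAG")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate the environment and return the settings as a dict."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return {
        "release_tag": env["RELEASE_TAG"],
        "previous_release_tag": env["PREVIOUS_RELEASE_TAG"],
        # Optional: None means unauthenticated GitHub API access.
        "github_oauth_key": env.get("GITHUB_OAUTH_KEY") or None,
    }
```

Accepting the environment as a parameter makes the check easy to exercise without touching the real process environment, for example load_settings({"RELEASE_TAG": "v3.5.0", "PREVIOUS_RELEASE_TAG": "v3.4.0"}).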

I/O Contract

Inputs

Name                   Type                   Required  Description
Passed RC vote         community consensus    Yes       PMC confirmation that the RC vote passed
Built artifacts        release artifacts      Yes       All artifacts from the package and publish steps
PyPI credentials       environment variables  Yes       Credentials for uploading PySpark to PyPI via twine
GitHub OAuth key       environment variable   No        Optional token for GitHub API access (contributor generation)
Nexus staging repo ID  string                 Yes       ID of the staging repository to promote

Outputs

Name                     Type              Description
Final git tag            git tag           Release tag (e.g., v3.5.0) in the Apache repository
PySpark on PyPI          PyPI package      PySpark package published and installable via pip install pyspark
Published docs           website           Documentation site deployed to the official Apache Spark URL
Maven Central artifacts  Maven repository  JARs, POMs available on Maven Central after Nexus sync
contributors.txt         text file         Formatted list of contributors between releases
llms.txt                 text file         LLM-friendly documentation index

Usage Examples

Finalize Release

# Finalize the release after successful vote
dev/create-release/release-build.sh finalize

Generate Contributors

# Generate contributor list between two releases
RELEASE_TAG=v3.5.0 \
PREVIOUS_RELEASE_TAG=v3.4.0 \
GITHUB_OAUTH_KEY=ghp_xxxxxxxxxxxx \
python3 dev/create-release/generate-contributors.py

Generate LLM Documentation Index

# Generate llms.txt for the release documentation
python3 dev/create-release/generate-llms-txt.py

Related Pages

Implements Principle
