Principle:Apache Spark Release Version Tagging
| Domains | Release_Engineering, Version_Control |
|---|---|
| Last Updated | 2026-02-08 12:00 GMT |
Overview
A version control tagging strategy that creates immutable release points in the repository while simultaneously bumping version numbers for continued development.
Description
Release tagging creates an immutable snapshot of the codebase at a specific version. The process updates every version reference (POMs, Python, R, docs) to the release version, commits the change, creates a signed git tag on that commit, and then updates the references to the next SNAPSHOT version for continued development. Release candidates use the -rcN suffix convention. This two-step sequence (tag, then bump) keeps the release precisely reproducible while development continues.
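As a rough illustration of the naming involved, the sketch below derives a release-candidate tag and the next SNAPSHOT version from a release version. The function name `release_names` is hypothetical, and the rule that the branch continues with the next patch version is an assumption, not something the process above prescribes.

```python
import re
from typing import Tuple


def release_names(release_version: str, rc_number: int) -> Tuple[str, str]:
    """Return (rc_tag, next_snapshot_version) for a release such as '3.5.0'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", release_version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {release_version!r}")
    if rc_number < 1:
        raise ValueError("release candidate numbers start at 1")

    major, minor, patch = (int(part) for part in match.groups())

    # Tag name follows the v<version>-rcN convention described above.
    rc_tag = f"v{release_version}-rc{rc_number}"

    # Assumption: development continues on the next patch version.
    next_snapshot = f"{major}.{minor}.{patch + 1}-SNAPSHOT"
    return rc_tag, next_snapshot
```

For example, `release_names("3.5.0", 1)` yields the tag name `v3.5.0-rc1` together with a `3.5.1-SNAPSHOT` development version.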
The Apache Spark tagging process is particularly thorough because version numbers are embedded in multiple build systems and documentation sources:
- Maven POMs: All `pom.xml` files across the multi-module project must be updated consistently.
- Python: The `setup.py` and version files for PySpark must reflect the release version.
- R: The `DESCRIPTION` file for SparkR must be updated.
- Documentation: Version references in documentation configuration must match the release version.
The tagging operation is atomic in the sense that either all version references are updated and the tag is created, or the process fails cleanly. After the tag is created, version numbers are immediately bumped to the next SNAPSHOT version, signaling that the branch is open for development again.
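The all-or-nothing behaviour can be sketched as follows. This is a simplified model, not the actual release tooling: `bump_all` is a hypothetical helper, the files are in-memory strings rather than real paths, and it assumes every file spells the old version as the same literal string, which may not hold across build systems.

```python
from typing import Dict


def bump_all(files: Dict[str, str], old_version: str, new_version: str) -> Dict[str, str]:
    """Return updated contents for every file, or raise without changing anything.

    `files` maps a path to its text (a hypothetical stand-in for the POMs,
    Python version files, the R DESCRIPTION file, and docs configuration).
    """
    updated = {}
    for path, text in files.items():
        if old_version not in text:
            # Fail before any result is returned, so no file is left half-updated.
            raise ValueError(f"{path} does not mention {old_version}")
        # Naive literal replacement; a real tool would target specific fields.
        updated[path] = text.replace(old_version, new_version)
    return updated
```

Because the function builds a fresh mapping and only returns it when every file has been processed, a caller can write the results to disk and create the tag only after the whole update has succeeded.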
Usage
Use as the first step in the release process, after all features are merged and the branch is stabilized. This step must be completed before building any release artifacts, as the build process checks out the tagged commit.
Theoretical Basis
The tagging process follows an atomic version transition model:
set_version(release) -> commit -> tag(rc_tag) -> set_version(next_snapshot) -> commit -> push
This two-phase approach provides several guarantees:
- Immutability: The signed git tag (e.g., `v3.5.0-rc1`) permanently marks the exact source state used to produce release artifacts.
- Continuity: By immediately bumping to the next SNAPSHOT version, the branch remains usable for development without version conflicts.
- Traceability: The release candidate numbering (`-rc1`, `-rc2`, etc.) provides a clear audit trail of release attempts.
- Authentication: Tags are GPG-signed, ensuring that the tag was created by an authorized release manager.
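The transition sequence above can be written out as an ordered plan of commands. The sketch below only builds the plan and executes nothing. `tagging_plan` is a hypothetical name, `set_version` is a placeholder for whatever tooling rewrites the version references, and the commit messages and the `origin` remote are illustrative assumptions. The git invocations themselves (`git commit -a -m`, `git tag -s`, `git push`) are standard.

```python
from typing import List


def tagging_plan(release: str, rc_tag: str, next_snapshot: str, branch: str) -> List[List[str]]:
    """Return the tagging steps in order, as argument lists (nothing is run)."""
    return [
        # Phase 1: move every version reference to the release version and tag it.
        ["set_version", release],  # placeholder step, not a real command
        ["git", "commit", "-a", "-m", f"Preparing release {rc_tag}"],
        ["git", "tag", "-s", rc_tag, "-m", f"Release candidate {rc_tag}"],  # -s = GPG-signed
        # Phase 2: reopen the branch for development.
        ["set_version", next_snapshot],  # placeholder step, not a real command
        ["git", "commit", "-a", "-m", f"Preparing development version {next_snapshot}"],
        # Publish both commits and the tag.
        ["git", "push", "origin", branch],
        ["git", "push", "origin", f"refs/tags/{rc_tag}"],
    ]
```

Keeping the plan as data makes the ordering easy to inspect: the tag step always sits between the release commit and the snapshot bump, so the tag can only ever point at the release-version commit. A signed tag created this way can later be checked with `git tag -v <tag>`.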
The convention of using release candidates (-rcN) before final releases supports the Apache voting process, where community members verify release candidates before approving the final release. If an RC fails the vote, a new RC is created with an incremented number, preserving the history of all release attempts.
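Choosing the number for a new release candidate after a failed vote can be sketched like this. `next_rc_tag` is a hypothetical helper, and it assumes tags follow the `v<version>-rcN` pattern exactly; the list of existing tags would come from the repository (for instance from `git tag --list`).

```python
import re
from typing import Iterable


def next_rc_tag(release_version: str, existing_tags: Iterable[str]) -> str:
    """Return the next unused RC tag for `release_version`."""
    pattern = re.compile(rf"v{re.escape(release_version)}-rc(\d+)")
    highest = 0
    for tag in existing_tags:
        match = pattern.fullmatch(tag)
        if match is not None:
            # Track the highest RC number already used for this version.
            highest = max(highest, int(match.group(1)))
    # Earlier RC tags are left in place, so the history of attempts is preserved.
    return f"v{release_version}-rc{highest + 1}"
```

With existing tags `v3.5.0-rc1` and `v3.5.0-rc2`, the helper returns `v3.5.0-rc3`; with no matching tags it returns `v3.5.0-rc1`.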