Principle:Apache Spark Release Version Tagging

From Leeroopedia


Domains Release_Engineering, Version_Control
Last Updated 2026-02-08 12:00 GMT

Overview

A version control tagging strategy that creates immutable release points in the repository while simultaneously bumping version numbers for continued development.

Description

Release tagging creates an immutable snapshot of the codebase at a specific version. The process updates all version references (POMs, Python, R, docs) to the release version, commits the change, creates a signed git tag, and then updates the references to the next SNAPSHOT version for continued development. Release candidates use the -rcN suffix convention. This two-step sequence (tag, then bump) keeps the release source precisely reproducible while development continues.

The Apache Spark tagging process is particularly thorough because version numbers are embedded in multiple build systems and documentation sources:

  • Maven POMs: All pom.xml files across the multi-module project must be updated consistently.
  • Python: The setup.py and version files for PySpark must reflect the release version.
  • R: The DESCRIPTION file for SparkR must be updated.
  • Documentation: Version references in documentation configuration must match the release version.
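
As a rough illustration of the consistency requirement above, the following sketch extracts a version string from three kinds of file content and checks that they all agree. The sample file contents, the regular expressions, and the function names extract_versions and versions_consistent are assumptions made for illustration; they are not taken from Spark's release scripts.

```python
import re

# Hypothetical patterns, one per kind of file; real files may be laid out differently.
# Note: in a real POM the first <version> element may belong to the parent project.
VERSION_PATTERNS = {
    "pom.xml": re.compile(r"<version>([^<]+)</version>"),
    "version.py": re.compile(r"__version__\s*(?::\s*str\s*)?=\s*[\"']([^\"']+)[\"']"),
    "DESCRIPTION": re.compile(r"^Version:\s*(\S+)", re.MULTILINE),
}


def extract_versions(files):
    """Map each file name to the first version string found in its text."""
    found = {}
    for name, text in files.items():
        match = VERSION_PATTERNS[name].search(text)
        found[name] = match.group(1) if match else None
    return found


def versions_consistent(files, expected):
    """Return True only if every file declares exactly the expected version."""
    return all(v == expected for v in extract_versions(files).values())


# Illustrative file contents, not copied from the Spark repository.
sample_files = {
    "pom.xml": "<project><version>3.5.0</version></project>",
    "version.py": '__version__: str = "3.5.0"',
    "DESCRIPTION": "Package: SparkR\nVersion: 3.5.0\n",
}
```

A check of this kind could run just before the tag is created, so that a missed file stops the process instead of producing an inconsistent release.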

The tagging operation is atomic in a limited sense: either all version references are updated and the tag is created, or the process fails cleanly without creating a tag. After the tag is created, version numbers are immediately bumped to the next SNAPSHOT version, signaling that the branch is open for development again.
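
The bump to the next SNAPSHOT version can be sketched as a small function. This is a minimal sketch that assumes a three-part MAJOR.MINOR.PATCH release version and a patch-level increment; the function name next_snapshot_version is hypothetical, and the actual release tooling may derive the next version differently.

```python
def next_snapshot_version(release_version):
    """Return the development version that follows a release.

    Assumes a MAJOR.MINOR.PATCH version and a patch-level bump (an assumption,
    not a rule stated by the source).
    """
    parts = release_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {release_version!r}")
    major, minor, patch = (int(p) for p in parts)
    return f"{major}.{minor}.{patch + 1}-SNAPSHOT"
```

For example, under these assumptions the release version 3.5.0 is followed by 3.5.1-SNAPSHOT.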

Usage

Use as the first step in the release process, after all features are merged and the branch is stabilized. This step must be completed before building any release artifacts, as the build process checks out the tagged commit.

Theoretical Basis

The tagging process follows an atomic version transition model:

set_version(release) -> commit -> tag(rc_tag) -> set_version(next_snapshot) -> commit -> push
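
The sequence above can be written out as a dry-run plan. The sketch below only builds the ordered list of steps and does not execute anything. The function name tagging_plan, the commit messages, and the "set_version" placeholder steps are assumptions for illustration, and the final push is split into a branch push and a tag push, which is one possible reading of the single push step.

```python
def tagging_plan(release_version, rc_number, next_version, branch, remote="origin"):
    """Return the ordered steps of the tag-and-bump sequence as a dry-run plan.

    Each step is an argv-style list. Steps starting with "set_version" are
    placeholders for the project's own version-rewriting tooling.
    """
    rc_tag = f"v{release_version}-rc{rc_number}"
    return [
        ["set_version", release_version],                                 # set_version(release)
        ["git", "commit", "-a", "-m", f"Preparing release {rc_tag}"],      # commit
        ["git", "tag", "-s", rc_tag, "-m", f"Release candidate {rc_tag}"],  # tag(rc_tag), signed
        ["set_version", next_version],                                    # set_version(next_snapshot)
        ["git", "commit", "-a", "-m", f"Preparing development version {next_version}"],  # commit
        ["git", "push", remote, branch],                                  # push the branch
        ["git", "push", remote, rc_tag],                                  # push the tag
    ]
```

Keeping the plan as data makes the ordering easy to review: the tag step always sits between the release commit and the SNAPSHOT commit, so the tag can only ever point at the release-version commit.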

This two-phase approach provides several guarantees:

  1. Immutability: The signed git tag (e.g., v3.5.0-rc1) permanently marks the exact source state used to produce release artifacts.
  2. Continuity: By immediately bumping to the next SNAPSHOT version, the branch remains usable for development without version conflicts.
  3. Traceability: The release candidate numbering (-rc1, -rc2, etc.) provides a clear audit trail of release attempts.
  4. Authentication: Tags are GPG-signed, ensuring that the tag was created by an authorized release manager.

The convention of using release candidates (-rcN) before final releases supports the Apache voting process, where community members verify release candidates before approving the final release. If an RC fails the vote, a new RC is created with an incremented number, preserving the history of all release attempts.
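
The incrementing of release candidate numbers can be illustrated with a short helper. This is a sketch that assumes tags follow the vX.Y.Z-rcN form shown above; the function name next_rc_tag is hypothetical, and the list of existing tags would in practice come from the repository.

```python
import re


def next_rc_tag(release_version, existing_tags):
    """Return the next unused release-candidate tag for a version.

    RC numbers are compared numerically, so rc10 correctly follows rc9.
    """
    pattern = re.compile(rf"^v{re.escape(release_version)}-rc(\d+)$")
    used = [int(m.group(1)) for t in existing_tags if (m := pattern.match(t))]
    return f"v{release_version}-rc{max(used, default=0) + 1}"
```

Tags for other versions are ignored, and earlier RC tags are never reused or deleted, which matches the goal of preserving the history of all release attempts.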
