Implementation:Apache Spark Merge Spark PR
| Knowledge Sources | |
|---|---|
| Domains | DevOps, Version_Control |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Interactive command-line tool for Apache Spark committers to merge GitHub pull requests with standardized commit messages and JIRA integration.
Description
merge_spark_pr.py is a Python CLI tool that Apache Spark committers use to run the PR merge workflow. It fetches PR metadata from the GitHub API and normalizes the PR title to the `[SPARK-XXXXX][MODULE]` format via regex parsing. It then fetches the PR branch and the target branch, squash-merges the PR, and builds a merge commit message with authorship attribution. It can also cherry-pick the merge commit to maintenance branches. Through its ASF JIRA integration, it resolves the linked issues, assigns contributors, and sets fix versions.
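The title normalization described above can be illustrated with a minimal sketch. The helper name `normalize_title` and its regular expressions are assumptions made for illustration; they are not copied from dev/merge_spark_pr.py, whose actual parsing rules may differ.

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical helper: rewrite a loose PR title into [SPARK-XXXXX][MODULE] form."""
    # Collect JIRA ids written as "SPARK-12345", "spark 12345", etc.
    jira_ids = [
        f"SPARK-{num}"
        for num in re.findall(r"SPARK[-\s]*(\d+)", title, flags=re.IGNORECASE)
    ]
    # Drop the JIRA ids (with optional surrounding brackets) from the text.
    rest = re.sub(r"\[?SPARK[-\s]*\d+\]?", "", title, flags=re.IGNORECASE)
    # Collect bracketed component tags such as "[sql]" and upper-case them.
    modules = [m.upper() for m in re.findall(r"\[([^\]]+)\]", rest)]
    # Whatever remains after removing the tags is the descriptive text.
    text = re.sub(r"\[[^\]]+\]", "", rest).strip(" :-")
    text = re.sub(r"\s+", " ", text)
    prefix = "".join(f"[{j}]" for j in jira_ids) + "".join(f"[{m}]" for m in modules)
    return f"{prefix} {text}" if prefix else text


print(normalize_title("spark-12345: [sql] Fix null handling"))
```

Titles that are already in the standard format pass through unchanged, and titles with no JIRA id or component tag are returned as plain text.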
Usage
Use this tool when you are an Apache Spark committer and need to merge an approved GitHub pull request into its target branch (typically `master`), optionally cherry-picking the result to maintenance branches. It ensures consistent commit formatting, proper author attribution, and JIRA issue tracking.
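The commit formatting and author attribution mentioned above might look roughly like the following sketch. The function `build_commit_message`, the `Closes #N from ...` line, and the `Authored-by` / `Signed-off-by` trailers are assumptions for illustration; check dev/merge_spark_pr.py for the exact layout the tool produces.

```python
def build_commit_message(pr_num, title, body, pr_repo_desc, primary_author, committer):
    """Hypothetical helper: assemble a squash-merge commit message."""
    # Strip "@" from the PR body so mentions do not trigger notification emails.
    clean_body = body.replace("@", "").strip()
    parts = [title, ""]
    if clean_body:
        parts += [clean_body, ""]
    # Assumed layout: a closing reference followed by attribution trailers.
    parts.append(f"Closes #{pr_num} from {pr_repo_desc}.")
    parts.append("")
    parts.append(f"Authored-by: {primary_author}")
    parts.append(f"Signed-off-by: {committer}")
    return "\n".join(parts)


print(build_commit_message(
    12345,
    "[SPARK-1][SQL] Fix x",
    "Thanks @alice for review",
    "user/branch",
    "A <a@example.com>",
    "C <c@example.com>",
))
```

Only the body is scrubbed of `@`; the author and committer lines keep their email addresses intact.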
Code Reference
Source Location
- Repository: Apache_Spark
- File: dev/merge_spark_pr.py
- Lines: 1-745
Signature
def merge_pr(pr_num, target_ref, title, body, pr_repo_desc):
    """
    Merge the requested PR via squash merge and return the merge hash.

    Args:
        pr_num: The GitHub pull request number.
        target_ref: Target branch name (e.g., 'master').
        title: Normalized PR title in [SPARK-XXXXX][MODULE] format.
        body: PR body text (@ symbols stripped to avoid email triggers).
        pr_repo_desc: Description of the PR source repo/branch.
    """

def cherry_pick(pr_num, merge_hash, default_branch):
    """
    Cherry-pick a merged PR commit to one or more maintenance branches.
    """

def main():
    """
    Interactive entry point: prompts for PR number, fetches metadata,
    normalizes title, performs merge, and optionally cherry-picks.
    """
Import
# Standalone CLI script - invoked directly
python dev/merge_spark_pr.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| PR Number | int (interactive) | Yes | GitHub pull request number to merge |
| SPARK_HOME | env var | No | Path to local Spark git repo (default: cwd) |
| PR_REMOTE_NAME | env var | No | Git remote name for GitHub mirror (default: apache-github) |
| PUSH_REMOTE_NAME | env var | No | Git remote name for Apache git (default: apache) |
| GITHUB_OAUTH_KEY | env var | No | GitHub OAuth token used to authenticate API requests, which avoids the lower rate limit for unauthenticated calls |
| JIRA_ACCESS_TOKEN | env var | No | ASF JIRA access token for issue management |
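
As a rough illustration of how the inputs above map to configuration, the sketch below reads each environment variable with the default listed in the table. The function `load_settings` and its dictionary keys are hypothetical names; the script itself may read these values differently.

```python
import os


def load_settings(env=os.environ):
    """Hypothetical helper: resolve the tool's inputs from environment variables."""
    return {
        # Defaults follow the Inputs table; optional tokens default to None.
        "spark_home": env.get("SPARK_HOME", os.getcwd()),
        "pr_remote": env.get("PR_REMOTE_NAME", "apache-github"),
        "push_remote": env.get("PUSH_REMOTE_NAME", "apache"),
        "github_oauth_key": env.get("GITHUB_OAUTH_KEY"),
        "jira_access_token": env.get("JIRA_ACCESS_TOKEN"),
    }


settings = load_settings({"SPARK_HOME": "/path/to/spark"})
print(settings["spark_home"], settings["pr_remote"], settings["push_remote"])
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.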
Outputs
| Name | Type | Description |
|---|---|---|
| Merge commit | git commit | Squash merge commit on the target branch |
| Cherry-pick commits | git commits | Optional cherry-pick commits on maintenance branches |
| JIRA updates | API side-effects | Issue resolution, fix version assignment, contributor role grants |
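
The squash merge that produces the merge commit can be approximated by a short sequence of git commands. The sketch below only builds the command lists and does not run them; the function name `squash_merge_commands` and the temporary branch names are illustrative assumptions, and the real script may add steps such as conflict handling, committing, and pushing.

```python
def squash_merge_commands(pr_num, target_ref, pr_remote="apache-github", push_remote="apache"):
    """Hypothetical helper: list the git commands for a squash merge of a GitHub PR."""
    # Temporary local branch names; these are assumptions for illustration.
    pr_branch = f"PR_TOOL_MERGE_PR_{pr_num}"
    target_branch = f"{pr_branch}_{target_ref.upper()}"
    return [
        # Fetch the PR head from the GitHub remote into a local branch.
        ["git", "fetch", pr_remote, f"pull/{pr_num}/head:{pr_branch}"],
        # Fetch the target branch from the push remote into a local branch.
        ["git", "fetch", push_remote, f"{target_ref}:{target_branch}"],
        ["git", "checkout", target_branch],
        # Stage the PR's changes as a single squashed change set.
        ["git", "merge", pr_branch, "--squash"],
    ]


for cmd in squash_merge_commands(12345, "master"):
    print(" ".join(cmd))
```

Each command could be passed to `subprocess.run(cmd, check=True)` from inside the repository; a `git commit` carrying the constructed message would follow the squash step.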
Usage Examples
Basic PR Merge
# Set up environment
export SPARK_HOME=/path/to/spark
export PR_REMOTE_NAME=apache-github
export PUSH_REMOTE_NAME=apache
export GITHUB_OAUTH_KEY=ghp_xxxxxxxxxxxx
# Run the merge tool
cd "$SPARK_HOME"
python dev/merge_spark_pr.py
# Interactive prompts:
# Which pull request would you like to merge? (e.g., 12345):
# Enter primary author in the format of "name <email>" [...]:
# Would you like to pick 1234abc into another branch? (y/N):
With JIRA Integration
# Also set JIRA credentials for automatic issue management
export JIRA_ACCESS_TOKEN=your_jira_token
python dev/merge_spark_pr.py
# The tool will automatically:
# - Resolve the linked SPARK-XXXXX JIRA issue
# - Set the fix version based on target branch
# - Assign the issue if unassigned
# - Grant contributor role if needed
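
One plausible way to derive a fix version from the target branch is sketched below. This is an assumption about the selection rule, not the tool's confirmed logic: `default_fix_version` is a hypothetical helper, it assumes purely numeric dotted version strings, and it assumes maintenance branches are named `branch-X.Y`.

```python
def default_fix_version(branch, unreleased_versions):
    """Hypothetical helper: suggest a JIRA fix version for a target branch."""
    # Sort newest first, comparing versions numerically rather than as strings.
    versions = sorted(
        unreleased_versions,
        key=lambda v: tuple(int(p) for p in v.split(".")),
        reverse=True,
    )
    if branch == "master":
        # Assumed rule: master maps to the newest unreleased version.
        return versions[0] if versions else None
    # Assumed rule: "branch-3.5" maps to the oldest unreleased 3.5.x version.
    prefix = branch.replace("branch-", "", 1) + "."
    matches = [v for v in versions if v.startswith(prefix)]
    return matches[-1] if matches else None


print(default_fix_version("master", ["3.5.2", "4.0.0", "3.4.4"]))
print(default_fix_version("branch-3.5", ["3.5.2", "3.5.3", "4.0.0"]))
```

The helper returns `None` when no unreleased version matches, leaving the choice to the committer at the interactive prompt.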