
Implementation:Apache Spark Merge Spark PR

From Leeroopedia


Knowledge Sources
Domains: DevOps, Version_Control
Last Updated: 2026-02-08 22:00 GMT

Overview

Interactive command-line tool for Apache Spark committers to merge GitHub pull requests with standardized commit messages and JIRA integration.

Description

merge_spark_pr.py is a Python CLI tool that Apache Spark committers use to merge pull requests. It fetches PR metadata from the GitHub API, normalizes the PR title to the `[SPARK-XXXXX][MODULE]` format via regex parsing, and performs a squash merge by fetching the PR branch and the target branch into local temporary branches. It then builds the merge commit message (title, PR body, a `Closes #<PR number>` line, and author and sign-off trailers) and can cherry-pick the resulting commit to maintenance branches. It also integrates with ASF JIRA to resolve the linked issues, assign them to the contributor, and set fix versions.
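The regex-based title normalization can be illustrated with a minimal, hypothetical sketch; it is not the script's actual implementation. It assumes JIRA references look like `SPARK-<digits>` and that component tags are bracketed words such as `[sql]`, and it moves both to the front in upper case. `normalize_title`, `JIRA_REF`, and `COMPONENT` are illustrative names.

```python
import re

# Hypothetical pattern: a JIRA reference such as "SPARK-12345", optionally bracketed.
JIRA_REF = re.compile(r"\[?(SPARK-\d+)\]?", re.IGNORECASE)
# Hypothetical pattern: a bracketed component tag such as "[SQL]" or "[CORE]".
COMPONENT = re.compile(r"\[([A-Za-z0-9 _-]+)\]")


def normalize_title(title: str) -> str:
    """Sketch: rewrite a PR title as "[SPARK-XXXXX][MODULE] Rest of title"."""
    # Collect JIRA refs first so they are not mistaken for component tags.
    refs = [ref.upper() for ref in JIRA_REF.findall(title)]
    rest = JIRA_REF.sub("", title)
    # Collect the remaining bracketed tags as components.
    components = [c.strip().upper() for c in COMPONENT.findall(rest)]
    rest = COMPONENT.sub("", rest)
    # Collapse whitespace and trim leftover separators such as " - " or ":".
    rest = re.sub(r"\s+", " ", rest).strip(" :-")
    prefix = "".join(f"[{r}]" for r in refs) + "".join(f"[{c}]" for c in components)
    return f"{prefix} {rest}".strip()
```

For example, `normalize_title("spark-1234 [sql] Fix null handling")` returns `"[SPARK-1234][SQL] Fix null handling"`. This sketch treats any bracketed word (for instance `[WIP]`) as a component; a real normalizer needs more rules than this.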

Usage

Use this tool when you are an Apache Spark committer and need to merge an approved GitHub pull request into the main branch (or cherry-pick to maintenance branches). It ensures consistent commit formatting, proper author attribution, and JIRA issue tracking.

Code Reference

Source Location

dev/merge_spark_pr.py in the apache/spark repository

Signature

def merge_pr(pr_num, target_ref, title, body, pr_repo_desc):
    """
    Merge the requested PR via squash merge and return the merge hash.

    Args:
        pr_num: The GitHub pull request number.
        target_ref: Target branch name (e.g., 'master').
        title: Normalized PR title in [SPARK-XXXXX][MODULE] format.
        body: PR body text. @ symbols are stripped before it is added to
              the commit message, to avoid triggering e-mail notifications.
        pr_repo_desc: Description of the PR source repo/branch.
    """

def cherry_pick(pr_num, merge_hash, default_branch):
    """
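    Cherry-pick the merged commit (merge_hash) onto one maintenance branch
    (default_branch unless the committer chooses another). main() calls
    this once per additional branch the committer selects.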
    """

def main():
    """
    Interactive entry point: prompts for PR number, fetches metadata,
    normalizes title, performs merge, and optionally cherry-picks.
    """

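The authorship attribution that `merge_pr` performs can be sketched as a pure function. `build_commit_message` is a hypothetical helper, not part of the script, and two details are assumptions modeled on typical Spark commit messages rather than verified code: the most frequent commit author on the PR branch becomes the default primary author, and the trailers are named `Authored-by`/`Lead-authored-by`, `Co-authored-by`, and `Signed-off-by`. Paragraphs are joined with blank lines, which is what passing several `-m` options to `git commit` produces.

```python
from collections import Counter


def build_commit_message(title, body, pr_num, pr_repo_desc, commit_authors, committer):
    """Hypothetical sketch of a squash-merge commit message.

    commit_authors: "name <email>" strings, one per commit on the PR branch.
    committer: "name <email>" of the committer running the tool.
    Returns (primary_author, message).
    """
    if not commit_authors:
        raise ValueError("PR branch has no commits")
    # Order distinct authors by commit count; sorted() is stable, so ties
    # keep first-seen order (Counter preserves insertion order).
    counts = Counter(commit_authors)
    distinct = sorted(counts, key=lambda author: counts[author], reverse=True)
    primary = distinct[0]

    paragraphs = [title]
    if body:
        # Strip @ so mentions in the body don't notify people repeatedly.
        paragraphs.append(body.replace("@", ""))
    # "Closes #N" lets GitHub close the PR when the commit lands.
    paragraphs.append(f"Closes #{pr_num} from {pr_repo_desc}.")

    label = "Authored-by:" if len(distinct) == 1 else "Lead-authored-by:"
    trailers = [f"{label} {primary}"]
    trailers += [f"Co-authored-by: {author}" for author in distinct[1:]]
    trailers.append(f"Signed-off-by: {committer}")
    paragraphs.append("\n".join(trailers))
    return primary, "\n\n".join(paragraphs)
```

The returned primary author would go to `git commit --author`, and the message to `-m` or `-F`.
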
Import

# Standalone CLI script - invoked directly
python dev/merge_spark_pr.py

I/O Contract

Inputs

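| Name | Type | Required | Description |
|---|---|---|---|
| PR number | int (entered interactively) | Yes | GitHub pull request number to merge |
| SPARK_HOME | env var | No | Path to the local Spark git repository (default: current working directory) |
| PR_REMOTE_NAME | env var | No | Git remote pointing to the GitHub repository, used to fetch PRs (default: `apache-github`) |
| PUSH_REMOTE_NAME | env var | No | Git remote pointing to the Apache git repository, used to fetch target branches and push merges (default: `apache`) |
| GITHUB_OAUTH_KEY | env var | No | GitHub OAuth token used to authenticate API requests; without it, requests are unauthenticated and subject to GitHub's lower rate limit |
| JIRA_ACCESS_TOKEN | env var | No | ASF JIRA access token, needed for the JIRA issue updates |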
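The defaults in the table above amount to ordinary environment lookups. A minimal sketch follows, using a hypothetical `load_config` helper (the script itself may organize this differently) and a hypothetical `github_headers` helper that shows how an OAuth token is commonly attached to GitHub API requests.

```python
import os


def load_config(env=None):
    """Sketch: resolve settings from environment variables with the table's defaults."""
    env = os.environ if env is None else env
    return {
        "spark_home": env.get("SPARK_HOME", os.getcwd()),
        "pr_remote": env.get("PR_REMOTE_NAME", "apache-github"),
        "push_remote": env.get("PUSH_REMOTE_NAME", "apache"),
        # Optional tokens: None when unset.
        "github_oauth_key": env.get("GITHUB_OAUTH_KEY"),
        "jira_access_token": env.get("JIRA_ACCESS_TOKEN"),
    }


def github_headers(oauth_key):
    # GitHub accepts "token <key>" in the Authorization header; with no key,
    # no header is sent and the unauthenticated rate limit applies.
    return {"Authorization": f"token {oauth_key}"} if oauth_key else {}
```

Passing an explicit mapping to `load_config` makes the defaults easy to check without touching the real environment.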

Outputs

| Name | Type | Description |
|---|---|---|
| Merge commit | git commit | Squash-merge commit on the target branch |
| Cherry-pick commits | git commits | Optional cherry-picked commits on maintenance branches |
| JIRA updates | API side effects | Issue resolution, fix version assignment, contributor role grants |
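The merge and cherry-pick commits listed above come from ordinary git operations. The sketch below only builds the command lists and runs nothing. It assumes a fetch-into-temporary-branch approach; the `PR_TOOL_...` branch names and both function names are hypothetical, while `pull/<N>/head` is GitHub's standard ref for a PR's head commit.

```python
def merge_commands(pr_num, target_ref, pr_remote="apache-github", push_remote="apache"):
    """Hypothetical sketch: git commands for squash-merging one PR."""
    pr_branch = f"PR_TOOL_MERGE_PR_{pr_num}"
    target_branch = f"PR_TOOL_MERGE_PR_{pr_num}_{target_ref.upper()}"
    return [
        # GitHub exposes every PR's head commit as pull/<N>/head.
        ["git", "fetch", pr_remote, f"pull/{pr_num}/head:{pr_branch}"],
        ["git", "fetch", push_remote, f"{target_ref}:{target_branch}"],
        ["git", "checkout", target_branch],
        # Stage the PR's combined changes without committing.
        ["git", "merge", "--squash", pr_branch],
        # A `git commit` with the constructed message would go here (omitted).
        ["git", "push", push_remote, f"{target_branch}:{target_ref}"],
    ]


def cherry_pick_commands(pr_num, merge_hash, pick_ref, push_remote="apache"):
    """Hypothetical sketch: git commands to pick the merge commit onto one branch."""
    pick_branch = f"PR_TOOL_PICK_PR_{pr_num}_{pick_ref.upper()}"
    return [
        ["git", "fetch", push_remote, f"{pick_ref}:{pick_branch}"],
        ["git", "checkout", pick_branch],
        # -x records the original commit hash; -s adds a Signed-off-by line.
        ["git", "cherry-pick", "-x", "-s", merge_hash],
        ["git", "push", push_remote, f"{pick_branch}:{pick_ref}"],
    ]
```

Each list could be passed in order to `subprocess.run(cmd, check=True)`, so a failing step stops the sequence.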

Usage Examples

Basic PR Merge

# Set up environment
export SPARK_HOME=/path/to/spark
export PR_REMOTE_NAME=apache-github
export PUSH_REMOTE_NAME=apache
export GITHUB_OAUTH_KEY=ghp_xxxxxxxxxxxx

# Run the merge tool
cd $SPARK_HOME
python dev/merge_spark_pr.py

# Interactive prompts:
#   Which pull request would you like to merge? (e.g., 12345):
#   Enter primary author in the format of "name <email>" [...]:
#   Would you like to pick 1234abc into another branch? (y/N):
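Because the tool fetches PRs through `PR_REMOTE_NAME` and pushes through `PUSH_REMOTE_NAME`, both remotes need to exist in the local clone before it runs. A hypothetical pre-flight check (not part of the script) can parse the output of `git remote`, which prints one remote name per line.

```python
import subprocess


def missing_remotes(remote_output, required=("apache-github", "apache")):
    """Return the required remote names absent from `git remote` output."""
    present = {line.strip() for line in remote_output.splitlines() if line.strip()}
    return [name for name in required if name not in present]


def check_remotes(repo_dir, required=("apache-github", "apache")):
    # `git remote` with no arguments lists the configured remote names.
    out = subprocess.run(
        ["git", "remote"], cwd=repo_dir, capture_output=True, text=True, check=True
    ).stdout
    return missing_remotes(out, required)
```

A non-empty result means the corresponding `git remote add` is needed before running the merge tool.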

With JIRA Integration

# Also set JIRA credentials for automatic issue management
export JIRA_ACCESS_TOKEN=your_jira_token

python dev/merge_spark_pr.py
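# After the merge (and any cherry-picks), the tool offers to update JIRA:
# - Resolve the SPARK-XXXXX issue(s) referenced in the PR title
# - Set fix version(s), defaulting from the branches the commit landed on
# - Assign the issue if it is unassigned
# - Grant the assignee the Contributor role if needed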
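Default fix versions can be derived from the branches the commit landed on. The sketch below assumes a hypothetical rule: `master` maps to the highest unreleased `x.y.z` version, and `branch-X.Y` maps to the lowest unreleased `X.Y.z` version. The version list is passed in rather than fetched from JIRA, and `default_fix_versions` is an illustrative name.

```python
import re


def _version_key(version):
    # "3.5.10" -> (3, 5, 10), so versions compare numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


def default_fix_versions(merged_branches, unreleased):
    """Hypothetical rule: suggest one fix version per merged branch.

    merged_branches: e.g. ["master", "branch-3.5"]
    unreleased: unreleased JIRA version names, e.g. ["4.1.0", "3.5.3"]
    """
    versions = sorted(
        (v for v in unreleased if re.fullmatch(r"\d+\.\d+\.\d+", v)), key=_version_key
    )
    defaults = []
    for branch in merged_branches:
        if branch == "master":
            # The main line gets the highest unreleased version.
            candidates = versions[-1:]
        else:
            line = branch[len("branch-"):] if branch.startswith("branch-") else branch
            # A maintenance branch gets its next (lowest) unreleased patch release.
            candidates = [v for v in versions if v.startswith(line + ".")][:1]
        defaults.extend(candidates)
    return defaults
```

Numeric sorting matters here: compared as strings, `"3.5.10"` would sort before `"3.5.9"`.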
