
Implementation:Apache Spark Make Distribution

From Leeroopedia


Field              Value
Source Repository  Apache Spark
Domains            Build_Systems, Packaging
Last Updated       2026-02-08 14:00 GMT

Overview

A shell script that creates self-contained binary distributions of Apache Spark, with optional PySpark, SparkR, and Spark Connect packages.

Description

The dev/make-distribution.sh script assembles a complete Spark binary distribution. It copies compiled JARs, bin/sbin scripts, configuration templates, and optional language packages into a dist/ directory. It can produce compressed tarballs and build PySpark pip packages, SparkR CRAN packages, and Spark Connect variants.

The script performs the following steps:

  • Detects the Spark version from pom.xml
  • Optionally builds Spark using Maven or SBT if not already built
  • Creates the dist/ directory with the standard Spark layout
  • Copies core JARs from assembly/target into dist/jars/
  • Copies bin/ and sbin/ scripts
  • Copies conf/ templates (renaming .template files)
  • Copies Python and R directories if present
  • Optionally builds PySpark pip-installable packages (sdist)
  • Optionally builds SparkR CRAN packages
  • Optionally creates a Spark Connect distribution variant
  • Optionally compresses everything into a .tgz tarball
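The copy phase in the steps above can be condensed into a few lines of shell. The following is an illustrative sketch, not the script's actual code: it first builds a tiny stand-in for an already-built Spark tree (fabricated version and file names) so the steps can run anywhere, then performs the version detection and copies in the same order as the list above.

```shell
set -e
cd "$(mktemp -d)"

# --- stand-in for an already-built Spark tree (fabricated contents) ---
mkdir -p assembly/target/scala-2.13/jars bin sbin conf
printf '<project><version>4.0.0</version></project>\n' > pom.xml
touch assembly/target/scala-2.13/jars/spark-core.jar bin/spark-shell sbin/start-master.sh
printf '# defaults\n' > conf/spark-defaults.conf.template

# --- the copy phase, step by step ---
# Detect the version from pom.xml (the real script asks Maven; this grep is a shortcut)
VERSION="$(grep -o '<version>[^<]*' pom.xml | head -n1 | cut -d'>' -f2)"

DISTDIR="dist"
rm -rf "$DISTDIR"
mkdir -p "$DISTDIR/jars" "$DISTDIR/conf"

cp assembly/target/scala-*/jars/* "$DISTDIR/jars/"   # core JARs
cp -r bin sbin "$DISTDIR/"                           # launcher and daemon scripts
cp conf/*.template "$DISTDIR/conf/"                  # configuration templates

echo "Assembled dist/ for Spark $VERSION"
```

The real script additionally handles Python and R directories, license files, and the optional packaging steps; the sketch covers only the mandatory layout.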

Usage

Run this script after a successful build to create a deployable distribution. Pass --pip for PySpark packages, --r for SparkR packages, and --tgz for a compressed tarball. It is commonly used to create release candidates and custom deployment packages.

Code Reference

Attribute          Details
Source Repository  apache/spark
File               dev/make-distribution.sh
Lines              1-338
Signature          dev/make-distribution.sh [--name <name>] [--tgz] [--pip] [--r] [--connect] [--mvn <path>] [--sbt-enabled] [--sbt <path>]

I/O Contract

Inputs

Parameter                   Type      Required  Description
Compiled Spark source tree  implicit  Yes       Spark must be built before running this script
--name                      str       No        Distribution name suffix (e.g., hadoop3)
--tgz                       flag      No        Create a compressed tarball archive
--pip                       flag      No        Build PySpark pip-installable packages
--r                         flag      No        Build SparkR CRAN packages
--connect                   flag      No        Build Spark Connect distribution variant
--mvn                       path      No        Path to custom Maven executable
--sbt-enabled               flag      No        Use SBT instead of Maven for building
--sbt                       path      No        Path to custom SBT executable
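Flags of this shape are typically consumed with a while/case loop over the positional parameters. The sketch below is illustrative only (hypothetical variable names, and only a subset of the flags), not the script's actual parsing code; note how value-taking flags such as --name shift twice while boolean flags shift once.

```shell
# Parse make-distribution.sh-style flags (illustrative sketch)
parse_args() {
  NAME="none"; MAKE_TGZ=false; MAKE_PIP=false; MAKE_R=false; MVN="build/mvn"
  while [ $# -gt 0 ]; do
    case "$1" in
      --name) NAME="$2"; shift ;;   # value-taking flag: consume flag + value
      --mvn)  MVN="$2";  shift ;;
      --tgz)  MAKE_TGZ=true ;;      # boolean flag: consume flag only
      --pip)  MAKE_PIP=true ;;
      --r)    MAKE_R=true ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}

parse_args --name hadoop3 --tgz --pip
echo "name=$NAME tgz=$MAKE_TGZ pip=$MAKE_PIP r=$MAKE_R"
# prints: name=hadoop3 tgz=true pip=true r=false
```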

Outputs

Output                          Description
dist/ directory                 Complete Spark distribution with jars/, conf/, bin/, sbin/, and python/ subdirectories
spark-<version>-bin-<name>.tgz  Compressed tarball (when --tgz is specified)
PySpark sdist packages          Pip-installable packages in python/dist/ (when --pip is specified)
SparkR CRAN package             R package archive (when --r is specified)
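The tarball output follows the spark-<version>-bin-<name> naming scheme. A stand-in sketch of the packaging step (fabricated version, name, and contents, chosen just to show the layout; the real dist/ comes from the copy phase):

```shell
set -e
cd "$(mktemp -d)"

# Stand-in dist/ directory
mkdir -p dist/jars dist/bin
touch dist/jars/spark-core.jar dist/bin/spark-shell

VERSION="4.0.0"; NAME="hadoop3"
TARDIR="spark-$VERSION-bin-$NAME"

# Copy dist/ under the release name, then archive it so the tarball
# unpacks into a single spark-<version>-bin-<name>/ directory
cp -r dist "$TARDIR"
tar czf "$TARDIR.tgz" "$TARDIR"
tar tzf "$TARDIR.tgz"   # list archive contents
```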

Usage Examples

Basic distribution with tarball:

./dev/make-distribution.sh --name custom --tgz

Distribution with PySpark pip packages:

./dev/make-distribution.sh --name hadoop3 --tgz --pip

Full distribution with all optional packages:

./dev/make-distribution.sh --name hadoop3 --tgz --pip --r --connect

Using a custom Maven path:

./dev/make-distribution.sh --name custom --mvn /opt/maven/bin/mvn --tgz
