Implementation: Apache Spark Make Distribution
| Field | Value |
|---|---|
| Source Repository | Apache Spark |
| Domains | Build_Systems, Packaging |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Shell script that creates self-contained binary distributions of Apache Spark with optional PySpark, SparkR, and Spark Connect packages.
Description
The dev/make-distribution.sh script assembles a complete Spark binary distribution. It copies compiled JARs, bin/sbin scripts, configuration templates, and optional language packages into a dist/ directory. It can produce compressed tarballs and build PySpark pip packages, SparkR CRAN packages, and Spark Connect variants.
The script performs the following steps:
- Detects the Spark version from pom.xml
- Optionally builds Spark using Maven or SBT if not already built
- Creates the dist/ directory with the standard Spark layout
- Copies core JARs from assembly/target into dist/jars/
- Copies bin/ and sbin/ scripts
- Copies conf/ configuration templates (the .template files)
- Copies Python and R directories if present
- Optionally builds PySpark pip-installable packages (sdist)
- Optionally builds SparkR CRAN packages
- Optionally creates a Spark Connect distribution variant
- Optionally compresses everything into a .tgz tarball
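The copy-and-layout steps above can be sketched in a few lines of shell. This is an illustrative approximation, not the actual script: all paths, JAR names, and the template file below are placeholders, and a temporary directory stands in for a built Spark source tree so the sketch runs standalone.

```shell
#!/usr/bin/env bash
# Simplified sketch of the layout/copy steps; every path here is a
# placeholder, not the real script's logic.
set -euo pipefail

SPARK_HOME="$(mktemp -d)"        # stand-in for a built Spark source tree
DISTDIR="$SPARK_HOME/dist"

# Simulate a tree that has already been built, so the sketch is runnable.
mkdir -p "$SPARK_HOME/assembly/target/jars" \
         "$SPARK_HOME/bin" "$SPARK_HOME/sbin" "$SPARK_HOME/conf"
touch "$SPARK_HOME/assembly/target/jars/spark-core.jar"
touch "$SPARK_HOME/conf/spark-env.sh.template"

# Create the dist/ directory with the standard layout.
mkdir -p "$DISTDIR/jars" "$DISTDIR/conf"

# Copy core JARs from assembly/target into dist/jars/.
cp "$SPARK_HOME"/assembly/target/jars/*.jar "$DISTDIR/jars/"

# Copy bin/ and sbin/ scripts.
cp -r "$SPARK_HOME/bin" "$SPARK_HOME/sbin" "$DISTDIR/"

# Copy conf/ templates.
cp "$SPARK_HOME"/conf/*.template "$DISTDIR/conf/"

# Compress everything into a .tgz tarball (the --tgz path).
tar czf "$SPARK_HOME/spark-bin-custom.tgz" -C "$SPARK_HOME" dist
```

The real script adds version detection, optional builds, and the PySpark/SparkR/Connect packaging on top of this basic copy sequence.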
Usage
Run the script after a successful build to create a deployable distribution. Pass --pip to build PySpark packages, --r for SparkR packages, and --tgz for a compressed tarball. It is commonly used to create release candidates and custom deployment packages.
Code Reference
| Attribute | Details |
|---|---|
| Source | Repository apache/spark, File dev/make-distribution.sh, Lines 1-338 |
| Signature | dev/make-distribution.sh [--name <name>] [--tgz] [--pip] [--r] [--connect] [--mvn <path>] [--sbt-enabled] [--sbt <path>] |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| Compiled Spark source tree | implicit | Yes | Spark must be built before running this script |
| --name | str | No | Distribution name suffix (e.g., hadoop3) |
| --tgz | flag | No | Create a compressed tarball archive |
| --pip | flag | No | Build PySpark pip-installable packages |
| --r | flag | No | Build SparkR CRAN packages |
| --connect | flag | No | Build Spark Connect distribution variant |
| --mvn | path | No | Path to custom Maven executable |
| --sbt-enabled | flag | No | Use SBT instead of Maven for building |
| --sbt | path | No | Path to custom SBT executable |
Outputs
| Output | Description |
|---|---|
| dist/ directory | Complete Spark distribution with jars/, conf/, bin/, sbin/, python/ subdirectories |
| spark-<version>-bin-<name>.tgz | Compressed tarball (when --tgz is specified) |
| PySpark sdist packages | Pip-installable packages in python/dist/ (when --pip is specified) |
| SparkR CRAN package | R package archive (when --r is specified) |
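The tarball name in the table follows a fixed pattern built from the detected Spark version and the --name value. A sketch with placeholder values (the real script reads the version from pom.xml rather than hard-coding it):

```shell
#!/usr/bin/env bash
# Placeholder values: the real script detects VERSION from pom.xml
# instead of hard-coding it.
VERSION="4.0.0"   # assumed example version
NAME="hadoop3"    # value passed to --name

TARBALL="spark-$VERSION-bin-$NAME.tgz"
echo "$TARBALL"
```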
Usage Examples
Basic distribution with tarball:
./dev/make-distribution.sh --name custom --tgz
Distribution with PySpark pip packages:
./dev/make-distribution.sh --name hadoop3 --tgz --pip
Full distribution with all optional packages:
./dev/make-distribution.sh --name hadoop3 --tgz --pip --r --connect
Using a custom Maven path:
./dev/make-distribution.sh --name custom --mvn /opt/maven/bin/mvn --tgz
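After any of the invocations above, the resulting layout can be spot-checked with a short loop over the expected subdirectories. This helper is hypothetical, not part of the script; here it creates a simulated dist/ in a temporary directory so the sketch runs standalone.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check that dist/ has the expected layout.
set -eu

# Simulate a finished distribution so the sketch is runnable; in practice
# DISTDIR would point at the dist/ directory the script produced.
DISTDIR="$(mktemp -d)/dist"
mkdir -p "$DISTDIR/jars" "$DISTDIR/conf" "$DISTDIR/bin" "$DISTDIR/sbin"

LAYOUT_OK=true
for d in jars conf bin sbin; do
  if [ ! -d "$DISTDIR/$d" ]; then
    echo "missing: $DISTDIR/$d" >&2
    LAYOUT_OK=false
  fi
done
echo "layout ok: $LAYOUT_OK"
```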