Implementation:Apache Spark Mvn Clean Package
| Property | Value |
|---|---|
| source | Repo: Apache Spark |
| source | Doc: Spark Build Documentation |
| domain | Build_Systems |
| type | External Tool Doc |
Overview
Maven-based compilation command for building Apache Spark from source with configurable feature profiles.
Description
The build/mvn clean package command compiles and packages all Spark modules enabled by the active Maven profiles. It runs Maven's clean lifecycle (removing previous build artifacts) and then the default lifecycle up to the package phase (which includes compile, test-compile, and package). The -DskipTests flag is commonly used during development; it skips test execution but still compiles the test sources.
The key phases, in execution order, are:
- clean -- Deletes the target/ directory in each module, removing all previously compiled classes and packaged artifacts.
- compile -- Compiles the main source code (Scala and Java) of each module.
- test-compile -- Compiles the test source code (but does not run tests when -DskipTests is active).
- package -- Packages compiled classes into JAR files and produces assembly artifacts.
Usage
Use this when you need to compile Spark from source. Choose profiles based on your deployment target. Prefer build/mvn over a system-wide Maven so that the build runs with the Maven version the project expects.
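As a small guard before starting a build, a script can check that the wrapper is present in the source tree. This is only a sketch: the function name require_build_mvn is hypothetical and not part of the Spark repository, and it assumes the argument is the root of a Spark checkout.

```shell
# Succeed only if <root>/build/mvn exists and is executable.
# require_build_mvn is a hypothetical helper name, not part of Spark.
require_build_mvn() {
  root="${1:-.}"
  if [ -x "$root/build/mvn" ]; then
    return 0
  fi
  echo "build/mvn not found or not executable under: $root" >&2
  return 1
}
```

A build script could then run require_build_mvn . && ./build/mvn clean package -DskipTests from the source root.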
Code Reference
| Property | Value |
|---|---|
| Source | Repository apache/spark, File build/mvn (lines 153-176) and docs/building-spark.md |
| Signature | build/mvn clean package -DskipTests [-P<profiles>] |
| Import | N/A (shell command) |
I/O Contract
Inputs:
- Spark source tree -- required
- Maven profiles as -P flags -- optional
- -DskipTests flag -- optional
Outputs:
- Compiled JARs in each module's target/ directory
- Assembly JARs for distribution
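One way to inspect the outputs after a build is to list the JAR files that ended up under target/ directories. The following is a minimal sketch, assuming it is pointed at the root of the Spark source tree; list_built_jars is a hypothetical helper name, and the exact set of JARs depends on the active profiles.

```shell
# Print every .jar file located somewhere under a target/ directory
# below the given root (defaults to the current directory).
# list_built_jars is a hypothetical helper name, not part of Spark.
list_built_jars() {
  root="${1:-.}"
  find "$root" -type f -path '*/target/*' -name '*.jar' | sort
}
```

Running list_built_jars . from the source root after a successful build prints one path per line; an empty result suggests the package phase did not complete.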
Key Profiles
| Profile | Flag | Description |
|---|---|---|
| Kubernetes | -Pkubernetes | Enables Kubernetes cluster manager support |
| YARN | -Pyarn | Enables Apache YARN cluster manager support |
| Hadoop Provided | -Phadoop-provided | Excludes bundled Hadoop libraries (for environments where Hadoop is pre-installed) |
| Hive | -Phive | Enables Apache Hive integration and HiveQL support |
| SparkR | -Psparkr | Enables R language bindings and SparkR package |
| Spark Connect | -Pconnect | Enables Spark Connect client-server architecture |
Usage Examples
Basic build (skip tests):
./build/mvn clean package -DskipTests
Build with Kubernetes and YARN support:
./build/mvn clean package -DskipTests -Pkubernetes -Pyarn
Full build with all major profiles:
./build/mvn clean package -DskipTests -Pkubernetes -Pyarn -Phive -Psparkr -Pconnect
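The commands above differ only in their profile list, so a script that builds several variants may want to assemble the command line from profile names. The sketch below only prints the command rather than running it; spark_build_cmd is a hypothetical helper name, and it assumes each argument is a valid profile ID such as those in the Key Profiles table.

```shell
# Print (do not run) a build/mvn command line for the given profile names.
# spark_build_cmd is a hypothetical helper name, not part of Spark.
spark_build_cmd() {
  cmd="./build/mvn clean package -DskipTests"
  for profile in "$@"; do
    # Each profile name becomes one -P<profile> flag.
    cmd="$cmd -P$profile"
  done
  printf '%s\n' "$cmd"
}

spark_build_cmd kubernetes yarn
# prints: ./build/mvn clean package -DskipTests -Pkubernetes -Pyarn
```

Printing the command first makes it easy to review the profile set before committing to a long build.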