Implementation:Apache Spark Mvn Clean Package

From Leeroopedia


Property Value
source Repo: Apache Spark
source Doc: Spark Build Documentation
domain Build_Systems
type External Tool Doc

Overview

Maven-based compilation command for building Apache Spark from source with configurable feature profiles.

Description

The build/mvn clean package command builds all Spark modules selected by the active Maven profiles. It first runs Maven's clean lifecycle, which removes previous build artifacts, and then runs the default lifecycle up to and including the package phase (covering compile, test-compile, test, and package, among others). The -DskipTests flag is commonly used during development: test sources are still compiled, but test execution is skipped.

The main lifecycle phases executed, in order, are:

  • clean -- Deletes the target/ directory in each module, removing all previously compiled classes and packaged artifacts.
  • compile -- Compiles the main source code (Scala and Java) of each module.
  • test-compile -- Compiles the test source code. This phase still runs when -DskipTests is active; only test execution in the later test phase is skipped.
  • package -- Packages compiled classes into JAR files and produces assembly artifacts.
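The effect of test-related flags on these phases can be sketched with a small dry-run helper. This is an illustrative sketch only: the function name test_flag and its mode names are hypothetical, and -Dmaven.test.skip=true is a standard Maven property that the source page does not mention (it skips test compilation as well as test execution). The command is printed rather than executed.

```shell
# Hypothetical helper: map a test mode to the matching Maven flag.
#   run       -> no flag; tests are compiled and executed
#   skip-run  -> -DskipTests; tests are compiled but not executed
#   skip-all  -> -Dmaven.test.skip=true; tests are neither compiled nor run
test_flag() {
  case "$1" in
    run)      printf '' ;;
    skip-run) printf '%s' '-DskipTests' ;;
    skip-all) printf '%s' '-Dmaven.test.skip=true' ;;
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the build command instead of invoking Maven.
echo "./build/mvn clean package $(test_flag skip-run)"
```

For day-to-day development the skip-run mode matches the -DskipTests usage described on this page.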

Usage

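Use this command when you need to compile Spark from source. Choose profiles based on your deployment target. Prefer build/mvn over a system-wide Maven installation: the script sets up and uses the Maven version the Spark build expects, which keeps builds consistent across machines.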
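One way to enforce the use of the bundled script is a small guard that refuses to continue when build/mvn is missing. This is a minimal sketch, not part of Spark: the function name spark_mvn and the SPARK_SRC variable in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the path to the bundled Maven script,
# or fail if the directory does not look like a Spark source tree.
spark_mvn() {
  root="${1:-.}"
  if [ -x "$root/build/mvn" ]; then
    printf '%s\n' "$root/build/mvn"
  else
    echo "build/mvn not found or not executable under $root" >&2
    return 1
  fi
}

# Usage (SPARK_SRC is an assumed variable pointing at a source checkout):
#   "$(spark_mvn "$SPARK_SRC")" clean package -DskipTests
```

Failing early is a deliberate choice here: silently falling back to a system Maven would hide the version mismatch the guard is meant to catch.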

Code Reference

Property Value
Source Repository apache/spark, File build/mvn (lines 153-176) and docs/building-spark.md
Signature build/mvn clean package -DskipTests [-P<profiles>]
Import N/A (shell command)

I/O Contract

Inputs:

  • Spark source tree -- required
  • Maven profiles as -P flags -- optional
  • -DskipTests flag -- optional

Outputs:

  • Compiled JARs in each module's target/ directory
  • Assembly JARs for distribution
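After a build, the outputs can be checked by listing the JAR files under each module's target/ directory. The sketch below assumes only the target/ layout described above; the function name list_built_jars is hypothetical.

```shell
# Hypothetical helper: list every JAR located under a target/ directory
# below the given root (defaults to the current directory).
list_built_jars() {
  root="${1:-.}"
  find "$root" -type f -path '*/target/*' -name '*.jar' | sort
}

# Usage, from the top of the Spark source tree:
#   list_built_jars .
```

An empty result after a build that reported success usually means the command was run from the wrong directory.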

Key Profiles

Profile | Flag | Description
Kubernetes | -Pkubernetes | Enables Kubernetes cluster manager support
YARN | -Pyarn | Enables Apache YARN cluster manager support
Hadoop Provided | -Phadoop-provided | Excludes bundled Hadoop libraries (for environments where Hadoop is pre-installed)
Hive | -Phive | Enables Apache Hive integration and HiveQL support
SparkR | -Psparkr | Enables R language bindings and SparkR package
Spark Connect | -Pconnect | Enables Spark Connect client-server architecture
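Because each profile maps to one -P flag, the flag list can be assembled from profile names. The following is a minimal sketch; the function name profile_flags is hypothetical, and the command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical helper: turn profile names into a space-separated
# list of -P flags, e.g. "kubernetes yarn" -> "-Pkubernetes -Pyarn".
profile_flags() {
  flags=""
  for p in "$@"; do
    flags="${flags:+$flags }-P$p"
  done
  printf '%s\n' "$flags"
}

# Dry run: print the build command instead of invoking Maven.
echo "./build/mvn clean package -DskipTests $(profile_flags kubernetes yarn hive)"
```

The helper does not validate profile names; a misspelled profile will only surface when Maven runs.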

Usage Examples

Basic build (skip tests):

./build/mvn clean package -DskipTests

Build with Kubernetes and YARN support:

./build/mvn clean package -DskipTests -Pkubernetes -Pyarn

Build with Kubernetes, YARN, Hive, SparkR, and Spark Connect support:

./build/mvn clean package -DskipTests -Pkubernetes -Pyarn -Phive -Psparkr -Pconnect
