Heuristic:Heibaiying BigData Notes Maven Shade Packaging Tip
| Knowledge Sources | |
|---|---|
| Domains | Build_System, Deployment |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Use maven-shade-plugin instead of maven-assembly-plugin for packaging big data applications, and always exclude cluster-provided framework JARs to avoid classpath conflicts.
Description
When packaging big data applications (Storm, Flink, Spark) for cluster deployment, the choice of Maven packaging plugin and dependency exclusion strategy is critical. The `maven-shade-plugin` produces a single uber-JAR and can merge overlapping resource files (and, when configured, relocate conflicting packages), while an uber-JAR built with `maven-assembly-plugin` can fail at runtime with `RuntimeException` errors, particularly with HDFS integration. Additionally, framework JARs (`storm-core`, `flink-java`, `spark-streaming`) must be excluded from the uber-JAR because they are already present in the cluster installation directories.
Usage
Use this heuristic whenever packaging a big data application for cluster deployment. Apply when:
- Building Storm topologies for remote cluster submission
- Packaging Flink jobs for submission via `flink run`
- Creating Spark application JARs for `spark-submit`
- Encountering `Found multiple defaults.yaml resources` or similar classpath conflicts
The Insight (Rule of Thumb)
- Action: Use `maven-shade-plugin` (version 3.0.0) instead of `maven-assembly-plugin` for creating uber-JARs.
- Value: Set framework dependencies to `<scope>provided</scope>` in pom.xml to exclude them from the uber-JAR.
- Trade-off: Slightly more complex pom.xml configuration, but it prevents the classpath conflicts caused by bundling framework JARs that the cluster already provides.
- Critical: For Storm-HDFS integration, a JAR built with `maven-assembly-plugin` throws a `RuntimeException` at runtime. Use `maven-shade-plugin` for these projects without exception.
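A minimal sketch of the plugin setup the rule of thumb describes, assuming a standard `pom.xml` with a `<build><plugins>` section. The plugin version is the one named above. The signature-file filter is not stated in this page; it is a common precaution added here as an assumption, since signatures copied from bundled JARs no longer match the merged uber-JAR.

```xml
<!-- Sketch only: goes inside <build><plugins> of the application's pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <!-- Build the uber-JAR as part of `mvn package` -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <!-- Assumption: strip signature files from every bundled JAR -->
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn clean package` should produce the shaded JAR in `target/`, which is the artifact to submit to the cluster.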
Exclusion checklist:
- Storm: Exclude `storm-core` (already provided in Storm's `lib/` directory)
- Flink: Exclude `flink-java`, `flink-streaming-java` (already provided in Flink's `lib/` directory)
- Spark: Exclude `spark-streaming`, `spark-core` (already provided in Spark's `jars/` directory; the published artifact IDs carry a Scala version suffix)
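The checklist above could be expressed in `pom.xml` roughly as follows. This is a sketch under stated assumptions: the `${storm.version}`, `${flink.version}`, `${spark.version}`, and `${scala.binary.version}` properties are hypothetical placeholders that would have to be defined in the project's `<properties>`, and a real project would normally declare only the framework it actually uses.

```xml
<!-- Sketch only: keep the block for the framework the project targets -->
<dependencies>
  <!-- Storm: supplied by the cluster's lib/ directory -->
  <dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>${storm.version}</version>
    <scope>provided</scope>
  </dependency>

  <!-- Flink: supplied by the cluster's lib/ directory -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>

  <!-- Spark: supplied by the cluster's jars/ directory;
       the Scala suffix property is a hypothetical placeholder -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Dependencies in `provided` scope stay on the compile classpath but are left out of the shaded JAR. One consequence is that running the job locally from an IDE may need those JARs added to the run configuration.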
Reasoning
Storm, Flink, and Spark clusters ship with their own framework JARs in their installation directories. When these JARs are also bundled into the application uber-JAR, the same classes and resources appear on the classpath twice. Storm specifically checks for `defaults.yaml` on the classpath and fails with "Found multiple defaults.yaml resources" if `storm-core` is bundled. The `maven-shade-plugin` can merge same-named resource files through its resource transformers and can relocate packages when configured to, whereas `maven-assembly-plugin` performs a naive merge in which same-named files from different JARs are not combined, which breaks the service provider registrations and resource files required by HDFS.
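The resource merging mentioned above is opt-in: it is enabled through transformers in the shade plugin's configuration. The fragment below is a sketch of one plausible setup, not necessarily the configuration the original notes use. `com.example.MyTopology` is a hypothetical main class name.

```xml
<!-- Sketch only: goes inside the maven-shade-plugin <configuration> element -->
<transformers>
  <!-- Concatenates META-INF/services files from all bundled JARs
       so that service provider registrations are not lost -->
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
  <!-- Writes Main-Class into the manifest; the class name is a placeholder -->
  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <mainClass>com.example.MyTopology</mainClass>
  </transformer>
</transformers>
```

The manifest transformer is optional when the main class is passed on the submit command line instead.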