Implementation:Heibaiying BigData Notes Maven Packaging for Storm

From Leeroopedia


Overview

| Property | Value |
|---|---|
| Type | External Tool Doc |
| Tools | maven-shade-plugin, maven-assembly-plugin, maven-jar-plugin + maven-dependency-plugin |
| Source | notes/Storm三种打包方式对比分析.md:L1-318, notes/大数据应用常用打包方式.md:L1-309 |

Description

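This page documents three Maven-based packaging strategies for building deployable Storm topology JAR files. The maven-shade-plugin and maven-assembly-plugin approaches bundle the application code and its dependencies into a single fat JAR (uber JAR) that can be submitted to a Storm cluster with the storm jar command. The maven-jar-plugin + maven-dependency-plugin approach instead produces a thin JAR plus a lib/ directory of dependency JARs. The recommended approach is maven-shade-plugin: it can merge resource files that several dependencies share (such as META-INF/services entries) rather than letting one copy overwrite the others, which matters when the topology uses Hadoop ecosystem components.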

Usage

The packaging plugin is configured in the project's pom.xml file within the <build><plugins> section. After configuration, the appropriate Maven command is run to produce the deployable JAR.
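As a rough orientation, the skeleton below sketches where a packaging plugin block sits in pom.xml. The project coordinates are hypothetical (chosen only to match the storm-word-count-1.0.jar name used in the examples on this page), and the storm-core version 1.2.2 is an assumption taken from the error log in Example 4.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical coordinates; they yield target/storm-word-count-1.0.jar -->
    <groupId>com.heibaiying</groupId>
    <artifactId>storm-word-count</artifactId>
    <version>1.0</version>

    <dependencies>
        <!-- Assumed version, matching the cluster version seen in the Example 4 log -->
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.2.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- One of the packaging plugin configurations from this page goes here -->
        </plugins>
    </build>
</project>
```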

Code Reference

Option 1: maven-shade-plugin (Recommended)

The shade plugin creates a fat JAR and can merge resource files that appear in more than one dependency. It is the approach the Storm documentation recommends, and it matters most when the topology integrates with HDFS, HBase, or other Hadoop ecosystem components.

POM Configuration

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.sf</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.dsa</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>META-INF/*.rsa</exclude>
                    <exclude>META-INF/*.EC</exclude>
                    <exclude>META-INF/*.ec</exclude>
                    <exclude>META-INF/MSFTSIG.SF</exclude>
                    <exclude>META-INF/MSFTSIG.RSA</exclude>
                </excludes>
            </filter>
        </filters>
        <artifactSet>
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </artifactSet>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer
                       implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

Key Configuration Elements

| Element | Purpose |
|---|---|
| createDependencyReducedPom | When true, generates a dependency-reduced POM (dependency-reduced-pom.xml) that omits the dependencies already bundled into the shaded JAR, so downstream projects do not pull them in again |
| filters / excludes | Removes JAR signature files (*.SF, *.DSA, *.RSA, *.EC) from all dependencies to prevent "Invalid signature file digest" errors |
| artifactSet / excludes | Excludes org.apache.storm:storm-core from the fat JAR, since the cluster environment provides it |
| ServicesResourceTransformer | Merges META-INF/services files from all dependencies instead of overwriting them; critical for Java ServiceLoader compatibility (e.g., HDFS FileSystem implementations) |
| ManifestResourceTransformer | Customizes the shaded JAR's MANIFEST.MF by adding or replacing entries such as Main-Class; it is declared here without any entries configured |
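The ManifestResourceTransformer in the configuration above is declared with an empty body. A minimal sketch of how it could be given a main class is shown below; the class name is reused from the assembly example on this page and is an assumption for the shade setup. Since storm jar takes the main class as a command-line argument, this entry is optional for Storm submission.

```xml
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <!-- Writes Main-Class into the shaded JAR's META-INF/MANIFEST.MF -->
    <!-- Class name assumed; taken from the assembly plugin example on this page -->
    <mainClass>com.heibaiying.wordcount.ClusterWordCountApp</mainClass>
</transformer>
```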

Build Command

mvn package

Produces two JAR files:

  • target/project-version.jar -- The shaded fat JAR (use this for deployment).
  • target/original-project-version.jar -- The original unshaded JAR.

Option 2: maven-assembly-plugin

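The assembly plugin bundles all dependencies into a single JAR. It is simpler to configure, but by default it does not merge resource files: when several dependencies contain a file at the same path (for example under META-INF/services), only one copy ends up in the fat JAR.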

POM Configuration

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptors>
                    <descriptor>src/main/resources/assembly.xml</descriptor>
                </descriptors>
                <archive>
                    <manifest>
                        <mainClass>com.heibaiying.wordcount.ClusterWordCountApp</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>

Assembly Descriptor (assembly.xml)

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0
                              http://maven.apache.org/xsd/assembly-2.0.0.xsd">

    <id>jar-with-dependencies</id>

    <formats>
        <format>jar</format>
    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>true</unpack>
            <scope>runtime</scope>
            <excludes>
                <exclude>org.apache.storm:storm-core</exclude>
            </excludes>
        </dependencySet>
    </dependencySets>
</assembly>

Key Configuration Elements

| Element | Purpose |
|---|---|
| descriptors / descriptor | Points to the assembly descriptor XML file that controls packaging behavior |
| mainClass | Specifies the main entry class in the JAR manifest |
| id | Defines the suffix appended to the output JAR name (e.g., -jar-with-dependencies) |
| unpack | When true, unpacks dependency JARs and merges their contents into the fat JAR |
| excludes | Excludes org.apache.storm:storm-core from the packaged output |

Build Command

mvn assembly:assembly

Produces two JAR files:

  • target/project-version-jar-with-dependencies.jar -- The fat JAR (use this for deployment).
  • target/project-version.jar -- The standard JAR without dependencies.
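The assembly:assembly goal is a legacy goal and, as far as the plugin's release history indicates, is not available in 3.x releases of maven-assembly-plugin. A hedged alternative is to bind the single goal to the package phase, so that a plain mvn package also produces the jar-with-dependencies artifact. The execution id make-assembly is an arbitrary label, not something from the original notes.

```xml
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptors>
            <descriptor>src/main/resources/assembly.xml</descriptor>
        </descriptors>
    </configuration>
    <executions>
        <execution>
            <!-- Arbitrary id; any unique name works -->
            <id>make-assembly</id>
            <!-- Run the assembly as part of the normal package phase -->
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

With this binding, mvn clean package would be expected to write both the standard JAR and the assembled fat JAR to target/.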

Option 3: maven-jar-plugin + maven-dependency-plugin

This approach is used when non-Maven-managed JARs (e.g., from a resources/lib directory) need to be included in the final artifact.

POM Configuration

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <classpathPrefix>lib/</classpathPrefix>
                        <mainClass>com.heibaiying.BigDataApp</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <executions>
                <execution>
                    <id>copy</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>copy-dependencies</goal>
                    </goals>
                    <configuration>
                        <outputDirectory>
                            ${project.build.directory}/lib
                        </outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Key Configuration Elements

| Element | Purpose |
|---|---|
| addClasspath | Adds a Class-Path entry listing the project's dependencies to the JAR manifest |
| classpathPrefix | Sets the prefix for each Class-Path entry, so the entries point into the lib/ directory |
| mainClass | Specifies the main entry class in the JAR manifest |
| copy-dependencies | Copies all dependency JARs to the target/lib directory |
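As configured above, copy-dependencies copies every dependency, which would include storm-core if it is declared with the default scope. The sketch below shows one way this could be narrowed using the plugin's includeScope and excludeArtifactIds parameters; applying them here is an assumption and not part of the original notes.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>copy</id>
            <phase>compile</phase>
            <goals>
                <goal>copy-dependencies</goal>
            </goals>
            <configuration>
                <outputDirectory>${project.build.directory}/lib</outputDirectory>
                <!-- runtime covers compile- and runtime-scoped dependencies, not provided ones -->
                <includeScope>runtime</includeScope>
                <!-- Keep the cluster-provided Storm JAR out of target/lib -->
                <excludeArtifactIds>storm-core</excludeArtifactIds>
            </configuration>
        </execution>
    </executions>
</plugin>
```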

I/O Contract

| Plugin | Input | Output | Build Command |
|---|---|---|---|
| maven-shade-plugin | Project source + all Maven dependencies | Single fat JAR (shaded) | mvn package |
| maven-assembly-plugin | Project source + all Maven dependencies | Single fat JAR (assembled) | mvn assembly:assembly |
| maven-jar-plugin + maven-dependency-plugin | Project source + local lib JARs | JAR + lib/ directory | mvn package |

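The resulting topology JAR is submitted with storm jar, passing the JAR path followed by the fully qualified main class. The fat JARs from the first two approaches can be submitted as-is; a thin JAR from the third approach also needs its dependencies supplied at submission time (see Example 3):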

storm jar /path/to/topology.jar com.heibaiying.wordcount.ClusterWordCountApp

Usage Examples

Example 1: Packaging with maven-shade-plugin

# Build the shaded fat JAR
mvn clean package

# Submit to Storm cluster
storm jar target/storm-word-count-1.0.jar com.heibaiying.wordcount.ClusterWordCountApp

Example 2: Packaging with maven-assembly-plugin

# Build the assembled fat JAR
mvn clean assembly:assembly

# Submit the jar-with-dependencies version
storm jar target/storm-word-count-1.0-jar-with-dependencies.jar \
    com.heibaiying.wordcount.ClusterWordCountApp

Example 3: Specifying External Dependencies at Submission

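When the JAR is built with plain mvn package and no fat JAR plugin, dependencies can be supplied at submission time. The --jars option takes a comma-separated list of local JAR files, --artifacts takes a comma-separated list of Maven coordinates (with ^ introducing an exclusion), and --artifactRepositories takes additional repositories as name^url pairs: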

storm jar target/storm-word-count-1.0.jar \
    com.heibaiying.wordcount.ClusterWordCountApp \
    --jars "./external/storm-redis-1.1.0.jar,./external/storm-kafka-1.1.0.jar" \
    --artifacts "redis.clients:jedis:2.9.0,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12" \
    --artifactRepositories "jboss-repository^http://repository.jboss.com/maven2"

Example 4: Common Error -- Forgetting to Exclude storm-core

If storm-core is not excluded from the fat JAR, the following error occurs at runtime:

Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources.
You're probably bundling the Storm jars with your topology jar.
[jar:file:/usr/app/apache-storm-1.2.2/lib/storm-core-1.2.2.jar!/defaults.yaml,
jar:file:/usr/appjar/storm-word-count-1.0.jar!/defaults.yaml]

Solution: Add org.apache.storm:storm-core to the plugin's exclude list as shown in the configurations above.
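A common alternative, not shown in the original notes, is to declare storm-core with provided scope so that neither the shade plugin nor an assembly dependencySet with runtime scope bundles it. The version below is an assumption taken from the storm-core-1.2.2.jar path in the error log.

```xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <!-- Assumed version; should match the Storm version running on the cluster -->
    <version>1.2.2</version>
    <!-- provided: available at compile time, left out of the packaged fat JAR -->
    <scope>provided</scope>
</dependency>
```

One trade-off to keep in mind: provided-scoped dependencies may be missing from the classpath when the topology is run locally from an IDE, so the explicit plugin exclusion shown in the configurations above can be the more convenient choice during development.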

Related Pages

| Relationship | Page |
|---|---|
| implements | Heibaiying_BigData_Notes_Storm_Application_Packaging |
| related | Heibaiying_BigData_Notes_Storm_Topology_Submission |
| related | Heibaiying_BigData_Notes_Storm_Topology_Deployment |
