
Principle:Heibaiying BigData Notes Storm Application Packaging



Overview

| Property | Value |
|---|---|
| Concept | Storm Application Packaging |
| Category | Stream Processing / Build and Deployment |
| Applies To | Apache Storm topologies built with Maven |
| Prerequisites | Understanding of Storm topology deployment, Maven build system basics |

Description

Before a Storm topology can be submitted to a production cluster, it must be packaged as a JAR file containing the application code and all of its runtime dependencies. This is commonly referred to as a fat JAR or uber JAR. The packaged JAR is then submitted to the cluster using the storm jar command, which uploads it to the Nimbus daemon for distribution to worker nodes.

Proper packaging is a critical step in the Storm deployment workflow. Incorrect packaging can lead to runtime errors such as:

  • ClassNotFoundException -- A required dependency was not included in the JAR.
  • Found multiple defaults.yaml resources -- The storm-core JAR was included in the fat JAR, conflicting with the one provided by the cluster environment.
  • No FileSystem for scheme: hdfs -- Service provider configuration files were overwritten during packaging (a known issue with maven-assembly-plugin).

Usage

There are three primary Maven-based packaging approaches for Storm applications, each with different trade-offs:

Approach 1: Plain mvn package

This is the simplest approach, but the resulting JAR contains only the project's own classes, not its dependencies. On its own it is suitable only for projects with no third-party libraries.

mvn package

When using this approach with external dependencies, they must be specified at submission time:

storm jar topology.jar com.example.MainClass \
  --jars "./lib/dependency1.jar,./lib/dependency2.jar"

Approach 2: maven-assembly-plugin

Creates a fat JAR with all dependencies bundled. This is the approach recommended in Storm's official documentation for simple projects.

mvn assembly:assembly

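This produces a JAR whose name ends in -jar-with-dependencies. Note that version 3.x of the plugin no longer provides the assembly goal; with those versions, running mvn package assembly:single (or binding the single goal to the package phase) serves the same purpose.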
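As a rough illustration, the plugin could be declared in pom.xml as in the sketch below. It assumes the predefined jar-with-dependencies descriptor and uses com.example.TopologyMainClass as a placeholder main class; the plugin version is left out and would need to be pinned in a real build. Because this descriptor bundles every runtime-scoped dependency, storm-core still has to be kept out of the JAR by other means, such as provided scope or a custom descriptor.

```xml
<!-- Sketch only: belongs inside <build><plugins> of pom.xml. -->
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <!-- Predefined descriptor that unpacks all runtime dependencies into one JAR. -->
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <!-- Placeholder main class; replace with the topology's entry point. -->
                <mainClass>com.example.TopologyMainClass</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>
```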

Approach 3: maven-shade-plugin (Recommended)

Creates a fat JAR with intelligent resource merging. This is the recommended approach for production use, particularly when integrating with Hadoop ecosystem components (HDFS, HBase, etc.).

mvn package

Produces a shaded JAR alongside an original- prefixed JAR (the unshaded version).
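A plain mvn package is sufficient here because the shade goal is normally bound to the package phase in pom.xml. A minimal sketch of such a binding is shown below; the plugin version is omitted and should be pinned in a real build, and no transformers or exclusions are configured yet.

```xml
<!-- Sketch only: belongs inside <build><plugins> of pom.xml. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <!-- Run the shade goal whenever the package phase runs. -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```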

Theoretical Basis

Why Fat JARs Are Necessary

Storm's cluster architecture requires that all application code and dependencies be available on every worker node where the topology's tasks execute. Since worker nodes may not have access to the developer's local Maven repository or the internet, the simplest and most reliable approach is to bundle everything into a single self-contained JAR file.

The storm-core Exclusion Rule

A critical packaging rule is that storm-core must be excluded from the fat JAR. The Storm cluster already provides storm-core in its classpath (located in the lib/ directory of the Storm installation). Including it in the application JAR causes a conflict because both JARs contain defaults.yaml, leading to the "Found multiple defaults.yaml resources" RuntimeException.

There are two ways to exclude storm-core:

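  • Set its Maven scope to provided -- This keeps storm-core out of the fat JAR, but it also removes it from the runtime classpath, so running the topology in local mode (for example from an IDE) can fail because the Storm classes are missing.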
  • Exclude it in the packaging plugin configuration -- This is the recommended approach because it allows storm-core to remain available during local development while being excluded from the final JAR.
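The two options above might look roughly like the following pom.xml fragments. This is a sketch under assumptions: ${storm.version} is a hypothetical property that the project would have to define, and the second fragment assumes maven-shade-plugin is the packaging plugin in use.

```xml
<!-- Option 1: mark storm-core as provided on the dependency itself. -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>${storm.version}</version> <!-- hypothetical property -->
    <scope>provided</scope>
</dependency>

<!-- Option 2: keep the default scope and exclude the artifact from the shaded JAR.
     This element goes inside the <configuration> of maven-shade-plugin. -->
<artifactSet>
    <excludes>
        <exclude>org.apache.storm:storm-core</exclude>
    </excludes>
</artifactSet>
```

Only one of the two options is needed in a given build.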

Assembly vs. Shade: Key Differences

| Feature | maven-assembly-plugin | maven-shade-plugin |
|---|---|---|
| Resource handling | Overwrites duplicate resource files | Merges duplicate resource files using configurable transformers |
| Service provider files | Overwrites META-INF/services files | Merges via ServicesResourceTransformer |
| Manifest handling | Basic manifest configuration | Advanced manifest transformation via ManifestResourceTransformer |
| Signature files | May include conflicting signatures | Can exclude META-INF signature files (*.SF, *.DSA, *.RSA) |
| HDFS compatibility | May cause "No FileSystem for scheme" errors | Handles service provider merging correctly |
| Recommendation | Suitable for simple topologies | Recommended for all production use |

The maven-shade-plugin's ServicesResourceTransformer is particularly important because Java's ServiceLoader mechanism relies on META-INF/services files to discover implementations. When multiple JARs provide service implementations (common with Hadoop ecosystem libraries), the assembly plugin overwrites these files, causing implementations to be lost. The shade plugin merges them, preserving all service registrations.
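The transformers and signature-file exclusions named in the comparison could be configured along the lines of the sketch below. It shows only the <configuration> element of maven-shade-plugin, and com.example.TopologyMainClass is a placeholder main class rather than a name from a real project.

```xml
<!-- Sketch only: the <configuration> element of maven-shade-plugin. -->
<configuration>
    <filters>
        <filter>
            <!-- Apply to every bundled artifact. -->
            <artifact>*:*</artifact>
            <excludes>
                <!-- Drop signature files that no longer match the repackaged JAR. -->
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
            </excludes>
        </filter>
    </filters>
    <transformers>
        <!-- Concatenate META-INF/services files instead of keeping only one copy. -->
        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        <!-- Write the Main-Class entry into the manifest of the shaded JAR. -->
        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.TopologyMainClass</mainClass>
        </transformer>
    </transformers>
</configuration>
```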

Deployment Command

After packaging, the topology is deployed using:

storm jar /path/to/topology-fat.jar com.example.TopologyMainClass [args...]

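Submission with the storm jar command proceeds in four steps:

  1. The command adds the specified JAR to the classpath.
  2. It invokes the main method of the specified class, passing along any remaining arguments.
  3. The main class calls StormSubmitter.submitTopology().
  4. StormSubmitter uploads the JAR to Nimbus, which distributes it to the worker nodes.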

Related Pages

Relationship Page
implemented_by Heibaiying_BigData_Notes_Maven_Packaging_for_Storm
related Heibaiying_BigData_Notes_Storm_Topology_Deployment
related Heibaiying_BigData_Notes_Storm_Parallelism_Configuration
