Principle:Heibaiying BigData Notes Storm Topology Deployment

From Leeroopedia


Overview

Concept: Storm Topology Deployment
Category: Stream Processing / Deployment and Operations
Applies To: Apache Storm Topologies
Prerequisites: Understanding of topology construction with TopologyBuilder, and Storm cluster architecture

Description

Once a Storm topology has been constructed using TopologyBuilder, it must be deployed (submitted) to a cluster for execution. Storm provides two deployment modes that serve different stages of the development lifecycle:

  • Local mode (via LocalCluster) -- Runs the entire topology within a single JVM process on the developer's machine. This mode simulates a Storm cluster in-process, making it ideal for rapid development, debugging, and unit testing. No external Storm installation is required.
  • Remote/Cluster mode (via StormSubmitter) -- Submits the topology to a production Storm cluster managed by the Nimbus daemon. The topology is distributed across worker nodes, with Nimbus coordinating task assignment and Zookeeper managing cluster state.

The two modes use different submission APIs but share the same topology construction code. This means a topology can be developed and tested locally, then deployed to production with minimal code changes (typically just switching the submission call).
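This mode switch is commonly realized in a single main method that selects the submission API from a command-line argument. A minimal sketch, assuming Storm's current package layout (org.apache.storm; older releases use backtype.storm) and illustrative class and topology names; the spout/bolt wiring is elided:

TopologyBuilder builder = new TopologyBuilder();
// wiring omitted: builder.setSpout(...), builder.setBolt(...)
Config config = new Config();

if (args.length > 0 && "remote".equals(args[0])) {
    // Cluster mode: upload the jar to Nimbus for distributed execution
    StormSubmitter.submitTopology("DemoTopology", config, builder.createTopology());
} else {
    // Local mode: simulate the cluster inside this JVM
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("DemoTopology", config, builder.createTopology());
}

The topology construction code above is identical in both branches; only the submission call differs.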

Usage

Local Mode

Local mode is used during development and testing:

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("TopologyName", new Config(), topology);

Key characteristics:

  • Runs entirely within the current JVM process.
  • No need to install or start Storm daemons.
  • Spouts and Bolts execute in separate threads within the same process.
  • Useful for debugging with IDE breakpoints and logging.
  • Can be shut down programmatically with cluster.shutdown().
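Because a local topology never terminates on its own, test code typically lets it run for a fixed interval before tearing it down. A sketch (topology name and sleep duration are illustrative):

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalDemo", new Config(), builder.createTopology());
Thread.sleep(60_000);              // let the topology process data for a minute
cluster.killTopology("LocalDemo"); // stop the topology
cluster.shutdown();                // tear down the in-process cluster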

Remote/Cluster Mode

Remote mode is used for production deployment:

StormSubmitter.submitTopology("TopologyName", config, topology);

Key characteristics:

  • Requires a running Storm cluster (Nimbus, Supervisor, Zookeeper).
  • The topology JAR is uploaded to Nimbus, which distributes it to Supervisors.
  • Workers are spawned as separate JVM processes on cluster nodes.
  • The topology runs indefinitely until explicitly killed.
  • Submitted via command line: storm jar topology.jar com.example.MainClass.
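Packaging and submitting from the command line might look like the following, assuming a Maven build and illustrative artifact and class names:

# build a fat jar; storm-core should be marked <scope>provided</scope>
mvn clean package

# upload the jar to Nimbus and start the topology
storm jar target/my-topology-1.0-jar-with-dependencies.jar com.example.MainClass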

Theoretical Basis

Storm Cluster Architecture

Understanding deployment requires knowledge of Storm's cluster architecture:

Nimbus -- The master daemon. Distributes code across the cluster, assigns tasks to Supervisors, and monitors for failures.
Supervisor -- Runs on each worker node. Manages worker processes according to Nimbus's task assignments.
Zookeeper -- Coordinates state between Nimbus and Supervisors. Stores topology metadata, task assignments, and heartbeat information.
Worker -- A JVM process spawned by a Supervisor. Runs the executors and tasks assigned by Nimbus.
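Clients and worker nodes locate these daemons through conf/storm.yaml. A minimal sketch with illustrative hostnames:

# conf/storm.yaml (illustrative hostnames)
storm.zookeeper.servers:
  - "zk1.example.com"
nimbus.seeds: ["nimbus1.example.com"]
supervisor.slots.ports: [6700, 6701, 6702, 6703]  # four worker slots per node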

Local vs. Remote: Trade-offs

Setup complexity -- Local: none (runs in-process). Remote: requires full cluster deployment.
Debugging -- Local: full IDE support (breakpoints, profiling). Remote: requires log analysis; no interactive debugging.
Performance -- Local: limited by a single JVM. Remote: scales horizontally across cluster nodes.
Fault tolerance -- Local: none (single process). Remote: full (Nimbus reassigns failed tasks).
Persistence -- Local: terminates when the JVM exits. Remote: runs indefinitely until explicitly killed.
Resource isolation -- Local: shared heap with the application. Remote: separate worker JVMs with configurable resources.

Deployment Workflow

A typical Storm development workflow follows these stages:

  1. Develop -- Write Spouts, Bolts, and topology wiring code.
  2. Test locally -- Submit to LocalCluster for rapid iteration and debugging.
  3. Package -- Build a fat JAR with all dependencies (excluding storm-core).
  4. Deploy to staging -- Submit to a staging cluster for integration testing.
  5. Deploy to production -- Submit to the production cluster via storm jar.
  6. Monitor -- Use Storm UI to observe throughput, latency, and component capacity.
  7. Scale -- Use storm rebalance to adjust parallelism at runtime if needed.

Submission Error Handling

When submitting to a remote cluster, several exceptions may occur:

  • AlreadyAliveException -- A topology with the same name is already running. Either kill the existing topology or use a different name.
  • InvalidTopologyException -- The topology DAG is malformed (e.g., a Bolt references a non-existent source component).
  • AuthorizationException -- The submitting user does not have permission to submit topologies (in a secured cluster).
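These checked exceptions can be handled at the submission call. A sketch:

try {
    StormSubmitter.submitTopology(name, config, builder.createTopology());
} catch (AlreadyAliveException e) {
    // a topology with this name is running: kill it or choose another name
} catch (InvalidTopologyException e) {
    // malformed DAG, e.g. a grouping references an unknown component id
} catch (AuthorizationException e) {
    // the submitting user lacks permission on this secured cluster
}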

Related Pages

Relationship Page
implemented_by Heibaiying_BigData_Notes_Storm_Topology_Submission
related Heibaiying_BigData_Notes_Storm_Topology_Wiring
related Heibaiying_BigData_Notes_Storm_Application_Packaging
related Heibaiying_BigData_Notes_Storm_Parallelism_Configuration
