Principle:Heibaiying BigData Notes Storm Topology Deployment
Overview
| Property | Value |
|---|---|
| Concept | Storm Topology Deployment |
| Category | Stream Processing / Deployment and Operations |
| Applies To | Apache Storm Topologies |
| Prerequisites | Understanding of topology construction with TopologyBuilder, and Storm cluster architecture |
Description
Once a Storm topology has been constructed using TopologyBuilder, it must be deployed (submitted) to a cluster for execution. Storm provides two deployment modes that serve different stages of the development lifecycle:
- Local mode (via `LocalCluster`) -- Runs the entire topology within a single JVM process on the developer's machine. This mode simulates a Storm cluster in-process, making it ideal for rapid development, debugging, and unit testing. No external Storm installation is required.
- Remote/Cluster mode (via `StormSubmitter`) -- Submits the topology to a production Storm cluster managed by the Nimbus daemon. The topology is distributed across worker nodes, with Nimbus coordinating task assignment and Zookeeper managing cluster state.
The two modes use different submission APIs but share the same topology construction code. This means a topology can be developed and tested locally, then deployed to production with minimal code changes (typically just switching the submission call).
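Because both modes share the same construction code, the choice of mode can be reduced to a single branch in `main`. The sketch below assumes Storm 1.x+ package names (`org.apache.storm`); the class name, spout/bolt classes (`SentenceSpout`, `CountBolt`), and the `cluster` argument convention are illustrative, not part of the Storm API.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class WordCountApp {

    public static void main(String[] args) throws Exception {
        // Identical topology construction for both modes.
        // SentenceSpout and CountBolt are placeholder component classes.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentence-spout", new SentenceSpout());
        builder.setBolt("count-bolt", new CountBolt())
               .shuffleGrouping("sentence-spout");

        Config config = new Config();

        if (args.length > 0 && "cluster".equals(args[0])) {
            // Remote mode: upload the JAR to Nimbus for distributed execution.
            StormSubmitter.submitTopology("WordCount", config, builder.createTopology());
        } else {
            // Local mode: simulate the whole cluster inside this JVM.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("WordCount", config, builder.createTopology());
        }
    }
}
```

Only the submission call differs between the two branches; everything above the `if` is shared, which is what makes the local-then-remote workflow cheap.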
Usage
Local Mode
Local mode is used during development and testing:
```java
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("TopologyName", new Config(), topology);
```
Key characteristics:
- Runs entirely within the current JVM process.
- No need to install or start Storm daemons.
- Spouts and Bolts execute in separate threads within the same process.
- Useful for debugging with IDE breakpoints and logging.
- Can be shut down programmatically with `cluster.shutdown()`.
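A complete local run typically submits the topology, lets it process for a while, then tears it down. A minimal sketch, assuming `topology` is the `StormTopology` built earlier with `TopologyBuilder` and Storm 1.x+ package names; the topology name is illustrative:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.utils.Utils;

// Submit to the in-process cluster.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalWordCount", new Config(), topology);

// Let the topology run for ten seconds, e.g. during a test.
Utils.sleep(10_000);

// Kill the topology, then stop the simulated cluster and exit the JVM cleanly.
cluster.killTopology("LocalWordCount");
cluster.shutdown();
```

Without the explicit `shutdown()`, the non-daemon Storm threads keep the JVM alive, which is why unit tests around `LocalCluster` should always tear it down.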
Remote/Cluster Mode
Remote mode is used for production deployment:
```java
StormSubmitter.submitTopology("TopologyName", config, topology);
```
Key characteristics:
- Requires a running Storm cluster (Nimbus, Supervisor, Zookeeper).
- The topology JAR is uploaded to Nimbus, which distributes it to Supervisors.
- Workers are spawned as separate JVM processes on cluster nodes.
- The topology runs indefinitely until explicitly killed.
- Submitted via the command line: `storm jar topology.jar com.example.MainClass`.
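Production submissions usually carry cluster-level settings in the `Config` passed to `StormSubmitter`. A hedged sketch using two real `Config` setters; the topology name, worker count, and pending limit are illustrative values, and `topology` is assumed to come from `builder.createTopology()`:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;

Config config = new Config();
// Number of worker JVM processes the cluster should allocate to this topology.
config.setNumWorkers(3);
// Upper bound on unacked tuples in flight per spout task (simple backpressure).
config.setMaxSpoutPending(1000);

// Uploads the JAR to Nimbus, which distributes it to the Supervisors.
StormSubmitter.submitTopology("ProdWordCount", config, topology);
```

These settings only take effect at submission time; changing parallelism later is done with `storm rebalance` rather than resubmitting.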
Theoretical Basis
Storm Cluster Architecture
Understanding deployment requires knowledge of Storm's cluster architecture:
| Component | Role |
|---|---|
| Nimbus | The master daemon. Distributes code across the cluster, assigns tasks to Supervisors, and monitors for failures. |
| Supervisor | Runs on each worker node. Manages worker processes according to Nimbus's task assignments. |
| Zookeeper | Coordinates state between Nimbus and Supervisors. Stores topology metadata, task assignments, and heartbeat information. |
| Worker | A JVM process spawned by a Supervisor. Runs the executors and tasks assigned by Nimbus. |
Local vs. Remote: Trade-offs
| Aspect | Local Mode | Remote Mode |
|---|---|---|
| Setup complexity | None (runs in-process) | Requires full cluster deployment |
| Debugging | Full IDE support (breakpoints, profiling) | Requires log analysis; no interactive debugging |
| Performance | Limited by single JVM | Scales horizontally across cluster nodes |
| Fault tolerance | None (single process) | Full (Nimbus reassigns failed tasks) |
| Persistence | Terminates when JVM exits | Runs indefinitely until explicitly killed |
| Resource isolation | Shared heap with application | Separate worker JVMs with configurable resources |
Deployment Workflow
A typical Storm development workflow follows these stages:
- Develop -- Write Spouts, Bolts, and topology wiring code.
- Test locally -- Submit to `LocalCluster` for rapid iteration and debugging.
- Package -- Build a fat JAR with all dependencies (excluding storm-core).
- Deploy to staging -- Submit to a staging cluster for integration testing.
- Deploy to production -- Submit to the production cluster via `storm jar`.
- Monitor -- Use the Storm UI to observe throughput, latency, and component capacity.
- Scale -- Use `storm rebalance` to adjust parallelism at runtime if needed.
Submission Error Handling
When submitting to a remote cluster, several exceptions may occur:
- `AlreadyAliveException` -- A topology with the same name is already running. Either kill the existing topology or use a different name.
- `InvalidTopologyException` -- The topology DAG is malformed (e.g., a Bolt references a non-existent source component).
- `AuthorizationException` -- The submitting user does not have permission to submit topologies (in a secured cluster).
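The three exceptions above are declared on `StormSubmitter.submitTopology` and can be handled individually. A minimal sketch, assuming Storm 1.x+ package names and that `config` and `topology` were built as in the earlier examples; the topology name and error messages are illustrative:

```java
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;

try {
    StormSubmitter.submitTopology("WordCount", config, topology);
} catch (AlreadyAliveException e) {
    // A topology named "WordCount" is already running; kill it or rename.
    System.err.println("Topology already running: " + e.getMessage());
} catch (InvalidTopologyException e) {
    // The DAG is malformed, e.g. a Bolt subscribes to a non-existent component.
    System.err.println("Invalid topology: " + e.getMessage());
} catch (AuthorizationException e) {
    // The submitting user lacks permission on a secured cluster.
    System.err.println("Not authorized to submit: " + e.getMessage());
}
```

In practice `AlreadyAliveException` is the most common case during redeployments, so deployment scripts often run `storm kill` and wait for teardown before resubmitting.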
Related Pages
| Relationship | Page |
|---|---|
| implemented_by | Heibaiying_BigData_Notes_Storm_Topology_Submission |
| related | Heibaiying_BigData_Notes_Storm_Topology_Wiring |
| related | Heibaiying_BigData_Notes_Storm_Application_Packaging |
| related | Heibaiying_BigData_Notes_Storm_Parallelism_Configuration |