Principle: Heibaiying BigData Notes Storm Parallelism Configuration
Overview
| Property | Value |
|---|---|
| Concept | Storm Parallelism Configuration |
| Category | Stream Processing / Performance Tuning |
| Applies To | Apache Storm Topologies |
| Prerequisites | Understanding of Storm topology wiring, Spouts, and Bolts |
Description
Parallelism in Apache Storm controls how many concurrent processing units are allocated to each component (Spout or Bolt) in a topology. Proper parallelism configuration is essential for achieving high throughput, low latency, and efficient resource utilization.
Storm's parallelism model operates at three distinct levels:
- Workers (JVM processes) -- Each worker is a separate JVM process running on a cluster node. A topology can be distributed across multiple workers, each potentially on a different physical machine.
- Executors (threads) -- Each worker contains one or more executors. An executor is a thread that runs one or more tasks for a specific component. The number of executors is set via the parallelism hint when registering a Spout or Bolt.
- Tasks (logical instances) -- Each executor runs one or more tasks. A task is a logical instance of a Spout or Bolt. By default, each executor runs exactly one task, but this can be overridden with setNumTasks().
The relationship is: Workers >= 1, each worker contains Executors >= 1, each executor runs Tasks >= 1.
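As a concrete illustration of the hierarchy, the following sketch (plain Java arithmetic, not Storm API code) uses the same example values as the Usage section -- 2 workers, a spout with parallelism hint 2, and a bolt with parallelism hint 4 -- and shows how Storm's defaults distribute the work:

```java
// Arithmetic sketch of Storm's default parallelism distribution.
// Assumed example values: 2 workers, spout hint 2, bolt hint 4.
public class ParallelismMath {
    public static void main(String[] args) {
        int numWorkers = 2;
        int spoutExecutors = 2;
        int boltExecutors = 4;

        // Executors are threads, counted cluster-wide across all components.
        int totalExecutors = spoutExecutors + boltExecutors; // 6

        // Storm spreads executors evenly across the configured workers.
        int executorsPerWorker = totalExecutors / numWorkers; // 3

        // By default each executor runs exactly one task.
        int totalTasks = totalExecutors; // 6

        System.out.println(totalExecutors + " executors, "
                + executorsPerWorker + " per worker, "
                + totalTasks + " tasks");
    }
}
```

Running this prints `6 executors, 3 per worker, 6 tasks`, matching the Workers >= 1, Executors >= 1, Tasks >= 1 relationship above.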
Usage
Parallelism configuration is used when:
- Scaling throughput -- Increasing the number of executors for a bottleneck component allows more tuples to be processed concurrently.
- Distributing load -- Spreading workers across multiple machines distributes the computational and memory burden.
- Balancing the pipeline -- Different components may have different processing costs. Assigning more executors to expensive Bolts prevents them from becoming bottlenecks.
- Enabling dynamic rebalancing -- Setting numTasks higher than the executor count allows Storm to rebalance the topology at runtime without restarting it, by redistributing tasks among executors.
The three levels of parallelism are configured through different APIs:
```java
// Level 1: Number of worker processes
Config config = new Config();
config.setNumWorkers(2);

// Level 2: Parallelism hint (number of executors/threads)
builder.setSpout("spout", new MySpout(), 2); // 2 executors
builder.setBolt("bolt", new MyBolt(), 4);    // 4 executors

// Level 3: Number of tasks per component
builder.setBolt("bolt", new MyBolt(), 4).setNumTasks(8); // 4 executors, 8 tasks
```
Theoretical Basis
The Three-Level Hierarchy
Storm's parallelism hierarchy reflects a common pattern in distributed systems where computation is partitioned at multiple granularities:
| Level | Unit | Scope | Configuration API |
|---|---|---|---|
| Worker | JVM process | Cluster-wide | config.setNumWorkers(n) |
| Executor | Thread | Per-worker | builder.setSpout/setBolt(id, component, parallelism_hint) |
| Task | Logical instance | Per-executor | declarer.setNumTasks(n) |
Workers provide process-level isolation. Each worker runs in its own JVM, with its own heap memory and garbage collection. Inter-worker communication happens over the network (even on the same machine), so minimizing cross-worker traffic can improve performance.
Executors provide thread-level parallelism within a worker. Storm assigns each executor a subset of the component's tasks. The executor's thread calls nextTuple() (for Spouts) or execute() (for Bolts) in a loop for each of its assigned tasks.
Tasks are the finest unit of parallelism. The number of tasks for a component is fixed at topology submission time and cannot be changed without restarting the topology. However, the mapping of tasks to executors can be changed at runtime via the storm rebalance command. This is why setting numTasks higher than the initial parallelism hint is useful: it allows future scaling without topology restart.
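To see why over-provisioning tasks enables runtime scaling, here is a small simulation (plain Java, not Storm internals; the round-robin assignment is an illustrative assumption) of how a fixed pool of 8 tasks can be remapped when the executor count grows from 4 to 8:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the task count is fixed at submission time, but the mapping of
// tasks to executors can change at rebalance time.
public class TaskAssignment {
    // Assign tasks to executors round-robin (an illustrative policy).
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) executors.add(new ArrayList<>());
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Before rebalance: 4 executors each run 2 of the 8 tasks.
        System.out.println(assign(8, 4)); // [[0, 4], [1, 5], [2, 6], [3, 7]]
        // After rebalancing the component to 8 executors, each runs 1 task --
        // possible only because 8 tasks were declared up front.
        System.out.println(assign(8, 8)); // [[0], [1], [2], [3], [4], [5], [6], [7]]
    }
}
```

With only 4 tasks declared, a rebalance to 8 executors would leave half of them idle, which is why the guidelines below suggest declaring extra tasks up front.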
Parallelism and Stream Groupings
The parallelism configuration interacts closely with stream groupings:
- With shuffle grouping, tuples are distributed evenly across all tasks of the downstream component, naturally leveraging parallelism.
- With fields grouping, tuples are hash-partitioned by field value. If the key distribution is skewed, some tasks may receive disproportionately more tuples, leading to load imbalance regardless of the parallelism setting.
- With global grouping, all tuples go to a single task, effectively negating any parallelism configured for the downstream component.
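The load-imbalance risk of fields grouping can be demonstrated with a toy hash-partitioning sketch (plain Java; Storm's real partitioning likewise hashes the grouping fields modulo the task count, though its exact hash differs):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of fields grouping: tuples with the same key always route to the
// same downstream task, so a skewed key distribution skews the task load.
public class FieldsGroupingSkew {
    // Route a key to one of numTasks tasks (illustrative hash policy).
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        String[] keys = {"hot", "hot", "hot", "hot", "hot", "a", "b", "c"};

        Map<Integer, Integer> load = new HashMap<>();
        for (String key : keys) {
            load.merge(taskFor(key, numTasks), 1, Integer::sum);
        }
        // The task owning the "hot" key processes at least 5 of the 8 tuples,
        // no matter how many tasks are configured.
        System.out.println(load);
    }
}
```

Raising the parallelism hint does not fix this kind of skew; only changing the grouping key (or pre-aggregating) rebalances the load.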
Guidelines for Configuration
- Start with a small number of workers and executors, then increase based on observed throughput and latency.
- Monitor the topology's capacity metric in the Storm UI. A component with capacity close to 1.0 is a bottleneck that needs more executors.
- Use fields grouping with adequate parallelism for stateful aggregation Bolts.
- Set numTasks to 2x or 4x the initial executor count to allow room for runtime rebalancing.
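The capacity metric mentioned above can be understood, roughly, as the fraction of the sampling window a Bolt's executors spend inside execute(). The sketch below computes it under that assumption (the formula and the 10-minute window are this sketch's assumptions about how the Storm UI derives the number, not API code):

```java
// Rough sketch of the Storm UI capacity metric: (tuples executed * average
// execute latency) / measurement window. Values near 1.0 indicate a
// saturated component that likely needs more executors.
public class CapacityCheck {
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        return (executed * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 10 * 60 * 1000; // assumed 10-minute sampling window
        // 550,000 tuples at ~1 ms each over 10 minutes: close to saturation.
        System.out.printf("capacity = %.2f%n", capacity(550_000, 1.0, windowMs));
    }
}
```

A Bolt reporting capacity around 0.9 or higher is the usual signal to raise its parallelism hint (or, past the declared task count, to rebalance with more tasks next deploy).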
Related Pages
| Relationship | Page |
|---|---|
| implemented_by | Heibaiying_BigData_Notes_Storm_Parallelism_Config |
| related | Heibaiying_BigData_Notes_Storm_Topology_Wiring |
| related | Heibaiying_BigData_Notes_Storm_Topology_Deployment |