Principle: Heibaiying BigData Notes Storm Parallelism Configuration
Overview
| Property | Value |
|---|---|
| Concept | Storm Parallelism Configuration |
| Category | Stream Processing / Performance Tuning |
| Applies To | Apache Storm Topologies |
| Prerequisites | Understanding of Storm topology wiring, Spouts, and Bolts |
Description
Parallelism in Apache Storm controls how many concurrent processing units are allocated to each component (Spout or Bolt) in a topology. Proper parallelism configuration is essential for achieving high throughput, low latency, and efficient resource utilization.
Storm's parallelism model operates at three distinct levels:
- Workers (JVM processes) -- Each worker is a separate JVM process running on a cluster node. A topology can be distributed across multiple workers, each potentially on a different physical machine.
- Executors (threads) -- Each worker contains one or more executors. An executor is a thread that runs one or more tasks for a specific component. The number of executors is set via the parallelism hint when registering a Spout or Bolt.
- Tasks (logical instances) -- Each executor runs one or more tasks. A task is a logical instance of a Spout or Bolt. By default, each executor runs exactly one task, but this can be overridden with setNumTasks().
The relationship is: Workers >= 1, each worker contains Executors >= 1, each executor runs Tasks >= 1.
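As a concrete illustration of the hierarchy, the following sketch (plain Java arithmetic, not Storm API code) uses the same example values as the Usage section -- 2 workers, a spout with parallelism hint 2, and a bolt with parallelism hint 4 -- and shows how Storm's defaults distribute the work:

```java
// Arithmetic sketch of Storm's default parallelism distribution.
// Assumed example values: 2 workers, spout hint 2, bolt hint 4.
public class ParallelismMath {
    public static void main(String[] args) {
        int numWorkers = 2;
        int spoutExecutors = 2;
        int boltExecutors = 4;

        // Executors are threads, counted cluster-wide across all components.
        int totalExecutors = spoutExecutors + boltExecutors; // 6

        // Storm spreads executors evenly across the configured workers.
        int executorsPerWorker = totalExecutors / numWorkers; // 3

        // By default each executor runs exactly one task.
        int totalTasks = totalExecutors; // 6

        System.out.println(totalExecutors + " executors, "
                + executorsPerWorker + " per worker, "
                + totalTasks + " tasks");
    }
}
```

Running this prints `6 executors, 3 per worker, 6 tasks`, matching the Workers >= 1, Executors >= 1, Tasks >= 1 relationship above.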
Usage
Parallelism configuration is used when:
- Scaling throughput -- Increasing the number of executors for a bottleneck component allows more tuples to be processed concurrently.
- Distributing load -- Spreading workers across multiple machines distributes the computational and memory burden.
- Balancing the pipeline -- Different components may have different processing costs. Assigning more executors to expensive Bolts prevents them from becoming bottlenecks.
- Enabling dynamic rebalancing -- Setting numTasks higher than the executor count allows Storm to rebalance the topology at runtime without restarting it, by redistributing tasks among executors.
The three levels of parallelism are configured through different APIs:
```java
// Level 1: Number of worker processes
Config config = new Config();
config.setNumWorkers(2);

// Level 2: Parallelism hint (number of executors/threads)
builder.setSpout("spout", new MySpout(), 2); // 2 executors
builder.setBolt("bolt", new MyBolt(), 4);    // 4 executors

// Level 3: Number of tasks per component
builder.setBolt("bolt", new MyBolt(), 4).setNumTasks(8); // 4 executors, 8 tasks
```
Theoretical Basis
The Three-Level Hierarchy
Storm's parallelism hierarchy reflects a common pattern in distributed systems where computation is partitioned at multiple granularities:
| Level | Unit | Scope | Configuration API |
|---|---|---|---|
| Worker | JVM process | Cluster-wide | config.setNumWorkers(n) |
| Executor | Thread | Per-worker | builder.setSpout/setBolt(id, component, parallelism_hint) |
| Task | Logical instance | Per-executor | declarer.setNumTasks(n) |
Workers provide process-level isolation. Each worker runs in its own JVM, with its own heap memory and garbage collection. Inter-worker communication happens over the network (even on the same machine), so minimizing cross-worker traffic can improve performance.
Executors provide thread-level parallelism within a worker. Storm assigns each executor a subset of the component's tasks. The executor's thread calls nextTuple() (for Spouts) or execute() (for Bolts) in a loop for each of its assigned tasks.
Tasks are the finest unit of parallelism. The number of tasks for a component is fixed at topology submission time and cannot be changed without restarting the topology. However, the mapping of tasks to executors can be changed at runtime via the storm rebalance command. This is why setting numTasks higher than the initial parallelism hint is useful: it allows future scaling without topology restart.
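To see why over-provisioning tasks enables runtime scaling, here is a small simulation (plain Java, not Storm internals; the round-robin assignment is an illustrative assumption) of how a fixed pool of 8 tasks can be remapped when the executor count grows from 4 to 8:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the task count is fixed at submission time, but the mapping of
// tasks to executors can change at rebalance time.
public class TaskAssignment {
    // Assign tasks to executors round-robin (an illustrative policy).
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) executors.add(new ArrayList<>());
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Before rebalance: 4 executors each run 2 of the 8 tasks.
        System.out.println(assign(8, 4)); // [[0, 4], [1, 5], [2, 6], [3, 7]]
        // After rebalancing the component to 8 executors, each runs 1 task --
        // possible only because 8 tasks were declared up front.
        System.out.println(assign(8, 8)); // [[0], [1], [2], [3], [4], [5], [6], [7]]
    }
}
```

With only 4 tasks declared, a rebalance to 8 executors would leave half of them idle, which is why the guidelines below suggest declaring extra tasks up front.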
Parallelism and Stream Groupings
The parallelism configuration interacts closely with stream groupings:
- With shuffle grouping, tuples are distributed evenly across all tasks of the downstream component, naturally leveraging parallelism.
- With fields grouping, tuples are hash-partitioned by field value. If the key distribution is skewed, some tasks may receive disproportionately more tuples, leading to load imbalance regardless of the parallelism setting.
- With global grouping, all tuples go to a single task, effectively negating any parallelism configured for the downstream component.
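The load-imbalance risk of fields grouping can be demonstrated with a toy hash-partitioning sketch (plain Java; Storm's real partitioning likewise hashes the grouping fields modulo the task count, though its exact hash differs):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of fields grouping: tuples with the same key always route to the
// same downstream task, so a skewed key distribution skews the task load.
public class FieldsGroupingSkew {
    // Route a key to one of numTasks tasks (illustrative hash policy).
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        String[] keys = {"hot", "hot", "hot", "hot", "hot", "a", "b", "c"};

        Map<Integer, Integer> load = new HashMap<>();
        for (String key : keys) {
            load.merge(taskFor(key, numTasks), 1, Integer::sum);
        }
        // The task owning the "hot" key processes at least 5 of the 8 tuples,
        // no matter how many tasks are configured.
        System.out.println(load);
    }
}
```

Raising the parallelism hint does not fix this kind of skew; only changing the grouping key (or pre-aggregating) rebalances the load.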
Guidelines for Configuration
- Start with a small number of workers and executors, then increase based on observed throughput and latency.
- Monitor the topology's capacity metric in the Storm UI. A component with capacity close to 1.0 is a bottleneck that needs more executors.
- Use fields grouping with adequate parallelism for stateful aggregation Bolts.
- Set numTasks to 2x or 4x the initial executor count to allow room for runtime rebalancing.
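The capacity metric mentioned above can be understood, roughly, as the fraction of the sampling window a Bolt's executors spend inside execute(). The sketch below computes it under that assumption (the formula and the 10-minute window are this sketch's assumptions about how the Storm UI derives the number, not API code):

```java
// Rough sketch of the Storm UI capacity metric: (tuples executed * average
// execute latency) / measurement window. Values near 1.0 indicate a
// saturated component that likely needs more executors.
public class CapacityCheck {
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        return (executed * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 10 * 60 * 1000; // assumed 10-minute sampling window
        // 550,000 tuples at ~1 ms each over 10 minutes: close to saturation.
        System.out.printf("capacity = %.2f%n", capacity(550_000, 1.0, windowMs));
    }
}
```

A Bolt reporting capacity around 0.9 or higher is the usual signal to raise its parallelism hint (or, past the declared task count, to rebalance with more tasks next deploy).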
Related Pages
| Relationship | Page |
|---|---|
| implemented_by | Heibaiying_BigData_Notes_Storm_Parallelism_Config |
| related | Heibaiying_BigData_Notes_Storm_Topology_Wiring |
| related | Heibaiying_BigData_Notes_Storm_Topology_Deployment |