Heuristic:ArroyoSystems Arroyo Parallelism Configuration

From Leeroopedia

Knowledge Sources
Domains Optimization, Stream_Processing
Last Updated 2026-02-08 08:00 GMT

Overview

Parallelism tuning heuristic: match task slots to available CPU cores and source partitions, using 16 slots per worker as the default with operator chaining enabled.

Description

Arroyo's parallelism model is based on task slots. Each worker process has a fixed number of slots (default 16), and each operator in the dataflow graph is assigned a parallelism level. The scheduler distributes operator subtasks across available slots. Operator chaining (enabled by default) fuses adjacent operators into a single task to reduce serialization and context-switching overhead. In Kubernetes mode, resources are allocated per-slot (default 900m CPU, 500Mi memory).
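The slot math above implies a simple sizing rule. A minimal sketch (the function name is illustrative, not an Arroyo API) of how many worker processes a given pipeline parallelism requires:

```rust
// Sketch: workers needed for a target pipeline parallelism, assuming a
// fixed number of task slots per worker (default 16). Illustrative only;
// not an Arroyo API.
fn workers_needed(parallelism: u32, slots_per_worker: u32) -> u32 {
    // Each worker contributes `slots_per_worker` slots; round up.
    (parallelism + slots_per_worker - 1) / slots_per_worker
}

fn main() {
    // Parallelism 40 with the default 16 slots per worker needs 3 workers
    // (16 + 16 + 8 slots in use).
    println!("{}", workers_needed(40, 16)); // prints 3
}
```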

Usage

Apply this heuristic when sizing worker processes, configuring Kubernetes resource requests, or troubleshooting idle subtasks. The key warning sign is Kafka source subtasks reporting "subscribed to no partitions" — this means parallelism exceeds source partition count.

The Insight (Rule of Thumb)

  • Action: Set `worker.task-slots` to match available CPU cores, and set pipeline parallelism no higher than the source partition count.
  • Value: Default is 16 task slots per worker. Kubernetes: 900m CPU + 500Mi memory per slot.
  • Trade-off: More parallelism = higher throughput but higher coordination overhead and memory. Too much parallelism = idle tasks wasting resources.
  • Chaining: Keep `pipeline.chaining.enabled = true` (default) to fuse adjacent operators and reduce overhead.

Reasoning

Each subtask runs as an async Tokio task within the worker process. With 16 slots and Tokio's work-stealing scheduler, the worker can efficiently utilize up to 16 cores. However, for Kafka sources, parallelism beyond the partition count results in idle subtasks. The warning message "Kafka Consumer is subscribed to no partitions, as there are more subtasks than partitions... setting idle" explicitly flags this.
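The partition-count constraint reduces to arithmetic. A hedged sketch (illustrative function, not Arroyo code) of how many source subtasks end up idle when parallelism exceeds partitions:

```rust
// Sketch: idle Kafka source subtasks when configured parallelism exceeds
// the topic's partition count. Assumes at most one consumer per partition,
// as the warning quoted above implies. Illustrative only.
fn idle_subtasks(parallelism: u32, partitions: u32) -> u32 {
    // Subtasks beyond the partition count get no assignment and sit idle.
    parallelism.saturating_sub(partitions)
}

fn main() {
    // 16 source subtasks reading a 12-partition topic: 4 are idle.
    println!("{}", idle_subtasks(16, 12)); // prints 4
}
```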

In Kubernetes per-slot mode (default since 0.11), resource requests scale linearly: a worker with 4 active slots gets 4 * 900m = 3.6 CPU cores. In per-pod mode (legacy), every pod gets the full resource allocation regardless of slot usage.
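The per-slot scaling can be sketched directly (illustrative function; the 900m/500Mi figures are the defaults quoted in Code Evidence):

```rust
// Sketch of the per-slot resource math: requests scale linearly with
// active slots at the defaults of 900m CPU and 500Mi memory per slot.
// Illustrative only; not an Arroyo API.
fn requests(active_slots: u32) -> (f64, u32) {
    let cpu_cores = active_slots as f64 * 0.9; // 900m = 0.9 cores per slot
    let memory_mib = active_slots * 500;       // 500Mi per slot
    (cpu_cores, memory_mib)
}

fn main() {
    let (cpu, mem) = requests(4);
    // A worker with 4 active slots requests 3.6 cores and 2000Mi.
    println!("{} cores, {}Mi", cpu, mem);
}
```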

The queue-size setting (default 8192) controls the bounded channel between operators. Larger queues buffer more data but consume more memory. This acts as backpressure: when a downstream operator is slow, the upstream operator blocks on the full queue.
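The blocking-on-full-queue behavior can be demonstrated with a standard-library sketch (Arroyo itself uses async channels between operators; a capacity of 4 stands in for the default queue-size of 8192):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Minimal std-library sketch of bounded-queue backpressure. Illustrative
// only: a producer feeding a consumer through a small bounded channel.
fn run_pipeline(items: u32, capacity: usize) -> u32 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let producer = thread::spawn(move || {
        for i in 0..items {
            // `send` blocks once `capacity` items are buffered, stalling
            // the producer until the consumer drains the queue -- that
            // blocking is the backpressure mechanism.
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, closing the channel for the consumer.
    });
    let received = rx.iter().count() as u32;
    producer.join().unwrap();
    received
}

fn main() {
    // All 8 items arrive even though the queue holds only 4 at a time.
    println!("received {}", run_pipeline(8, 4)); // prints "received 8"
}
```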

Code Evidence

Default task slots from `default.toml:40-45`:

[worker]
task-slots = 16
queue-size = 8192

[process-scheduler]
slots-per-process = 16

Operator chaining from `default.toml:15`:

chaining.enabled = true

Kafka idle subtask warning from `kafka/source/mod.rs:167-169`:

warn!(
    "Kafka Consumer {}-{} is subscribed to no partitions, \
     as there are more subtasks than partitions... setting idle"
);

Kubernetes per-slot resources from `default.toml:64-72`:

[kubernetes-scheduler]
resource-mode = "per-slot"

[kubernetes-scheduler.worker]
resources = { requests = { cpu = "900m", memory = "500Mi" } }
task-slots = 16
