Heuristic:ArroyoSystems Arroyo Parallelism Configuration

From Leeroopedia

Knowledge Sources
Domains Optimization, Stream_Processing
Last Updated 2026-02-08 08:00 GMT

Overview

Parallelism tuning heuristic: match task slots to available CPU cores and source partitions, using 16 slots per worker as the default with operator chaining enabled.

Description

Arroyo's parallelism model is based on task slots. Each worker process has a fixed number of slots (default 16), and each operator in the dataflow graph is assigned a parallelism level. The scheduler distributes operator subtasks across available slots. Operator chaining (enabled by default) fuses adjacent operators into a single task to reduce serialization and context-switching overhead. In Kubernetes mode, resources are allocated per-slot (default 900m CPU, 500Mi memory).
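The slot math above implies a simple sizing rule. A minimal sketch (the function name is illustrative, not an Arroyo API) of how many worker processes a given pipeline parallelism requires:

```rust
// Sketch: workers needed for a target pipeline parallelism, assuming a
// fixed number of task slots per worker (default 16). Illustrative only;
// not an Arroyo API.
fn workers_needed(parallelism: u32, slots_per_worker: u32) -> u32 {
    // Each worker contributes `slots_per_worker` slots; round up.
    (parallelism + slots_per_worker - 1) / slots_per_worker
}

fn main() {
    // Parallelism 40 with the default 16 slots per worker needs 3 workers
    // (16 + 16 + 8 slots in use).
    println!("{}", workers_needed(40, 16)); // prints 3
}
```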

Usage

Apply this heuristic when sizing worker processes, configuring Kubernetes resource requests, or troubleshooting idle subtasks. The key warning sign is Kafka source subtasks reporting "subscribed to no partitions" — this means parallelism exceeds source partition count.

The Insight (Rule of Thumb)

  • Action: Set `worker.task-slots` to match available CPU cores, and set pipeline parallelism no higher than the source partition count.
  • Value: Default is 16 task slots per worker. Kubernetes: 900m CPU + 500Mi memory per slot.
  • Trade-off: More parallelism = higher throughput but higher coordination overhead and memory. Too much parallelism = idle tasks wasting resources.
  • Chaining: Keep `pipeline.chaining.enabled = true` (default) to fuse adjacent operators and reduce overhead.

Reasoning

Each subtask runs as an async Tokio task within the worker process. With 16 slots and Tokio's work-stealing scheduler, the worker can efficiently utilize up to 16 cores. However, for Kafka sources, parallelism beyond the partition count results in idle subtasks. The warning message "Kafka Consumer is subscribed to no partitions, as there are more subtasks than partitions... setting idle" explicitly flags this.
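The partition-count constraint reduces to arithmetic. A hedged sketch (illustrative function, not Arroyo code) of how many source subtasks end up idle when parallelism exceeds partitions:

```rust
// Sketch: idle Kafka source subtasks when configured parallelism exceeds
// the topic's partition count. Assumes at most one consumer per partition,
// as the warning quoted above implies. Illustrative only.
fn idle_subtasks(parallelism: u32, partitions: u32) -> u32 {
    // Subtasks beyond the partition count get no assignment and sit idle.
    parallelism.saturating_sub(partitions)
}

fn main() {
    // 16 source subtasks reading a 12-partition topic: 4 are idle.
    println!("{}", idle_subtasks(16, 12)); // prints 4
}
```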

In Kubernetes per-slot mode (default since 0.11), resource requests scale linearly: a worker with 4 active slots gets 4 * 900m = 3.6 CPU cores. In per-pod mode (legacy), every pod gets the full resource allocation regardless of slot usage.
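The per-slot scaling can be sketched directly (illustrative function; the 900m/500Mi figures are the defaults quoted in Code Evidence):

```rust
// Sketch of the per-slot resource math: requests scale linearly with
// active slots at the defaults of 900m CPU and 500Mi memory per slot.
// Illustrative only; not an Arroyo API.
fn requests(active_slots: u32) -> (f64, u32) {
    let cpu_cores = active_slots as f64 * 0.9; // 900m = 0.9 cores per slot
    let memory_mib = active_slots * 500;       // 500Mi per slot
    (cpu_cores, memory_mib)
}

fn main() {
    let (cpu, mem) = requests(4);
    // A worker with 4 active slots requests 3.6 cores and 2000Mi.
    println!("{} cores, {}Mi", cpu, mem);
}
```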

The queue-size setting (default 8192) controls the bounded channel between operators. Larger queues buffer more data but consume more memory. This acts as backpressure: when a downstream operator is slow, the upstream operator blocks on the full queue.
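The blocking-on-full-queue behavior can be demonstrated with a standard-library sketch (Arroyo itself uses async channels between operators; a capacity of 4 stands in for the default queue-size of 8192):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Minimal std-library sketch of bounded-queue backpressure. Illustrative
// only: a producer feeding a consumer through a small bounded channel.
fn run_pipeline(items: u32, capacity: usize) -> u32 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let producer = thread::spawn(move || {
        for i in 0..items {
            // `send` blocks once `capacity` items are buffered, stalling
            // the producer until the consumer drains the queue -- that
            // blocking is the backpressure mechanism.
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, closing the channel for the consumer.
    });
    let received = rx.iter().count() as u32;
    producer.join().unwrap();
    received
}

fn main() {
    // All 8 items arrive even though the queue holds only 4 at a time.
    println!("received {}", run_pipeline(8, 4)); // prints "received 8"
}
```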

Code Evidence

Default task slots from `default.toml:40-45`:

[worker]
task-slots = 16
queue-size = 8192

[process-scheduler]
slots-per-process = 16

Operator chaining from `default.toml:15`:

chaining.enabled = true

Kafka idle subtask warning from `kafka/source/mod.rs:167-169`:

warn!(
    "Kafka Consumer {}-{} is subscribed to no partitions, \
     as there are more subtasks than partitions... setting idle"
);

Kubernetes per-slot resources from `default.toml:64-72`:

[kubernetes-scheduler]
resource-mode = "per-slot"

[kubernetes-scheduler.worker]
resources = { requests = { cpu = "900m", memory = "500Mi" } }
task-slots = 16
