Environment: ArroyoSystems Arroyo Kubernetes Deployment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Kubernetes |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Kubernetes cluster environment with Helm chart for deploying Arroyo in distributed mode with dynamic worker pod scheduling.
Description
This environment provides the Kubernetes infrastructure for running Arroyo in distributed mode. The Arroyo controller uses the Kubernetes API to dynamically schedule worker pods based on pipeline parallelism requirements. The Helm chart deploys the controller (which includes the API, controller, and compiler services) as a Deployment, with workers spawned as individual pods on demand. Resource allocation supports two modes: per-slot (resources scale with task count) and per-pod (fixed resources per pod). The chart includes RBAC roles for pod management and configurable service accounts.
Usage
Use this environment for production distributed deployments of Arroyo. The Kubernetes scheduler is activated by setting `controller.scheduler = "kubernetes"`. Workers are created and destroyed dynamically as pipelines start and stop. Each worker pod runs the Arroyo worker binary and connects back to the controller via gRPC.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Kubernetes | 1.24+ | k8s-openapi 0.24.0 compatibility |
| Helm | 3.x | For chart installation |
| Container Runtime | Docker/containerd | Standard K8s runtime |
| CPU per worker slot | 900m (default) | Configurable via Helm values |
| Memory per worker slot | 500Mi (default) | Configurable via Helm values |
Dependencies
Kubernetes Resources
- ServiceAccount (for pod management)
- Role/RoleBinding (pods create/delete/get/list/watch)
- Deployment (controller)
- ConfigMap (configuration)
- Service (API, gRPC endpoints)
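The chart generates these RBAC objects itself; purely as an illustration, a minimal Role covering the pod verbs listed above might look like the following (the `arroyo-pod-manager` name and `default` namespace are placeholders, not the chart's actual resource names):

```yaml
# Illustrative only — the Helm chart generates its own RBAC objects.
# Grants the pod verbs listed above, scoped to one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arroyo-pod-manager   # placeholder name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
```

A namespaced Role (rather than a ClusterRole) is sufficient here because the controller only manages worker pods in its own namespace.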
Container Images
- `ghcr.io/arroyosystems/arroyo:latest` (default for both controller and workers)
Credentials
- `ARROYO__CONTROLLER__SCHEDULER`: Set to `kubernetes` to enable K8s scheduler
- Kubernetes ServiceAccount with pod management RBAC permissions
- Cloud storage credentials (see Object_Storage environment) for checkpoint access from worker pods
Quick Install
```shell
# Install via Helm
helm repo add arroyo https://arroyosystems.github.io/helm-charts
helm install arroyo arroyo/arroyo

# Or from source
helm install arroyo ./k8s/arroyo \
  --set config.scheduler=kubernetes \
  --set config.checkpointUrl=s3://my-bucket/checkpoints
```
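The same settings can be kept in a values file instead of repeated `--set` flags (the key names below are taken directly from the flags above):

```yaml
# values.yaml — equivalent to the --set flags above
config:
  scheduler: kubernetes
  checkpointUrl: s3://my-bucket/checkpoints
```

Install with `helm install arroyo ./k8s/arroyo -f values.yaml`.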
Code Evidence
Default worker configuration from `default.toml:62-76`:
```toml
[kubernetes-scheduler]
namespace = "default"
resource-mode = "per-slot"

[kubernetes-scheduler.worker]
name-prefix = "arroyo"
image = "ghcr.io/arroyosystems/arroyo:latest"
image-pull-policy = "IfNotPresent"
service-account-name = "default"
resources = { requests = { cpu = "900m", memory = "500Mi" } }
task-slots = 16
command = "/app/arroyo worker"
```
Resource mode options from `config.rs:596-605`:
```rust
pub enum ResourceMode {
    /// In per-slot mode, tasks are packed onto workers up to the
    /// `task-slots` config, and for each slot the amount of resources
    /// specified in `resources` is provided
    PerSlot,
    /// In per-pod mode, every pod has exactly `task-slots` slots,
    /// and exactly the resources in `resources`, even if it is
    /// scheduled for fewer slots.
    PerPod,
}
```
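To make the difference concrete, here is a small hypothetical helper (not Arroyo source) that computes a pod's CPU request under each mode, using the defaults above (900m per slot, 16 task slots):

```rust
// Hypothetical illustration of ResourceMode semantics; not Arroyo code.
#[derive(Clone, Copy)]
pub enum ResourceMode {
    PerSlot,
    PerPod,
}

/// CPU (in millicores) requested for one worker pod, given the `cpu`
/// value from `resources.requests` and the slots scheduled onto the pod.
pub fn pod_cpu_millis(mode: ResourceMode, request_millis: u64, scheduled_slots: u64) -> u64 {
    match mode {
        // Per-slot: the request is multiplied by the slots on this pod.
        ResourceMode::PerSlot => request_millis * scheduled_slots,
        // Per-pod: the request is fixed, even for a partially used pod.
        ResourceMode::PerPod => request_millis,
    }
}

fn main() {
    // A pod scheduled for 10 of its 16 slots, with a 900m request:
    println!("per-slot: {}m", pod_cpu_millis(ResourceMode::PerSlot, 900, 10)); // 9000m
    println!("per-pod:  {}m", pod_cpu_millis(ResourceMode::PerPod, 900, 10)); // 900m
}
```

Per-slot mode therefore tracks pipeline parallelism closely, while per-pod mode trades some over-provisioning for predictable pod sizes.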
Scheduler types from `config.rs:581-588`:
```rust
pub enum Scheduler {
    Embedded,   // In-process (for local mode)
    Process,    // Separate OS processes
    Node,       // Arroyo node service
    Kubernetes, // K8s pod scheduler
}
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `pods is forbidden: User cannot create resource "pods"` | Missing RBAC permissions | Verify ServiceAccount and Role/RoleBinding |
| `ImagePullBackOff` | Cannot pull worker image | Check image registry access and image name |
| Worker pods stuck in `Pending` | Insufficient cluster resources | Scale cluster or reduce `resources.requests` |
| `connection refused` from worker to controller | Network policy blocking gRPC | Ensure port 5116 is reachable between pods |
Compatibility Notes
- Resource modes: `per-slot` (default) scales resources linearly with task count. `per-pod` gives fixed resources regardless of task count (legacy behavior from before 0.11).
- Worker ports: Workers use fixed ports in K8s mode: RPC=6900, Admin=6901 (vs random ports in process mode).
- Image consistency: Worker pods must use the same image version as the controller to avoid protocol mismatches.
- Node selectors and tolerations: Configurable via `kubernetes-scheduler.worker.node-selector` and `tolerations` for scheduling constraints.
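As a sketch of the last point, pinning workers to a dedicated node pool might look like the following. The `node-selector` and `tolerations` keys come from the chart config above, but the label/taint names and the exact inline-table shape shown here are assumptions for illustration:

```toml
# Illustrative values — label and taint names are placeholders.
[kubernetes-scheduler.worker]
node-selector = { "node-pool" = "arroyo-workers" }
tolerations = [
  { key = "dedicated", operator = "Equal", value = "arroyo", effect = "NoSchedule" },
]
```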