
Environment:ArroyoSystems Arroyo Kubernetes Deployment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Kubernetes
Last Updated: 2026-02-08 08:00 GMT

Overview

A Kubernetes cluster environment with a Helm chart for deploying Arroyo in distributed mode with dynamically scheduled worker pods.

Description

This environment provides the Kubernetes infrastructure for running Arroyo in distributed mode. The Arroyo controller uses the Kubernetes API to dynamically schedule worker pods based on pipeline parallelism requirements. The Helm chart deploys the controller (which includes the API, controller, and compiler services) as a Deployment, with workers spawned as individual pods on demand. Resource allocation supports two modes: per-slot (resources scale with task count) and per-pod (fixed resources per pod). The chart includes RBAC roles for pod management and configurable service accounts.

Usage

Use this environment for production distributed deployments of Arroyo. The Kubernetes scheduler is activated by setting `controller.scheduler = "kubernetes"`. Workers are created and destroyed dynamically as pipelines start and stop. Each worker pod runs the Arroyo worker binary and connects back to the controller via gRPC.
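As a sketch, the scheduler setting described above can be expressed in the TOML configuration (the key paths below match the `default.toml` evidence later on this page; the equivalent environment variable is `ARROYO__CONTROLLER__SCHEDULER=kubernetes`):

```toml
# Illustrative sketch: enable the Kubernetes scheduler on the controller.
[controller]
scheduler = "kubernetes"

[kubernetes-scheduler]
namespace = "default"        # namespace where worker pods are created
resource-mode = "per-slot"   # or "per-pod"
```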

System Requirements

Category               | Requirement        | Notes
Kubernetes             | 1.24+              | k8s-openapi 0.24.0 compatibility
Helm                   | 3.x                | For chart installation
Container Runtime      | Docker/containerd  | Standard K8s runtime
CPU per worker slot    | 900m (default)     | Configurable via Helm values
Memory per worker slot | 500Mi (default)    | Configurable via Helm values

Dependencies

Kubernetes Resources

  • ServiceAccount (for pod management)
  • Role/RoleBinding (pods create/delete/get/list/watch)
  • Deployment (controller)
  • ConfigMap (configuration)
  • Service (API, gRPC endpoints)
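A minimal sketch of the Role and RoleBinding listed above, with the pod verbs named in this list (resource names and namespace are illustrative; the Helm chart generates its own):

```yaml
# Illustrative RBAC sketch: allows the controller to manage worker pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arroyo-pod-manager     # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arroyo-pod-manager     # hypothetical name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: default              # matches service-account-name in the worker config
    namespace: default
roleRef:
  kind: Role
  name: arroyo-pod-manager
  apiGroup: rbac.authorization.k8s.io
```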

Container Images

  • `ghcr.io/arroyosystems/arroyo:latest` (default for both controller and workers)

Credentials

  • `ARROYO__CONTROLLER__SCHEDULER`: Set to `kubernetes` to enable K8s scheduler
  • Kubernetes ServiceAccount with pod management RBAC permissions
  • Cloud storage credentials (see Object_Storage environment) for checkpoint access from worker pods

Quick Install

# Install via Helm
helm repo add arroyo https://arroyosystems.github.io/helm-charts
helm install arroyo arroyo/arroyo

# Or from source
helm install arroyo ./k8s/arroyo \
  --set config.scheduler=kubernetes \
  --set config.checkpointUrl=s3://my-bucket/checkpoints

Code Evidence

Default worker configuration from `default.toml:62-76`:

[kubernetes-scheduler]
namespace = "default"
resource-mode = "per-slot"

[kubernetes-scheduler.worker]
name-prefix = "arroyo"
image = "ghcr.io/arroyosystems/arroyo:latest"
image-pull-policy = "IfNotPresent"
service-account-name = "default"
resources = { requests = { cpu = "900m",  memory = "500Mi" } }
task-slots = 16
command = "/app/arroyo worker"

Resource mode options from `config.rs:596-605`:

pub enum ResourceMode {
    /// In per-slot mode, tasks are packed onto workers up to the
    /// `task-slots` config, and for each slot the amount of resources
    /// specified in `resources` is provided
    PerSlot,
    /// In per-pod mode, every pod has exactly `task-slots` slots,
    /// and exactly the resources in `resources`, even if it is
    /// scheduled for fewer slots.
    PerPod,
}
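The difference between the two modes can be illustrated with the default requests from the configuration above (900m CPU per slot). This is a standalone sketch, not Arroyo code:

```rust
// Sketch (not Arroyo code): how worker pod CPU requests differ between
// the two resource modes for a pod scheduled with `slots` tasks.

/// Per-slot mode: each scheduled slot adds the configured request.
fn per_slot_request_millicpu(slots: u32, per_slot_millicpu: u32) -> u32 {
    slots * per_slot_millicpu
}

/// Per-pod mode: the pod always requests the fixed amount,
/// even if it is scheduled for fewer slots.
fn per_pod_request_millicpu(_slots: u32, pod_millicpu: u32) -> u32 {
    pod_millicpu
}

fn main() {
    // A worker pod running 4 tasks with the 900m default:
    println!("per-slot: {}m", per_slot_request_millicpu(4, 900)); // 3600m
    println!("per-pod:  {}m", per_pod_request_millicpu(4, 900)); // 900m
}
```

In per-slot mode a lightly loaded pod requests proportionally less from the cluster, while per-pod mode gives predictable, fixed-size pods.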

Scheduler types from `config.rs:581-588`:

pub enum Scheduler {
    Embedded,   // In-process (for local mode)
    Process,    // Separate OS processes
    Node,       // Arroyo node service
    Kubernetes, // K8s pod scheduler
}

Common Errors

Error Message | Cause | Solution
`pods is forbidden: User cannot create resource "pods"` | Missing RBAC permissions | Verify the ServiceAccount and Role/RoleBinding
`ImagePullBackOff` | Cannot pull the worker image | Check image registry access and the image name
Worker pods stuck in `Pending` | Insufficient cluster resources | Scale the cluster or reduce `resources.requests`
`connection refused` from worker to controller | Network policy blocking gRPC | Ensure port 5116 is accessible between pods

Compatibility Notes

  • Resource modes: `per-slot` (default) scales resources linearly with task count. `per-pod` gives fixed resources regardless of task count (legacy behavior from before 0.11).
  • Worker ports: Workers use fixed ports in K8s mode: RPC=6900, Admin=6901 (vs random ports in process mode).
  • Image consistency: Worker pods must use the same image version as the controller to avoid protocol mismatches.
  • Node selectors and tolerations: Configurable via `kubernetes-scheduler.worker.node-selector` and `tolerations` for scheduling constraints.
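For instance, the scheduling constraints from the last bullet might be configured as follows. The key names come from the bullet above, but the exact TOML shape for tolerations is an assumption modeled on Kubernetes' toleration fields, and all label and taint values are hypothetical:

```toml
# Illustrative sketch: pin worker pods to labeled nodes and tolerate a taint.
[kubernetes-scheduler.worker]
node-selector = { "example.com/pool" = "streaming" }  # hypothetical label

[[kubernetes-scheduler.worker.tolerations]]
key = "example.com/dedicated"    # hypothetical taint key
operator = "Equal"
value = "arroyo"
effect = "NoSchedule"
```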
