
Principle:ArroyoSystems Arroyo Pipeline State Monitoring

From Leeroopedia



Summary

Pipeline State Monitoring is the principle of polling pipeline state to track job lifecycle transitions in the Arroyo streaming engine. The CLI polls the API at a fixed interval for job state changes, enabling it to wait for a pipeline to reach a target state (such as Running, Stopped, or Failed) and react accordingly. This provides a simple, robust mechanism for coordinating between the CLI and the asynchronous pipeline execution engine.

Theoretical Basis

State polling is a simple but robust mechanism for tracking asynchronous operations. In distributed systems, there are two primary approaches for monitoring the state of long-running operations:

  • Push-based (event-driven): The server notifies the client of state changes via WebSockets, server-sent events, or callback mechanisms.
  • Pull-based (polling): The client periodically queries the server for the current state.

Arroyo's CLI uses the pull-based approach for pipeline state monitoring. While push-based approaches can deliver notifications with lower latency, polling provides several advantages in this context:

Simplicity

Polling requires no additional infrastructure beyond the existing REST API. There is no need for WebSocket connections, event queues, or callback registration. The client simply calls the same get_pipeline_jobs API endpoint at regular intervals.

Robustness

Polling is inherently resilient to transient network issues. Each poll is an independent request, so if a single poll fails, a later one can still retrieve the correct state. There is no connection state to maintain and no reconnect logic to implement. The retry logic built into the get_state function (with exponential backoff) further improves robustness.

Sufficient Responsiveness

By polling at a fixed interval of 100ms, the client observes a state change within roughly one polling interval plus the time of a single API request, which is near-real-time for this purpose. The request overhead is negligible for a local cluster setup, and a delay of about 100ms is imperceptible to a user waiting for a pipeline to start or stop.

Polling Pattern

The state monitoring follows this pattern:

  1. Initial State Query -- Get the current state of the pipeline's job.
  2. Comparison Loop -- Repeatedly query the state at 100ms intervals.
  3. State Change Detection -- When the state differs from the last observed state, log the transition.
  4. Target Match -- If the current state matches any of the expected target states, return success.
  5. Failure Detection -- If the current state is "Failed" and "Failed" is not one of the target states, return an error immediately.

This pattern supports waiting for multiple possible target states. For example, the shutdown handler waits for either "Stopped" or "Failed", accepting either as a terminal condition.
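The five steps above can be sketched as a small blocking loop. This is a minimal illustration using only the Rust standard library, not the CLI's actual code: the `wait_for_state` name and signature, the closure standing in for the API call, and the `String` error type are all assumptions made for the example. It treats "Failed" as an error only when the caller did not list it as a target, which matches the shutdown case described above.

```rust
use std::thread;
use std::time::Duration;

/// Polls `get_state` until it returns one of `targets`.
/// Returns the matched state, or an error if the job reaches "Failed"
/// while "Failed" is not an accepted target.
/// (Hypothetical helper: `get_state` stands in for the real API query.)
fn wait_for_state<F>(
    mut get_state: F,
    targets: &[&str],
    interval: Duration,
) -> Result<String, String>
where
    F: FnMut() -> String,
{
    // Step 1: initial state query.
    let mut last_state = get_state();

    // Step 4: the loop ends as soon as the observed state is a target.
    while !targets.contains(&last_state.as_str()) {
        // Step 5: "Failed" is an error unless the caller asked for it.
        if last_state == "Failed" {
            return Err("job transitioned to Failed".to_string());
        }

        // Step 2: wait one interval, then query again.
        thread::sleep(interval);
        let state = get_state();

        // Step 3: log only when the state actually changed.
        if state != last_state {
            eprintln!("job transitioned from {} to {}", last_state, state);
            last_state = state;
        }
    }

    Ok(last_state)
}
```

A caller waiting for startup would pass `&["Running"]` with a 100ms interval, while a shutdown handler would pass `&["Stopped", "Failed"]` and receive `Ok` for either state.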

State Transitions

A pipeline job in Arroyo progresses through a series of states:

| State | Description | Terminal? |
|---|---|---|
| Created | Pipeline has been created but not yet scheduled | No |
| Scheduling | Controller is assigning workers and preparing execution | No |
| Running | Pipeline is actively processing data | No (target for submission) |
| Stopping | Pipeline is performing a final checkpoint before stopping | No |
| Stopped | Pipeline has stopped cleanly with a checkpoint | Yes (for shutdown) |
| Failed | Pipeline encountered an unrecoverable error | Yes (always error) |
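The state table can be modeled as a small enum for illustration. This is a hypothetical sketch: the monitoring described on this page compares state names as strings, and the `JobState` type, `parse`, and `is_terminal` below are names invented for the example. It covers only the six states listed here, so any other state name the engine might report parses to `None`.

```rust
/// The job states listed in the table above (illustrative type,
/// not part of the Arroyo API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobState {
    Created,
    Scheduling,
    Running,
    Stopping,
    Stopped,
    Failed,
}

impl JobState {
    /// Converts a state name as shown in the table into a `JobState`.
    /// Unknown names yield `None` rather than a guess.
    fn parse(name: &str) -> Option<JobState> {
        match name {
            "Created" => Some(JobState::Created),
            "Scheduling" => Some(JobState::Scheduling),
            "Running" => Some(JobState::Running),
            "Stopping" => Some(JobState::Stopping),
            "Stopped" => Some(JobState::Stopped),
            "Failed" => Some(JobState::Failed),
            _ => None,
        }
    }

    /// True for the states the table marks as terminal.
    fn is_terminal(self) -> bool {
        matches!(self, JobState::Stopped | JobState::Failed)
    }
}
```

Typing the states this way makes it a compile-time error to misspell a target, at the cost of having to extend the enum whenever a new state appears.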

Retry Semantics

The state retrieval function includes built-in retry logic for API failures:

  • Up to 10 retries per state query
  • Initial backoff of 100ms with maximum backoff of 2 seconds
  • Warnings logged on each retry attempt

This means a transient API failure (e.g., the API server being briefly overloaded during startup) does not cause the monitoring to fail. Only a failure that persists through all retries results in a panic.
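The retry parameters above can be illustrated with a generic helper. This is a sketch under stated assumptions rather than the function Arroyo uses: `retry_with_backoff` and `backoff_schedule` are hypothetical names, the delay is assumed to double after each failure (the page gives only the initial and maximum backoff, not the growth factor), and "10 retries" is read as one initial attempt plus up to ten further attempts. The sketch returns the final error to the caller, who could then choose to panic.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// Delay used after the given one: doubled (an assumption), capped at `max`.
fn next_delay(current: Duration, max: Duration) -> Duration {
    (current * 2).min(max)
}

/// The sequence of waits that would precede each of `retries` retries.
fn backoff_schedule(retries: u32, initial: Duration, max: Duration) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut delay = initial;
    for _ in 0..retries {
        delays.push(delay);
        delay = next_delay(delay, max);
    }
    delays
}

/// Runs `op`, retrying up to `max_retries` times after failures and
/// logging a warning before each retry. Returns the last error if every
/// attempt fails.
fn retry_with_backoff<T, E, F>(
    mut op: F,
    max_retries: u32,
    initial: Duration,
    max: Duration,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    E: Display,
{
    let mut delay = initial;
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if retries < max_retries => {
                eprintln!("warning: retry {}/{} after error: {}", retries + 1, max_retries, e);
                thread::sleep(delay);
                delay = next_delay(delay, max);
                retries += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

Under the doubling assumption, `backoff_schedule(10, 100ms, 2s)` yields 100, 200, 400, 800, and 1600ms followed by five waits of 2000ms, so a persistently failing query would spend about 13.1 seconds waiting before giving up.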

Usage Contexts

State monitoring is used in two contexts within the local cluster:

| Context | Target States | Purpose |
|---|---|---|
| Pipeline submission | ["Running"] | Wait for the pipeline to start before printing the dashboard URL |
| Graceful shutdown | ["Stopped", "Failed"] | Wait for the pipeline to complete its final checkpoint before exiting |
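The practical difference between the two contexts is how an observed "Failed" state is interpreted. The sketch below makes that explicit; the `Outcome` enum, the `classify` function, and the two constants are hypothetical names introduced for illustration, with the target lists copied from the table above.

```rust
/// What a single observed state means for a waiting caller
/// (illustrative type, not part of the Arroyo CLI).
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    Error,
    KeepWaiting,
}

/// Target states for the two contexts in the table above.
const SUBMISSION_TARGETS: [&str; 1] = ["Running"];
const SHUTDOWN_TARGETS: [&str; 2] = ["Stopped", "Failed"];

/// Classifies one observed state against a list of target states.
/// A target match is checked first, so "Failed" only counts as an
/// error when the caller did not list it as a target.
fn classify(targets: &[&str], state: &str) -> Outcome {
    if targets.contains(&state) {
        Outcome::Done
    } else if state == "Failed" {
        Outcome::Error
    } else {
        Outcome::KeepWaiting
    }
}
```

With these definitions, `classify(&SUBMISSION_TARGETS, "Failed")` is `Outcome::Error`, whereas `classify(&SHUTDOWN_TARGETS, "Failed")` is `Outcome::Done`, since the shutdown path only needs the pipeline to have reached a terminal state.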
