Principle:Apache Spark Application Lifecycle Monitoring
Metadata
| Field | Value |
|---|---|
| Domains | Monitoring, API_Design |
Overview
An event-driven monitoring pattern that tracks distributed application state transitions through a finite state machine with listener callbacks.
Description
Once a distributed application is submitted, the submitter needs to track its lifecycle, from submission through execution to completion or failure. The lifecycle monitoring pattern models application state as a finite state machine with well-defined terminal and non-terminal states. An observer/listener pattern lets the submitter react to state transitions as they happen instead of polling for them. Because the submitter only consumes state-change events, the submission system stays decoupled from the execution system.
Key aspects of the pattern:
- State machine model — application lifecycle is represented as a directed graph of discrete states with well-defined transitions
- Terminal state detection — states are classified as terminal (final) or non-terminal, enabling automated completion detection
- Event-driven notification — listeners receive callbacks on state changes, eliminating the need for periodic polling
- Decoupled architecture — the monitoring interface is independent of the submission mechanism, allowing different monitoring strategies
State Classification
| State | Terminal | Description |
|---|---|---|
| UNKNOWN | No | Initial state before connection is established |
| CONNECTED | No | Communication channel with the application is active |
| SUBMITTED | No | Application has been submitted to the cluster manager |
| RUNNING | No | Application is actively executing |
| FINISHED | Yes | Application completed successfully |
| FAILED | Yes | Application terminated due to an error |
| KILLED | Yes | Application was explicitly terminated by a user or system |
| LOST | Yes | Communication with the application was lost |
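The classification above maps naturally onto an enum that carries a terminal flag. The following is a minimal sketch of that idea, not the Spark source: `AppState` and `terminalStates()` are hypothetical names chosen for illustration, while the state names and the `isFinal()` accessor follow the table and the text.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the state enum described in the table above.
enum AppState {
    UNKNOWN(false),
    CONNECTED(false),
    SUBMITTED(false),
    RUNNING(false),
    FINISHED(true),
    FAILED(true),
    KILLED(true),
    LOST(true);

    private final boolean terminal;

    AppState(boolean terminal) {
        this.terminal = terminal;
    }

    /** True once the application can no longer change state. */
    boolean isFinal() {
        return terminal;
    }

    /** Collects every state flagged as terminal (illustrative helper). */
    static Set<AppState> terminalStates() {
        Set<AppState> result = EnumSet.noneOf(AppState.class);
        for (AppState s : values()) {
            if (s.isFinal()) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Keeping the flag on the enum constant means callers never need to hard-code which states end the lifecycle; they only ask `isFinal()`.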
Usage
Use this pattern when you need to monitor Spark application status programmatically, for example in a job scheduler that triggers downstream actions once an application finishes or fails.
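The scheduler scenario can be sketched as follows. This is a self-contained illustration under stated assumptions, not code that talks to a cluster: `TrackedApp`, `report`, `completion`, and `DownstreamTrigger` are hypothetical names, and the action strings are placeholders. In a real deployment the state reports would come from the application handle's listener callbacks.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the lifecycle states described in this document.
enum AppState {
    UNKNOWN, CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED, LOST;

    boolean isFinal() {
        return this == FINISHED || this == FAILED || this == KILLED || this == LOST;
    }
}

// Hypothetical handle that completes a future when a terminal state arrives.
class TrackedApp {
    private AppState state = AppState.UNKNOWN;
    private final CompletableFuture<AppState> completion = new CompletableFuture<>();

    /** Called by the monitoring side whenever a new state is reported. */
    synchronized void report(AppState next) {
        if (state.isFinal()) {
            return; // a terminal state is never left
        }
        state = next;
        if (next.isFinal()) {
            completion.complete(next);
        }
    }

    CompletableFuture<AppState> completion() {
        return completion;
    }
}

class DownstreamTrigger {
    /** Placeholder policy: what the scheduler does once the application ends. */
    static String nextAction(AppState finalState) {
        return finalState == AppState.FINISHED ? "run-downstream-job" : "alert-on-call";
    }

    public static void main(String[] args) {
        TrackedApp app = new TrackedApp();
        CompletableFuture<String> action =
                app.completion().thenApply(DownstreamTrigger::nextAction);

        // Simulated state reports; no polling loop is needed on the scheduler side.
        app.report(AppState.SUBMITTED);
        app.report(AppState.RUNNING);
        app.report(AppState.FINISHED);

        System.out.println(action.join());
    }
}
```

Exposing completion as a future lets the scheduler chain follow-up work without blocking a thread while the application runs.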
Theoretical Basis
Finite State Machine with states:
UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING -> {FINISHED, FAILED, KILLED, LOST}
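Terminal states return true from isFinal(); once one is reached, no further transitions occur. The chain above is the normal progression, but a terminal state such as FAILED or LOST can also be reached before the application gets to RUNNING, for example when submission itself fails. The Observer pattern, exposed through the Listener interface, enables reactive state tracking.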
The FSM provides:
- Deterministic transitions — each state has a defined set of valid successor states
- Terminal detection — the isFinal() method enables simple loop termination in polling-based monitoring
- State ordering — states follow a natural progression from submission to completion, enabling progress tracking
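The FSM properties listed above can be illustrated with an explicit transition table. This is a sketch under assumptions: `LifecycleFsm`, `canTransition`, and `replay` are hypothetical names, and the table allows each non-terminal state to move either one step forward or directly to any terminal state. That second rule is an assumption made for this example; the document itself only draws the linear path.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the lifecycle states described in this document.
enum AppState {
    UNKNOWN, CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED, LOST;

    boolean isFinal() {
        return this == FINISHED || this == FAILED || this == KILLED || this == LOST;
    }
}

class LifecycleFsm {
    // Assumed transition table: valid successors for every state.
    private static final Map<AppState, Set<AppState>> SUCCESSORS =
            new EnumMap<>(AppState.class);

    static {
        AppState[] path = {
            AppState.UNKNOWN, AppState.CONNECTED, AppState.SUBMITTED, AppState.RUNNING
        };
        Set<AppState> terminal = EnumSet.of(
            AppState.FINISHED, AppState.FAILED, AppState.KILLED, AppState.LOST);

        for (int i = 0; i < path.length; i++) {
            // Assumption: any non-terminal state may jump straight to a terminal one.
            Set<AppState> next = EnumSet.copyOf(terminal);
            if (i + 1 < path.length) {
                next.add(path[i + 1]); // one step forward along the normal path
            }
            SUCCESSORS.put(path[i], next);
        }
        for (AppState t : terminal) {
            SUCCESSORS.put(t, EnumSet.noneOf(AppState.class)); // no way out
        }
    }

    static boolean canTransition(AppState from, AppState to) {
        return SUCCESSORS.get(from).contains(to);
    }

    /** Applies reported states in order, skips invalid moves, stops once terminal. */
    static AppState replay(Iterable<AppState> reported) {
        AppState current = AppState.UNKNOWN;
        for (AppState next : reported) {
            if (canTransition(current, next)) {
                current = next;
            }
            if (current.isFinal()) {
                break; // isFinal() is the loop-termination test
            }
        }
        return current;
    }

    public static void main(String[] args) {
        AppState end = replay(Arrays.asList(
            AppState.CONNECTED, AppState.SUBMITTED, AppState.RUNNING, AppState.FINISHED));
        System.out.println(end + " final=" + end.isFinal());
    }
}
```

The same `isFinal()` check that ends the loop in `replay` is what a polling-based monitor would use as its exit condition.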
The Observer pattern provides:
- Push-based notification — eliminates polling latency and resource waste
- Multiple observers — several listeners can independently monitor the same application
- Separation of concerns — monitoring logic is decoupled from application logic
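The observer properties listed above can be shown with a small push-based notifier. This is a minimal sketch, not the Spark implementation: `StateListener`, `ObservableApp`, and `ObserverDemo` are hypothetical names, and the callback is simplified to receive only the new state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the lifecycle states described in this document.
enum AppState {
    UNKNOWN, CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED, LOST;

    boolean isFinal() {
        return this == FINISHED || this == FAILED || this == KILLED || this == LOST;
    }
}

// Hypothetical callback interface; one method keeps it lambda-friendly.
interface StateListener {
    void stateChanged(AppState newState);
}

class ObservableApp {
    private final List<StateListener> listeners = new CopyOnWriteArrayList<>();
    private AppState state = AppState.UNKNOWN;

    void addListener(StateListener listener) {
        listeners.add(listener);
    }

    /** Pushes each real state change to every registered listener. */
    synchronized void setState(AppState next) {
        if (state.isFinal() || state == next) {
            return; // ignore repeats and anything after a terminal state
        }
        state = next;
        for (StateListener listener : listeners) {
            listener.stateChanged(next);
        }
    }
}

class ObserverDemo {
    public static void main(String[] args) {
        ObservableApp app = new ObservableApp();
        List<AppState> auditLog = new ArrayList<>();
        List<String> alerts = new ArrayList<>();

        // Two independent observers of the same application.
        app.addListener(s -> auditLog.add(s));
        app.addListener(s -> {
            if (s == AppState.FAILED || s == AppState.LOST) {
                alerts.add("application ended abnormally: " + s);
            }
        });

        app.setState(AppState.SUBMITTED);
        app.setState(AppState.RUNNING);
        app.setState(AppState.FAILED);

        System.out.println(auditLog);
        System.out.println(alerts);
    }
}
```

Each listener holds its own logic (auditing, alerting), so adding a new monitoring concern does not require touching the code that drives the state changes.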