Principle:ArroyoSystems Arroyo Checkpoint Coordination
Checkpoint Coordination
Principle: Coordinating distributed checkpoint completion across all operators and subtasks. The controller tracks which subtasks have completed their checkpoint and determines when the global checkpoint is done.
Theoretical Basis
In a distributed checkpoint protocol, individual operators complete their local checkpoints at different times. A coordinator is needed to determine when the global checkpoint -- spanning all operators and their parallel subtasks -- is fully complete. Only after global completion is the checkpoint considered durable and usable for recovery.
Per-Operator Tracking
Each operator in the dataflow graph may have multiple subtasks (parallel instances). The coordinator maintains a counter for each operator, tracking how many subtasks have reported completion. An operator's checkpoint is complete when all of its subtasks have reported.
| Concept | Description |
|---|---|
| Operator | A logical node in the dataflow graph (e.g., a map, filter, or join) |
| Subtask | A parallel instance of an operator, processing a partition of the data |
| Subtask count | The expected number of subtasks per operator (determined by parallelism) |
| Completion counter | Tracks how many subtasks have reported for each operator |
Global Completion Detection
The global checkpoint is complete when every operator has all of its subtasks reported. This is a two-level aggregation:
- Subtask level: Each subtask independently completes its local checkpoint and reports to the coordinator.
- Operator level: When all subtasks of an operator have reported, the operator's checkpoint is complete.
- Global level: When all operators are complete, the global checkpoint is done.
Metadata Aggregation
As subtasks report completion, they provide per-subtask metadata (file references, byte counts, watermarks). The coordinator aggregates this metadata into a global checkpoint record that captures the complete state of the system at that epoch.
Epoch Management
The coordinator also manages epoch lifecycle:
- Current epoch: The epoch currently being checkpointed.
- Minimum epoch (
min_epoch): The oldest epoch whose state must be retained. Epochs older thanmin_epochcan be garbage-collected from object storage. - Epoch advancement: After global checkpoint completion, the current epoch is advanced and
min_epochmay be updated.
Domains
- Stream_Processing: Checkpoint coordination is essential for consistent distributed snapshots in streaming systems.
- Fault_Tolerance: Only globally-complete checkpoints can be used for recovery.
- Distributed_Systems: Coordinating completion across distributed participants is a fundamental distributed systems problem.