Principle:Apache Flink Source Checkpoint Recovery
| Knowledge Sources | |
|---|---|
| Domains | Fault_Tolerance, Stream_Processing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A checkpoint-based state persistence mechanism that captures the current split assignment and reading progress to enable exactly-once recovery after failures.
Description
Source Checkpoint Recovery ensures that no records are lost or duplicated after a failure by persisting two types of state:
- Enumerator state: Unassigned splits and (in continuous mode) already-discovered paths, captured in PendingSplitsCheckpoint
- Reader state: Per-split reading progress (a byte offset plus the number of records to skip after that offset), captured in FileSourceSplitState
On recovery, the enumerator restores from PendingSplitsCheckpoint, re-creating its pool of unassigned splits without re-enumerating already-processed paths. Each reader restores its position within a split by seeking to the checkpointed offset and then skipping the records it had already emitted after that offset, so those records are not emitted a second time.
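The reader-side half of this can be illustrated with a small, self-contained sketch. It does not use the Flink API: `Position`, `BLOCK_SIZE`, and `read` are hypothetical names, and the "split" is just an in-memory list in which the reader can only seek to block boundaries. That restriction is the assumption that makes the second field (records to skip) necessary.

```java
import java.util.ArrayList;
import java.util.List;

class ReaderRecoverySketch {

    /** Hypothetical stand-in for one split's checkpointed progress. */
    record Position(int offset, int recordsToSkip) {}

    /** Assumption: the reader can only seek to multiples of BLOCK_SIZE. */
    static final int BLOCK_SIZE = 3;

    /**
     * Resumes at {@code start}, emits at most {@code limit} records into
     * {@code out}, and returns the position to store in the next checkpoint.
     */
    static Position read(List<String> data, Position start, int limit, List<String> out) {
        int cursor = start.offset();                       // seek to the block boundary
        for (int i = 0; i < start.recordsToSkip() && cursor < data.size(); i++) {
            cursor++;                                      // read and discard: already emitted
        }
        int emitted = 0;
        while (cursor < data.size() && emitted < limit) {
            out.add(data.get(cursor++));
            emitted++;
        }
        // Express the new position as (last seekable boundary, records after it).
        int blockStart = (cursor / BLOCK_SIZE) * BLOCK_SIZE;
        return new Position(blockStart, cursor - blockStart);
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        List<String> out = new ArrayList<>();

        // Emit four records, then take a "checkpoint" of the position.
        Position checkpointed = read(data, new Position(0, 0), 4, out);

        // Simulated failure: a fresh read resumes from the checkpointed position.
        read(data, checkpointed, Integer.MAX_VALUE, out);

        System.out.println(out); // [a, b, c, d, e, f, g, h]
    }
}
```

In this toy run the checkpointed position is offset 3 with one record to skip, so the restored read discards `d` (already emitted) and continues with `e`.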
Usage
This principle operates transparently once Flink checkpointing is enabled and requires no further user configuration. It is nevertheless essential for understanding recovery behavior and the difference between at-least-once and exactly-once semantics.
Theoretical Basis
// Abstract checkpoint/recovery algorithm
function checkpoint():
    enumeratorState = PendingSplitsCheckpoint(unassignedSplits, discoveredPaths)
    readerStates = for each split: (offset, recordsToSkipAfterOffset)
    persist(enumeratorState, readerStates)

function recover(enumeratorState, readerStates):
    enumerator.restore(enumeratorState.splits, enumeratorState.paths)
    for each readerState in readerStates:
        reader.seekToCheckpointedOffset(readerState.offset)
        reader.skipRecords(readerState.recordsToSkipAfterOffset)
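The enumerator half of the algorithm above can be sketched in plain Java as well. This is a simplified model rather than Flink's implementation: `Snapshot` and `Enumerator` are hypothetical classes, each path is treated as exactly one split, and splits are represented by their path strings. The point it illustrates is why the snapshot carries the set of discovered paths in addition to the unassigned splits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EnumeratorRecoverySketch {

    /** Hypothetical, simplified counterpart of the enumerator checkpoint. */
    record Snapshot(List<String> unassignedSplits, Set<String> discoveredPaths) {}

    /** Toy enumerator: one split per path, paths found by repeated listings. */
    static final class Enumerator {
        private final Deque<String> unassigned = new ArrayDeque<>();
        private final Set<String> discovered = new HashSet<>();

        /** Queues a split only for paths that have not been seen before. */
        void discover(Collection<String> listing) {
            for (String path : listing) {
                if (discovered.add(path)) {
                    unassigned.add(path);
                }
            }
        }

        /** Hands the next pending split to a reader, or null if none remain. */
        String assignNext() {
            return unassigned.poll();
        }

        /** Copies the state so later mutations do not leak into the snapshot. */
        Snapshot checkpoint() {
            return new Snapshot(new ArrayList<>(unassigned), new HashSet<>(discovered));
        }

        /** Rebuilds an enumerator from a snapshot without listing any paths. */
        static Enumerator restore(Snapshot snapshot) {
            Enumerator e = new Enumerator();
            e.unassigned.addAll(snapshot.unassignedSplits());
            e.discovered.addAll(snapshot.discoveredPaths());
            return e;
        }
    }

    public static void main(String[] args) {
        Enumerator enumerator = new Enumerator();
        enumerator.discover(List.of("a.csv", "b.csv", "c.csv"));
        enumerator.assignNext();                    // a.csv is now owned by a reader
        Snapshot snapshot = enumerator.checkpoint();

        // Simulated failure and recovery, followed by a fresh directory listing.
        Enumerator restored = Enumerator.restore(snapshot);
        restored.discover(List.of("a.csv", "b.csv", "c.csv", "d.csv"));

        System.out.println(restored.checkpoint().unassignedSplits()); // [b.csv, c.csv, d.csv]
    }
}
```

Because `a.csv` is still in the restored set of discovered paths, the second listing does not queue it again; its progress lives in the reader state, not in the enumerator. Only the genuinely new `d.csv` is added.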