Principle:Apache Flink Source Checkpoint Recovery

Knowledge Sources	Apache Flink Checkpointing Apache Flink
Domains	Fault_Tolerance, Stream_Processing
Last Updated	2026-02-09 00:00 GMT

Overview

A checkpoint-based state persistence mechanism that captures the current split assignment and reading progress to enable exactly-once recovery after failures.

Description

Source Checkpoint Recovery ensures that no records are lost or duplicated after a failure by persisting two types of state:

Enumerator state: Unassigned splits and (in continuous mode) already-discovered paths, captured in PendingSplitsCheckpoint
Reader state: Per-split reading progress (byte offset, records to skip) captured in FileSourceSplitState

On recovery, the enumerator restores from PendingSplitsCheckpoint, re-creating its split pool without re-enumerating already-processed paths. Readers restore their position within each split using the checkpointed offset, ensuring records are not re-emitted.

Usage

This principle operates transparently when Flink checkpointing is enabled. It requires no user configuration but is essential for understanding recovery behavior and at-least-once vs exactly-once semantics.

Theoretical Basis

// Abstract checkpoint/recovery algorithm
function checkpoint():
    enumeratorState = PendingSplitsCheckpoint(unassignedSplits, discoveredPaths)
    readerState = for each split: (offset, recordsToSkipAfterOffset)
    persist(enumeratorState, readerState)

function recover(enumeratorState, readerStates):
    enumerator.restore(enumeratorState.splits, enumeratorState.paths)
    for each reader:
        reader.seekToCheckpointedOffset(readerState.offset)
        reader.skipRecords(readerState.recordsToSkip)

Related Pages

Implemented By

Implementation:Apache_Flink_PendingSplitsCheckpoint_FromCollectionSnapshot

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment