Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Apache Flink Source Checkpoint Recovery

From Leeroopedia


Knowledge Sources
Domains Fault_Tolerance, Stream_Processing
Last Updated 2026-02-09 00:00 GMT

Overview

A checkpoint-based state persistence mechanism that captures the current split assignment and reading progress to enable exactly-once recovery after failures.

Description

Source Checkpoint Recovery ensures that no records are lost or duplicated after a failure by persisting two types of state:

  • Enumerator state: Unassigned splits and (in continuous mode) already-discovered paths, captured in PendingSplitsCheckpoint
  • Reader state: Per-split reading progress (byte offset, records to skip) captured in FileSourceSplitState

On recovery, the enumerator restores from PendingSplitsCheckpoint, re-creating its split pool without re-enumerating already-processed paths. Readers restore their position within each split using the checkpointed offset, ensuring records are not re-emitted.

Usage

This principle operates transparently when Flink checkpointing is enabled. It requires no user configuration but is essential for understanding recovery behavior and at-least-once vs exactly-once semantics.

Theoretical Basis

// Abstract checkpoint/recovery algorithm
function checkpoint():
    enumeratorState = PendingSplitsCheckpoint(unassignedSplits, discoveredPaths)
    readerState = for each split: (offset, recordsToSkipAfterOffset)
    persist(enumeratorState, readerState)

function recover(enumeratorState, readerStates):
    enumerator.restore(enumeratorState.splits, enumeratorState.paths)
    for each reader:
        reader.seekToCheckpointedOffset(readerState.offset)
        reader.skipRecords(readerState.recordsToSkip)

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment