Principle:Apache Flink Checkpoint Position Tracking

Knowledge Sources	Apache_Flink
Domains	Checkpointing, File_Source
Last Updated	2026-02-09 00:00 GMT

Overview

Description

Checkpoint Position Tracking is a pattern used by Apache Flink's file source connector framework to precisely record reader progress within file splits, enabling exactly-once processing guarantees upon failure recovery. The pattern associates each emitted record with a two-part position descriptor consisting of a byte offset and a record skip count, which together encode the exact location the reader must resume from after a checkpoint is restored.

This design ensures that when a Flink job recovers from a checkpoint, the file source reader can seek to the saved byte offset within the split and then skip the specified number of records to arrive at exactly the point where processing left off. The position always points to after the most recently processed record, so that recovery resumes from the next unprocessed record.

Theoretical Basis

The checkpoint position tracking pattern is grounded in the theory of deterministic replay with minimal state. In distributed stream processing, exactly-once semantics require that every source can be rewound to a precise, reproducible point. For file-based sources, this is complicated by the fact that not all file formats support byte-level random access; some formats (e.g., compressed files, columnar formats with pushed-down filters) only support coarse-grained positioning.

The two-part position model (offset + record skip count) addresses this by supporting a spectrum of positioning granularity:

Positioning Strategy	Offset	Record Skip Count	Use Case
No offset (sequential only)	`NO_OFFSET (-1)`	Total records processed	Formats with no addressable positions; reader replays from the start and skips N records.
Coarse-grained offset	Block/sync marker position	Records after marker	Formats with periodic sync markers (e.g., Avro data files); reader seeks to the nearest marker and skips records within that block.
Fine-grained offset	Exact byte position	0 or 1	Formats where every record boundary is addressable (e.g., line-delimited text); reader seeks directly to the record.

A critical design decision is that the position always points after the current record rather than at the current record. This avoids the need to unconditionally skip one record on recovery, which would be problematic for formats with pushed-down filters: if the filter predicate changes between a savepoint and restore, skipping "one record" could skip the wrong record entirely. By pointing to the resume position directly, the system is robust to filter predicate evolution across savepoints.

The pattern also leverages immutable/mutable duality through the RecordAndPosition and MutableRecordAndPosition classes. The immutable variant provides safety for checkpoint state that must not be inadvertently modified, while the mutable variant enables efficient reuse within tight iteration loops of BulkFormat.RecordIterator implementations, avoiding per-record object allocation.

// Immutable: safe for checkpoint storage
RecordAndPosition<RowData> checkpointState =
    new RecordAndPosition<>(record, byteOffset, skipCount);

// Mutable: efficient for iteration loops
MutableRecordAndPosition<RowData> reusable = new MutableRecordAndPosition<>();
reusable.set(record, offset, skipCount);
// For sequential records at the same offset:
reusable.setNext(nextRecord); // increments skipCount by 1

The underlying CheckpointedPosition class encapsulates the serializable (offset, recordsAfterOffset) pair that is persisted into Flink's distributed snapshots, providing the durable state from which recovery reconstructs the reader position.

Related Pages

Implementation:Apache_Flink_RecordAndPosition

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment