Principle:Apache Flink Two Phase Commit Preparation
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Fault_Tolerance |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A pre-commit phase that collects all pending file operations across active buckets into a set of committable descriptors, enabling exactly-once file delivery.
Description
Two-Phase Commit Preparation implements the "prepare" phase of the two-phase commit protocol for file-based sinks. Before a checkpoint completes, the writer must snapshot all in-progress and pending files into committable descriptors. These descriptors capture enough information to either finalize the files (on commit) or discard them (on abort).
The preparation phase iterates over all active buckets, evaluating each for:
- Pending files: Files that were rolled (closed) and are ready for commit
- In-progress files to clean up: Files that need to be truncated or removed on recovery
- End-of-input handling: On the final flush, all in-progress files are forcibly closed so that they can be committed
This separation of prepare and commit ensures that file finalization is atomic with respect to Flink checkpoints, providing exactly-once semantics.
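The per-bucket evaluation described above can be sketched as follows. This is a simplified illustration, not Flink's actual classes: `Committable`, `Bucket`, and the methods `open`, `roll`, and `markStale` are hypothetical names, and files are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor: exactly one of the two fields is non-null.
final class Committable {
    final String pendingFile;          // rolled file to finalize on commit
    final String inProgressToCleanup;  // stale in-progress file to remove

    private Committable(String pendingFile, String inProgressToCleanup) {
        this.pendingFile = pendingFile;
        this.inProgressToCleanup = inProgressToCleanup;
    }

    static Committable pending(String file) { return new Committable(file, null); }
    static Committable cleanup(String file) { return new Committable(null, file); }
}

// Hypothetical bucket that tracks one open file plus rolled and stale files.
final class Bucket {
    private final List<String> pendingFiles = new ArrayList<>();
    private final List<String> staleInProgress = new ArrayList<>();
    private String inProgressFile; // null when no file is open

    void open(String file) { inProgressFile = file; }

    // Close the open file, if any, and queue it for commit.
    void roll() {
        if (inProgressFile != null) {
            pendingFiles.add(inProgressFile);
            inProgressFile = null;
        }
    }

    void markStale(String file) { staleInProgress.add(file); }

    // Prepare phase: hand over everything that is ready and reset local state.
    List<Committable> prepareCommit(boolean endOfInput) {
        if (endOfInput) {
            roll(); // final flush: force the open file to close
        }
        List<Committable> out = new ArrayList<>();
        for (String f : pendingFiles) out.add(Committable.pending(f));
        for (String f : staleInProgress) out.add(Committable.cleanup(f));
        pendingFiles.clear();
        staleInProgress.clear();
        return out;
    }

    boolean isActive() {
        return inProgressFile != null || !pendingFiles.isEmpty() || !staleInProgress.isEmpty();
    }
}
```

In this sketch a bucket becomes inactive once it has handed over all of its files and has nothing open, which is the condition the writer can use to drop it.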
Usage
Use this principle in any sink that requires exactly-once delivery guarantees to a file system. The prepare phase runs synchronously as part of checkpoint barrier processing, while the actual commit happens asynchronously, only after the checkpoint has completed.
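The split between a synchronous prepare and a later commit can be illustrated with a small sketch. It assumes committables are held per checkpoint ID until that checkpoint is confirmed. `CheckpointedCommitter` and its methods are hypothetical names for illustration and do not mirror Flink's committer API; committables are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical holder that separates "prepared" from "committed" state.
final class CheckpointedCommitter {
    private final TreeMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();

    // Prepare: record what the writer handed over; nothing is finalized yet.
    void onCheckpoint(long checkpointId, List<String> committables) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(committables));
    }

    // Commit: finalize everything prepared up to and including this checkpoint.
    void onCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> ready = pendingByCheckpoint.headMap(checkpointId, true);
        for (List<String> batch : ready.values()) {
            committed.addAll(batch);
        }
        ready.clear(); // the view is backed by the map, so this removes the entries
    }

    List<String> committed() { return committed; }

    int pendingCheckpoints() { return pendingByCheckpoint.size(); }
}
```

If a failure occurs before `onCheckpointComplete` is called, the prepared entries in this sketch are never finalized, which corresponds to the abort path of the protocol.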
Theoretical Basis
// Abstract two-phase commit preparation
function prepareCommit(endOfInput):
    committables = []
    for each bucket in activeBuckets:
        if not bucket.isActive():
            remove bucket
        else:
            committables.addAll(bucket.prepareCommit(endOfInput))
    return committables  // FileSinkCommittable list
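A possible Java rendering of the pseudocode is sketched below. One detail the pseudocode glosses over is that removing a bucket while looping over the collection requires going through the iterator in Java; removing from the map directly would raise a `ConcurrentModificationException`. The `PreparableBucket` interface and the `WriterSketch` class are hypothetical names, and committables are reduced to strings for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal view of a bucket, matching the calls in the pseudocode.
interface PreparableBucket {
    boolean isActive();
    List<String> prepareCommit(boolean endOfInput);
}

final class WriterSketch {
    // Bucket ID -> bucket; insertion order is kept so the output order is predictable.
    final Map<String, PreparableBucket> activeBuckets = new LinkedHashMap<>();

    List<String> prepareCommit(boolean endOfInput) {
        List<String> committables = new ArrayList<>();
        Iterator<Map.Entry<String, PreparableBucket>> it = activeBuckets.entrySet().iterator();
        while (it.hasNext()) {
            PreparableBucket bucket = it.next().getValue();
            if (!bucket.isActive()) {
                it.remove(); // safe removal during iteration
            } else {
                committables.addAll(bucket.prepareCommit(endOfInput));
            }
        }
        return committables;
    }
}
```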