Principle:Apache Flink Two Phase Commit Preparation

From Leeroopedia


Knowledge Sources
Domains Stream_Processing, Fault_Tolerance
Last Updated 2026-02-09 00:00 GMT

Overview

A pre-commit phase that collects all pending file operations across active buckets into a set of committable descriptors, enabling exactly-once file delivery.

Description

Two-Phase Commit Preparation implements the "prepare" phase of the two-phase commit protocol for file-based sinks. Before a checkpoint completes, the writer must describe every pending file, and every in-progress file that has to be cleaned up, as a committable descriptor. Each descriptor captures enough information to either finalize the file (on commit) or discard it (on abort).

The preparation phase iterates over all active buckets, evaluating each for:

  • Pending files: Files that were rolled (closed) and are ready for commit
  • In-progress files to clean up: Files that need to be truncated or removed on recovery
  • End-of-input handling: On the final flush, all in-progress files are forcibly closed so that they become pending files

This separation of prepare and commit ensures that file finalization is atomic with respect to Flink checkpoints, providing exactly-once semantics.
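The three per-bucket cases listed above can be sketched in a few lines. This is a toy Python model, not Flink's Java API: the `Committable` and `Bucket` classes and all of their fields are hypothetical names chosen for illustration, and the sketch assumes that a regular checkpoint leaves the open part file untouched (a real rolling policy may decide otherwise).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Committable:
    """Hypothetical descriptor: exactly one of the two fields is set."""
    pending_file: Optional[str] = None            # finalize on commit
    in_progress_to_cleanup: Optional[str] = None  # remove on commit or recovery


@dataclass
class Bucket:
    """Toy bucket: rolled files plus at most one open part file."""
    pending_files: List[str] = field(default_factory=list)
    in_progress: Optional[str] = None
    in_progress_to_cleanup: Optional[str] = None

    def roll(self) -> None:
        # Close the open part file and queue it for the next commit.
        if self.in_progress is not None:
            self.pending_files.append(self.in_progress)
            self.in_progress = None

    def prepare_commit(self, end_of_input: bool) -> List[Committable]:
        # On the final flush nothing may stay open, so force a roll.
        if end_of_input:
            self.roll()
        out = [Committable(pending_file=f) for f in self.pending_files]
        self.pending_files.clear()
        if self.in_progress_to_cleanup is not None:
            out.append(Committable(in_progress_to_cleanup=self.in_progress_to_cleanup))
            self.in_progress_to_cleanup = None
        return out


bucket = Bucket(pending_files=["part-0"], in_progress="part-1",
                in_progress_to_cleanup="part-0.stale")
regular = bucket.prepare_commit(end_of_input=False)  # part-1 stays open
final = bucket.prepare_commit(end_of_input=True)     # part-1 is closed and emitted
```

In this model the first call emits the rolled file and the cleanup entry, and only the end-of-input call turns the open part file into a committable.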

Usage

Use this principle in any sink that must deliver files to a file system exactly once. The prepare phase runs synchronously as part of checkpoint barrier processing, while the actual commit happens asynchronously after the checkpoint completes.
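A minimal sketch of the prepare/commit split on a local file system follows. It is an illustration under stated assumptions rather than Flink's implementation: the `Committable` fields, the `.inprogress` naming scheme, and the `prepare`, `commit`, and `abort` helpers are all hypothetical, and publishing is modeled as a rename from a hidden temporary file.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Committable:
    """Hypothetical descriptor: where the closed file is and where it belongs."""
    temp_path: str
    final_path: str


def prepare(directory: str, name: str, data: bytes) -> Committable:
    # Phase 1 (at the checkpoint): write and close a hidden temp file,
    # then return a descriptor instead of publishing the file.
    temp_path = os.path.join(directory, f".{name}.inprogress")
    with open(temp_path, "wb") as f:
        f.write(data)
    return Committable(temp_path, os.path.join(directory, name))


def commit(c: Committable) -> None:
    # Phase 2 (after the checkpoint completes): publish via rename.
    # Skipping a missing temp file makes a repeated commit harmless.
    if os.path.exists(c.temp_path):
        os.replace(c.temp_path, c.final_path)


def abort(c: Committable) -> None:
    # The checkpoint never completed: discard the unpublished file.
    if os.path.exists(c.temp_path):
        os.remove(c.temp_path)


with tempfile.TemporaryDirectory() as d:
    kept = prepare(d, "part-0", b"a\n")
    dropped = prepare(d, "part-1", b"b\n")
    visible_before = sorted(n for n in os.listdir(d) if not n.startswith("."))
    commit(kept)
    commit(kept)    # second call finds no temp file and does nothing
    abort(dropped)
    visible_after = sorted(os.listdir(d))
```

Nothing is visible under its final name until `commit` runs, and the commit is written to be idempotent because a commit may be retried after a failure.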

Theoretical Basis

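// Abstract two-phase commit preparation
function prepareCommit(endOfInput):
    committables = []
    for each bucket in activeBuckets:
        if not bucket.isActive():
            // nothing open or pending: drop the bucket from the active set
            activeBuckets.remove(bucket)
        else:
            // collect this bucket's pending files and cleanup entries
            committables.addAll(bucket.prepareCommit(endOfInput))
    return committables  // list of FileSinkCommittable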
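The pseudocode above can be turned into a runnable Python sketch. `StubBucket` is a hypothetical stand-in whose committables are plain file names, and the sketch assumes a bucket counts as active only while it still has files to hand over. The one detail the pseudocode glosses over is that entries are removed while the collection is being traversed, so the loop iterates over a snapshot.

```python
from typing import Dict, List


class StubBucket:
    """Hypothetical bucket: active while it still has files to hand over."""

    def __init__(self, pending: List[str]) -> None:
        self.pending = list(pending)

    def is_active(self) -> bool:
        return bool(self.pending)

    def prepare_commit(self, end_of_input: bool) -> List[str]:
        # end_of_input is ignored here; a fuller model would also close open files.
        drained, self.pending = self.pending, []
        return drained


def prepare_commit(active_buckets: Dict[str, StubBucket],
                   end_of_input: bool) -> List[str]:
    committables: List[str] = []
    # Iterate over a snapshot of the items so entries can be deleted safely.
    for bucket_id, bucket in list(active_buckets.items()):
        if not bucket.is_active():
            del active_buckets[bucket_id]
        else:
            committables.extend(bucket.prepare_commit(end_of_input))
    return committables


buckets = {
    "2026-02-08": StubBucket([]),
    "2026-02-09": StubBucket(["part-0", "part-1"]),
}
first = prepare_commit(buckets, end_of_input=False)
second = prepare_commit(buckets, end_of_input=False)
```

In this model the empty bucket is dropped on the first call, and the bucket drained by the first call is dropped on the second, so idle buckets do not accumulate across checkpoints.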

Related Pages

Implemented By
