Principle:Apache Paimon Atomic Commit

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Table_Format
Last Updated 2026-02-07 00:00 GMT

Overview

Mechanism for atomically committing written data to make it visible to readers via snapshot isolation.

Description

Atomic commit ensures that all data written in a batch becomes visible simultaneously or not at all. The commit process creates a new Snapshot that records the file changes, partition statistics, and commit metadata. This follows ACID transaction semantics: readers see either the complete committed state or none of the changes. The two-phase protocol, prepare_commit() followed by commit(), separates write preparation from the actual atomic commit operation.

The commit phase is the boundary between data that is written but invisible and data that is committed and visible. Until a commit succeeds, all data written by a BatchTableWrite session remains invisible to readers. The commit operation atomically updates the table's snapshot metadata, manifest files, and partition statistics in a single coordinated action.

Usage

Use this principle after batch writing to finalize data visibility. Without committing, written data remains invisible to readers. The typical flow is: (1) call prepare_commit() on the writer to collect file metadata as CommitMessage objects, (2) create a BatchTableCommit via the write builder, and (3) call commit() with the commit messages to atomically create a new snapshot.
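The three steps above can be illustrated with a small in-memory toy. This is a sketch for intuition only, not Paimon's API: the classes ToyTable, ToyBatchWrite, and ToyBatchCommit, the file naming scheme, and the fields of the CommitMessage stand-in are all hypothetical. Only the method names prepare_commit() and commit() mirror the flow described on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommitMessage:
    """Hypothetical stand-in: metadata for one data file written in a batch."""
    partition: str
    file_name: str
    row_count: int


@dataclass
class ToyTable:
    """In-memory toy table: a list of immutable snapshots (tuples of messages)."""
    snapshots: list = field(default_factory=list)

    def visible_files(self):
        # Readers only ever see files referenced by the latest snapshot.
        return [m.file_name for m in self.snapshots[-1]] if self.snapshots else []


class ToyBatchWrite:
    def __init__(self):
        self._pending = []

    def write(self, partition, rows):
        # Data is "written" but not yet referenced by any snapshot.
        name = f"{partition}/data-{len(self._pending)}.parquet"
        self._pending.append(CommitMessage(partition, name, len(rows)))

    def prepare_commit(self):
        # Step 1: hand over the file metadata; nothing becomes visible here.
        messages, self._pending = list(self._pending), []
        return messages


class ToyBatchCommit:
    def __init__(self, table):
        self._table = table

    def commit(self, messages):
        # Step 3: publish one new snapshot holding the previous and the new files.
        previous = self._table.snapshots[-1] if self._table.snapshots else ()
        self._table.snapshots.append(tuple(previous) + tuple(messages))


table = ToyTable()
writer = ToyBatchWrite()
writer.write("dt=2024-01-01", [1, 2, 3])
writer.write("dt=2024-01-02", [4, 5])

messages = writer.prepare_commit()
before = table.visible_files()          # nothing committed yet
ToyBatchCommit(table).commit(messages)  # step 2 (create the commit) and step 3
after = table.visible_files()           # both files appear together
```

In this toy, `before` is empty and `after` lists both files, which mirrors the rule that written data stays invisible until the commit creates a snapshot.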

Theoretical Basis

Atomic commit implements optimistic concurrency control with snapshot isolation. Key concepts include:

  • Two-phase commit protocol: The prepare_commit() phase collects all file metadata (new data files, compacted files, deletion vectors) into CommitMessage objects. The commit() phase atomically creates a new snapshot referencing these files.
  • Snapshot isolation: Each commit creates a new immutable snapshot. Readers always see a consistent view of the table at a specific snapshot, unaffected by concurrent writes.
  • Atomicity guarantee: Either all files in the commit become visible together, or none do. There is no partial visibility of a commit.
  • Abort semantics: If a commit fails, abort() can be called to clean up the files written for that commit, so that no orphaned data files are left in storage.
  • One-time semantics: BatchTableCommit enforces that each commit instance is used exactly once, preventing accidental double commits.
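
The concepts listed above can be sketched together in a few lines. The following is a toy model under stated assumptions, not Paimon's implementation: ToySnapshotStore and ToyOneShotCommit are hypothetical names, the retry count is arbitrary, and a lock-protected dictionary stands in for whatever atomic primitive the storage layer actually provides. It shows an optimistic publish that succeeds only if the target snapshot id is still free, readers pinned to an immutable snapshot, a commit handle that rejects reuse, and an abort that removes uncommitted files.

```python
import threading


class ToySnapshotStore:
    """In-memory stand-in for snapshot metadata; publish() is the atomic step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshots = {}     # snapshot id -> immutable tuple of file names
        self.data_files = set()  # files physically present in "storage"

    def latest_id(self):
        with self._lock:
            return max(self._snapshots, default=0)

    def read(self, snapshot_id):
        # Snapshot isolation: a reader pinned to an id always sees the same files.
        with self._lock:
            return self._snapshots.get(snapshot_id, ())

    def publish(self, snapshot_id, files):
        # Compare-and-set: succeeds only if nobody else has claimed this id.
        with self._lock:
            if snapshot_id in self._snapshots:
                return False
            self._snapshots[snapshot_id] = tuple(files)
            return True


class ToyOneShotCommit:
    """Hypothetical commit handle that may be used exactly once."""

    def __init__(self, store):
        self._store = store
        self._used = False

    def commit(self, new_files, max_retries=3):
        if self._used:
            raise RuntimeError("commit handle already used")
        self._used = True
        for _ in range(max_retries):
            base = self._store.latest_id()
            merged = self._store.read(base) + tuple(new_files)
            if self._store.publish(base + 1, merged):
                return base + 1
            # Another writer won the race: re-read the latest snapshot and retry.
        raise RuntimeError("too many conflicting commits")

    def abort(self, new_files):
        # Remove files that were written but never referenced by a snapshot.
        for name in new_files:
            self._store.data_files.discard(name)


store = ToySnapshotStore()
store.data_files.update({"a.parquet", "b.parquet", "c.parquet"})

first = ToyOneShotCommit(store)
id1 = first.commit(["a.parquet"])
pinned = store.read(id1)                  # a reader pins the first snapshot

id2 = ToyOneShotCommit(store).commit(["b.parquet"])
still_pinned = store.read(id1)            # unchanged by the later commit

try:
    first.commit(["c.parquet"])           # reusing the handle is rejected
    reused = True
except RuntimeError:
    reused = False
    ToyOneShotCommit(store).abort(["c.parquet"])  # clean up the uncommitted file
```

The design choice worth noting is that writers never modify an existing snapshot. They only try to add the next one, which is what lets a pinned reader keep a stable view while later commits proceed.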
