Implementation:Apache Paimon CatalogSnapshotCommit
| Knowledge Sources | |
|---|---|
| Domains | Snapshot Management, Catalog Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
CatalogSnapshotCommit is a SnapshotCommit implementation that uses a Catalog to commit snapshots atomically with catalog-level consistency.
Description
The CatalogSnapshotCommit class provides a catalog-based approach to committing snapshots in Apache Paimon tables. Unlike file-based commit mechanisms, this implementation delegates the atomic commit operation to the catalog layer, enabling better integration with catalog systems that provide their own transaction management.
The class wraps a catalog instance and table identifier to route commit operations through the catalog's commit_snapshot method. This approach is particularly useful for catalogs that maintain their own metadata stores (like Hive Metastore or REST catalogs) and need to ensure consistency between filesystem state and catalog metadata.
The commit operation takes a branch name, so snapshots can be committed to a specific branch, and passes partition statistics to the catalog so it can maintain catalog-level metadata. If the underlying catalog doesn't implement the commit_snapshot method, commit raises NotImplementedError with a descriptive message.
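The delegation described above can be sketched with stand-in classes. This is a minimal illustration, not the pypaimon source: `TableId`, `CatalogCommitSketch`, `RecordingCatalog`, and `PlainCatalog` are hypothetical names, and both the way the branch is attached to the identifier and the argument order passed to `commit_snapshot` are assumptions.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class TableId:
    """Hypothetical stand-in for a table identifier."""
    database: str
    table: str
    branch: Optional[str] = None


class CatalogCommitSketch:
    """Illustrative stand-in for a catalog-backed committer (not the pypaimon class)."""

    def __init__(self, catalog: Any, identifier: TableId, uuid: str):
        self.catalog = catalog
        self.identifier = identifier
        self.uuid = uuid

    def commit(self, snapshot: Any, branch: str, statistics: List[Any]) -> bool:
        # Assumption: the branch is attached to the identifier before delegating.
        target = TableId(self.identifier.database, self.identifier.table, branch)
        commit_fn = getattr(self.catalog, "commit_snapshot", None)
        if not callable(commit_fn):
            raise NotImplementedError(
                f"{type(self.catalog).__name__} does not implement commit_snapshot"
            )
        # Assumed argument order; the real catalog method may differ.
        return bool(commit_fn(target, self.uuid, snapshot, statistics))


class RecordingCatalog:
    """Hypothetical catalog that accepts every commit and records it."""

    def __init__(self):
        self.commits = []

    def commit_snapshot(self, identifier, uuid, snapshot, statistics):
        self.commits.append((identifier, uuid, snapshot, list(statistics)))
        return True


class PlainCatalog:
    """Hypothetical catalog without a commit_snapshot method."""
```

With these stand-ins, committing through `RecordingCatalog` returns True and records the branch-qualified identifier, while committing through `PlainCatalog` raises NotImplementedError naming the catalog class.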
Usage
Use CatalogSnapshotCommit when working with catalog implementations that provide snapshot commit capabilities, particularly for REST catalogs, Hive catalogs, or custom catalog implementations that manage their own transaction boundaries.
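One way to decide up front whether a catalog qualifies is to check for the commit_snapshot method before building the committer. The sketch below assumes that a plain attribute check is a sufficient capability test; `supports_catalog_commit`, `choose_commit_strategy`, `RestLikeCatalog`, and `FilesystemLikeCatalog` are hypothetical names used only for illustration.

```python
from typing import Any


def supports_catalog_commit(catalog: Any) -> bool:
    """Return True if the catalog exposes a callable commit_snapshot attribute."""
    return callable(getattr(catalog, "commit_snapshot", None))


class RestLikeCatalog:
    """Hypothetical catalog that manages snapshot commits itself."""

    def commit_snapshot(self, identifier, uuid, snapshot, statistics):
        return True


class FilesystemLikeCatalog:
    """Hypothetical catalog with no commit_snapshot method."""


def choose_commit_strategy(catalog: Any) -> str:
    """Pick a label for the commit path; a real caller would build a committer here."""
    return "catalog" if supports_catalog_commit(catalog) else "file-based"
```

Checking before construction avoids discovering the missing method only when the first commit raises NotImplementedError.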
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/snapshot/catalog_snapshot_commit.py
Signature
class CatalogSnapshotCommit(SnapshotCommit):
    """A SnapshotCommit using Catalog to commit."""

    def __init__(self, catalog: Catalog, identifier: Identifier, uuid: str):
        """Initialize with catalog, table identifier, and optional UUID."""

    def commit(self, snapshot: Snapshot, branch: str, statistics: List[PartitionStatistics]) -> bool:
        """Commit the snapshot using the catalog."""

    def close(self):
        """Close the catalog and release resources."""
Import
from pypaimon.snapshot.catalog_snapshot_commit import CatalogSnapshotCommit
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| catalog | Catalog | Yes | Catalog instance that performs the commit (constructor argument) |
| identifier | Identifier | Yes | Table identifier (constructor argument) |
| uuid | str | Yes | Table UUID for verification (constructor argument) |
| snapshot | Snapshot | Yes | Snapshot to commit (commit argument) |
| branch | str | Yes | Branch name for the commit (commit argument) |
| statistics | List[PartitionStatistics] | Yes | Partition statistics for catalog metadata (commit argument) |
Outputs
| Name | Type | Description |
|---|---|---|
| success | bool | True if commit was successful |
Usage Examples
from pypaimon.snapshot.catalog_snapshot_commit import CatalogSnapshotCommit
from pypaimon.snapshot.snapshot_commit import PartitionStatistics

# Assumes `catalog`, `table_identifier`, and `table` already exist in scope.
# Create catalog snapshot commit
commit_handler = CatalogSnapshotCommit(
    catalog=catalog,
    identifier=table_identifier,
    uuid=table.uuid
)

# Prepare partition statistics
stats = [
    PartitionStatistics.create(
        partition_spec={"date": "2024-01-01"},
        record_count=1000,
        file_count=5,
        file_size_in_bytes=52428800  # 50 MiB
    )
]

# Commit snapshot (create_new_snapshot is a placeholder for your own snapshot construction)
snapshot = create_new_snapshot()
success = commit_handler.commit(
    snapshot=snapshot,
    branch="main",
    statistics=stats
)

if success:
    print("Snapshot committed successfully via catalog")
else:
    print("Snapshot commit failed")

# Clean up
commit_handler.close()

# Use with context manager (same catalog, identifier, and UUID as above)
with CatalogSnapshotCommit(catalog, table_identifier, table.uuid) as committer:
    committer.commit(snapshot, "main", stats)
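The clean-up step can also be guaranteed with the standard library's `contextlib.closing`, which calls `close()` on any object when the block exits, including when commit raises. The sketch below uses a hypothetical `FakeCommitter` stand-in with the same commit/close shape; `commit_and_close` is likewise an illustrative helper, not part of pypaimon.

```python
from contextlib import closing


class FakeCommitter:
    """Hypothetical stand-in exposing the same commit/close shape."""

    def __init__(self):
        self.closed = False

    def commit(self, snapshot, branch, statistics) -> bool:
        if snapshot is None:
            raise ValueError("snapshot is required")
        return True

    def close(self):
        self.closed = True


def commit_and_close(committer, snapshot, branch, statistics) -> bool:
    """Commit once and always call close(), even if commit raises."""
    with closing(committer) as c:
        return c.commit(snapshot, branch, statistics)
```

This is useful when a committer is handed in from elsewhere and the caller wants the same release guarantee without relying on the object supporting the `with` statement directly.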