
Implementation:Apache Paimon CatalogSnapshotCommit

From Leeroopedia


Knowledge Sources
Domains Snapshot Management, Catalog Integration
Last Updated 2026-02-08 00:00 GMT

Overview

CatalogSnapshotCommit is a SnapshotCommit implementation that uses a Catalog to commit snapshots atomically with catalog-level consistency.

Description

The CatalogSnapshotCommit class provides a catalog-based approach to committing snapshots in Apache Paimon tables. File-based commit mechanisms publish a new snapshot through a filesystem operation, for example by atomically renaming the snapshot file. This implementation instead delegates the atomic commit to the catalog layer, so catalogs that provide their own transaction management decide whether a commit succeeds.

The class wraps a catalog instance and table identifier to route commit operations through the catalog's commit_snapshot method. This approach is particularly useful for catalogs that maintain their own metadata stores (like Hive Metastore or REST catalogs) and need to ensure consistency between filesystem state and catalog metadata.

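commit() takes a target branch and a list of partition statistics, which the catalog can use to keep its partition-level metadata up to date. If the underlying catalog does not implement the commit_snapshot method, commit() raises NotImplementedError with a descriptive message. Note that close() closes the wrapped catalog as well, so the catalog should not be reused after the committer is closed.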
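The delegation described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not pypaimon's source: `Identifier` is a simplified stand-in, the two fake catalogs exist only for the demo, and the argument order passed to `commit_snapshot` (including forwarding `branch`) is an assumption; the real catalog method may take different arguments.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Identifier:
    # Simplified stand-in for pypaimon's Identifier (assumption, not the real class).
    database: str
    table: str


class CatalogSnapshotCommitSketch:
    """Hypothetical sketch of delegating snapshot commits to a catalog."""

    def __init__(self, catalog: Any, identifier: Identifier, uuid: str):
        self.catalog = catalog
        self.identifier = identifier
        self.uuid = uuid

    def commit(self, snapshot: Any, branch: str, statistics: List[Any]) -> bool:
        # Refuse to commit if the catalog has no commit_snapshot method.
        if not hasattr(self.catalog, "commit_snapshot"):
            raise NotImplementedError(
                "The catalog does not support snapshot commits: "
                "commit_snapshot is not implemented."
            )
        # Assumed argument order; the real catalog signature may differ.
        return self.catalog.commit_snapshot(
            self.identifier, self.uuid, snapshot, branch, statistics
        )

    def close(self) -> None:
        # Closing the committer also closes the wrapped catalog.
        if hasattr(self.catalog, "close"):
            self.catalog.close()


class RecordingCatalog:
    """Fake catalog that records commit calls and accepts every commit."""

    def __init__(self):
        self.calls = []
        self.closed = False

    def commit_snapshot(self, identifier, uuid, snapshot, branch, statistics):
        self.calls.append((identifier, uuid, snapshot, branch, statistics))
        return True

    def close(self):
        self.closed = True


class PlainCatalog:
    """Fake catalog without commit_snapshot support."""
```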

Usage

Use CatalogSnapshotCommit with catalog implementations that support snapshot commits, that is, catalogs that implement commit_snapshot. Typical cases are REST catalogs, Hive catalogs, and custom catalog implementations that manage their own transaction boundaries.
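To illustrate what "managing its own transaction boundary" can mean for a custom catalog, here is a hypothetical in-memory catalog whose `commit_snapshot` takes a lock, rejects a mismatched table UUID, and accepts a snapshot only if its id directly follows the latest committed id on that branch (optimistic concurrency). `InMemoryCatalog`, its `create_table` helper, the `Snapshot` stand-in, the plain-string identifiers, and the method's argument order are all assumptions for illustration, not pypaimon APIs.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Minimal stand-in: only the snapshot id matters for this sketch.
    id: int


class InMemoryCatalog:
    """Hypothetical catalog that owns the commit transaction."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uuids: Dict[str, str] = {}                  # table -> UUID
        self._latest: Dict[Tuple[str, str], int] = {}     # (table, branch) -> id
        self.statistics: Dict[Tuple[str, str], list] = {} # last committed stats

    def create_table(self, identifier: str, uuid: str) -> None:
        self._uuids[identifier] = uuid

    def commit_snapshot(self, identifier: str, uuid: str, snapshot: Snapshot,
                        branch: str, statistics: List[dict]) -> bool:
        with self._lock:  # the catalog's own transaction boundary
            if self._uuids.get(identifier) != uuid:
                # Wrong table, or the table was dropped and recreated.
                raise ValueError(f"UUID mismatch for {identifier}")
            key = (identifier, branch)
            expected = self._latest.get(key, 0) + 1  # snapshot ids start at 1
            if snapshot.id != expected:
                return False  # another writer committed first
            self._latest[key] = snapshot.id
            self.statistics[key] = list(statistics)
            return True
```

Returning `False` for a lost race (rather than raising) matches the boolean contract of `commit()`, so the caller can decide whether to rebuild the snapshot and retry.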

Code Reference

Source Location

Signature

class CatalogSnapshotCommit(SnapshotCommit):
    """A SnapshotCommit using Catalog to commit."""

    def __init__(self, catalog: Catalog, identifier: Identifier, uuid: str):
        """Initialize with catalog, table identifier, and optional UUID."""

    def commit(self, snapshot: Snapshot, branch: str, statistics: List[PartitionStatistics]) -> bool:
        """Commit the snapshot using the catalog."""

    def close(self):
        """Close the catalog and release resources."""

Import

from pypaimon.snapshot.catalog_snapshot_commit import CatalogSnapshotCommit
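The usage examples on this page close the committer with a `with` statement, which only works if the base class maps the context-manager protocol onto `close()`. A minimal sketch of that wiring, assuming an abstract base shaped like the signature above; `SnapshotCommitSketch` and `CountingCommit` are hypothetical names, not pypaimon's source.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SnapshotCommitSketch(ABC):
    """Hypothetical base class mirroring the SnapshotCommit interface."""

    @abstractmethod
    def commit(self, snapshot: Any, branch: str, statistics: List[Any]) -> bool:
        ...

    @abstractmethod
    def close(self) -> None:
        ...

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Release resources even if commit() raised inside the block.
        self.close()
        return False  # do not swallow exceptions


class CountingCommit(SnapshotCommitSketch):
    """Toy implementation that records commits and whether it was closed."""

    def __init__(self):
        self.closed = False
        self.committed = []

    def commit(self, snapshot, branch, statistics):
        self.committed.append((snapshot, branch))
        return True

    def close(self):
        self.closed = True
```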

I/O Contract

Inputs

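| Name | Type | Required | Description |
|---|---|---|---|
| catalog | Catalog | Yes | Catalog instance used for committing (constructor) |
| identifier | Identifier | Yes | Table identifier (constructor) |
| uuid | str | Yes | Table UUID, used for verification (constructor) |
| snapshot | Snapshot | Yes | Snapshot to commit (commit()) |
| branch | str | Yes | Name of the branch to commit to (commit()) |
| statistics | List[PartitionStatistics] | Yes | Partition statistics for catalog metadata (commit()) |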
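The `statistics` input carries one entry per affected partition. One hypothetical way to build it is to aggregate per-file metadata from a write, grouped by partition. `DataFileInfo` and `PartitionStats` are local stand-ins invented for this sketch (the latter mirrors the fields passed to `PartitionStatistics.create` in the usage example), not pypaimon classes; partitions are stored as tuples of key/value pairs because dicts are not hashable.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PartitionKey = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class DataFileInfo:
    # Hypothetical per-file metadata produced by a write.
    partition: PartitionKey
    record_count: int
    file_size_in_bytes: int


@dataclass
class PartitionStats:
    # Local stand-in mirroring the fields given to PartitionStatistics.create.
    partition_spec: Dict[str, str]
    record_count: int
    file_count: int
    file_size_in_bytes: int


def aggregate_partition_stats(files: List[DataFileInfo]) -> List[PartitionStats]:
    """Sum records, files, and bytes per partition, in first-seen order."""
    totals: Dict[PartitionKey, PartitionStats] = {}
    for f in files:
        stats = totals.get(f.partition)
        if stats is None:
            stats = PartitionStats(dict(f.partition), 0, 0, 0)
            totals[f.partition] = stats
        stats.record_count += f.record_count
        stats.file_count += 1
        stats.file_size_in_bytes += f.file_size_in_bytes
    return list(totals.values())
```

In real code, each aggregate would be passed to `PartitionStatistics.create(...)` to produce the list handed to `commit()`.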

Outputs

| Name | Type | Description |
|---|---|---|
| success | bool | True if the commit was successful, False otherwise |
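Because `commit()` reports failure through its boolean return value, a caller can retry a rejected commit. The helper below is a hypothetical sketch, not part of pypaimon: it rebuilds the snapshot on every attempt (so the caller can, for example, pick up a new snapshot id after losing a race) and gives up after `max_attempts`. Exceptions such as `NotImplementedError` are not retried; they propagate to the caller.

```python
from typing import Any, Callable, List


def commit_with_retry(
    committer: Any,
    build_snapshot: Callable[[], Any],
    branch: str,
    statistics: List[Any],
    max_attempts: int = 3,
) -> bool:
    """Hypothetical helper: call committer.commit() until it returns True.

    build_snapshot is invoked on every attempt so that a fresh snapshot can
    be produced after a rejected commit. Returns False once max_attempts
    commits have all been rejected.
    """
    for _ in range(max_attempts):
        if committer.commit(build_snapshot(), branch, statistics):
            return True
    return False
```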

Usage Examples

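from pypaimon.snapshot.catalog_snapshot_commit import CatalogSnapshotCommit
from pypaimon.snapshot.snapshot_commit import PartitionStatistics

# Placeholders, not defined on this page: `catalog` is an open Catalog instance,
# `table_identifier` is the table's Identifier, `table_uuid` is the table's UUID,
# and `create_new_snapshot()` stands in for code that builds the Snapshot.

# Create catalog snapshot commit
commit_handler = CatalogSnapshotCommit(
    catalog=catalog,
    identifier=table_identifier,
    uuid=table_uuid
)

# Prepare partition statistics for the partitions touched by this snapshot
stats = [
    PartitionStatistics.create(
        partition_spec={"date": "2024-01-01"},
        record_count=1000,
        file_count=5,
        file_size_in_bytes=52428800  # 50 MiB
    )
]

# Commit snapshot; NotImplementedError means the catalog lacks commit_snapshot
snapshot = create_new_snapshot()
try:
    success = commit_handler.commit(
        snapshot=snapshot,
        branch="main",
        statistics=stats
    )
except NotImplementedError as e:
    print(f"Catalog does not support snapshot commits: {e}")
    success = False

if success:
    print("Snapshot committed successfully via catalog")
else:
    print("Snapshot commit failed")

# Clean up: close() also closes the wrapped catalog
commit_handler.close()

# Alternative to calling close() manually: a context manager closes the
# committer (and the catalog) on exit. Use a catalog that is still open.
with CatalogSnapshotCommit(catalog, table_identifier, table_uuid) as committer:
    committer.commit(snapshot, "main", stats)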
