Principle:Apache Paimon Schema Definition and Table Creation

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Table_Format
Last Updated	2026-02-07 00:00 GMT

Overview

A mechanism for defining table schemas and for creating databases and tables within a data lake catalog.

Description

Schema definition and table creation involve specifying column types, partition keys, primary keys, and table options to create a structured table in the catalog. In the Python API, the Schema class wraps a PyArrow schema and adds Paimon-specific metadata. Tables are created atomically via the Catalog interface, which handles both metadata registration and storage initialization. During creation, PyArrow column types are checked for compatibility with the Paimon type system.
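The type-compatibility step can be pictured as a lookup from Arrow type names to Paimon type names. The sketch below is illustrative only: `ARROW_TO_PAIMON`, `to_paimon_type`, and `convert_columns` are hypothetical names, the mapping covers only a few primitive types, and the conversion performed inside the library may differ in detail.

```python
# Hypothetical, partial mapping from Arrow type names (as printed by
# str(pyarrow_type)) to Paimon SQL type names. Not the library's own table.
ARROW_TO_PAIMON = {
    "bool": "BOOLEAN",
    "int32": "INT",
    "int64": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
}


def to_paimon_type(arrow_type_name: str) -> str:
    """Translate one Arrow type name, failing loudly on unsupported types."""
    try:
        return ARROW_TO_PAIMON[arrow_type_name]
    except KeyError:
        raise TypeError(
            f"no Paimon equivalent known for Arrow type {arrow_type_name!r}"
        ) from None


def convert_columns(columns: dict[str, str]) -> dict[str, str]:
    """Convert a {column name: Arrow type name} mapping column by column."""
    return {name: to_paimon_type(t) for name, t in columns.items()}
```

Rejecting an unknown type at definition time, rather than at the first write, is the design choice this sketch is meant to highlight.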

Databases serve as logical namespaces that group related tables together. Tables within a database are identified by a fully qualified name (e.g., 'db.table') encapsulated in an Identifier object. The schema defines the physical structure of the table, including column data types, partitioning strategy, primary key constraints, and table-level options such as bucket count.
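To make the fully qualified naming concrete, here is a minimal sketch of how a 'db.table' string might be split into its two parts. `TableIdentifier` is a hypothetical stand-in written for this page, not Paimon's own Identifier class, and the validation rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableIdentifier:
    """Hypothetical stand-in for a catalog identifier; not Paimon's own class."""

    database: str
    table: str

    @classmethod
    def from_string(cls, full_name: str) -> "TableIdentifier":
        # Expect exactly one dot separating a non-empty database and table name.
        parts = full_name.split(".")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected 'database.table', got {full_name!r}")
        return cls(database=parts[0], table=parts[1])

    def full_name(self) -> str:
        # Reassemble the fully qualified name.
        return f"{self.database}.{self.table}"
```

For example, `TableIdentifier.from_string("sales.orders")` yields an object whose `database` is `"sales"` and whose `table` is `"orders"`.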

Usage

Use this principle after catalog initialization when setting up new tables for data storage. This is required before any read or write operations can be performed. The typical workflow involves: (1) creating a database if it does not exist, (2) defining the table schema with column types, partition keys, and primary keys, and (3) creating the table in the catalog with the specified schema.
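The three-step workflow might look like the following in Python. This is a sketch that assumes the pypaimon API as shown in the Paimon Python documentation (`CatalogFactory.create`, `create_database`, `Schema.from_pyarrow_schema`, `create_table`); the function name, the database and table names, the columns, and the bucket option value are hypothetical.

```python
def create_orders_table(warehouse: str) -> None:
    """Create database 'sales' and table 'sales.orders' under the given warehouse."""
    # Imported lazily so this module still loads where the optional
    # dependencies (pyarrow, pypaimon) are not installed.
    import pyarrow as pa
    from pypaimon import CatalogFactory, Schema

    # Catalog initialization; the option key is assumed from the docs.
    catalog = CatalogFactory.create({"warehouse": warehouse})

    # (1) Create the database if it does not exist.
    catalog.create_database(name="sales", ignore_if_exists=True)

    # (2) Define columns with PyArrow, then attach Paimon-specific metadata.
    pa_schema = pa.schema([
        ("dt", pa.string()),
        ("order_id", pa.int64()),
        ("amount", pa.float64()),
    ])
    schema = Schema.from_pyarrow_schema(
        pa_schema,
        partition_keys=["dt"],
        primary_keys=["dt", "order_id"],  # primary key includes the partition key
        options={"bucket": "2"},
    )

    # (3) Register the table under its fully qualified name.
    catalog.create_table(
        identifier="sales.orders", schema=schema, ignore_if_exists=True
    )
```

Passing `ignore_if_exists=True` in steps (1) and (3) makes the setup safe to rerun; whether that is desirable depends on whether an existing table with a different schema should be treated as an error.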

Theoretical Basis

Schema definition follows the schema-on-write pattern, in which the table structure is fixed at creation time and enforced on every write. This approach provides several benefits and guarantees:

Partition pruning: Partition keys enable the query engine to skip entire partitions that do not match a filter predicate, dramatically reducing I/O.
Primary key constraints: Primary keys enable merge-on-read semantics, in which an update is written as a new record and rows sharing a primary key are merged with the existing data at read time.
Type safety: The schema enforces column types at write time, preventing type mismatches from corrupting the data lake.
Atomic creation: Table creation is atomic -- either the table and all its metadata are fully created, or nothing is changed in the catalog.
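Two of the properties above, partition pruning and primary-key merging, can be illustrated with a toy in-memory model. This is not how Paimon stores or merges data; the function names are hypothetical, and the last-write-wins merge rule is chosen here purely for illustration.

```python
def prune_partitions(partitions: dict[str, list[dict]], dt: str) -> list[dict]:
    """Read only the partition whose key equals `dt`; other partitions are skipped."""
    return list(partitions.get(dt, []))


def merge_on_read(base: list[dict], updates: list[dict], key: str) -> list[dict]:
    """Merge rows by primary key; a later row replaces an earlier one."""
    merged: dict = {}
    for row in [*base, *updates]:
        merged[row[key]] = row
    return list(merged.values())


# Toy data: one list of rows per partition value of `dt`.
partitions = {
    "2024-01-01": [{"order_id": 1, "amount": 10.0}],
    "2024-01-02": [{"order_id": 2, "amount": 20.0}],
}

# The filter dt = '2024-01-02' touches a single partition.
base = prune_partitions(partitions, "2024-01-02")

# A later write updates order 2 and adds order 3.
updates = [{"order_id": 2, "amount": 25.0}, {"order_id": 3, "amount": 5.0}]
result = merge_on_read(base, updates, key="order_id")
```

Here `result` holds one row per primary key, with order 2 carrying the updated amount, while the rows of the `2024-01-01` partition were never read.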

Related Pages

Implemented By

Implementation:Apache_Paimon_Catalog_Create_Database_and_Table
