Principle:Apache Paimon Catalog Object Model

From Leeroopedia


Knowledge Sources
Domains Catalog, Object_Model
Last Updated 2026-02-08 00:00 GMT

Overview

The catalog object model defines the type system and naming conventions for database objects including tables, views, functions, and partitions, providing a unified abstraction across diverse metastore implementations.

Description

The catalog object model addresses the challenge of representing diverse database objects in a consistent, extensible manner across different metastore backends like Hive Metastore, JDBC-based catalogs, or cloud-native catalog services. The model establishes a hierarchical naming structure with databases as containers and tables, views, and functions as contained objects. Each object type has a well-defined set of properties, metadata fields, and lifecycle operations that must be supported regardless of the underlying storage mechanism.

Identifiers provide the foundational naming abstraction, combining a database name and an object name into a qualified reference that uniquely identifies a catalog object. This two-level naming isolates namespaces, so the same object name can exist in different databases without conflict. Table types distinguish between kinds of tables such as managed tables, external tables, and format-specific tables, each with different ownership and lifecycle semantics. Views encapsulate saved queries with their own schema definitions, allowing complex query logic to be reused and versioned independently.

Functions extend the catalog with user-defined or system functions that can be invoked in queries. The function definition captures the implementation class, parameters, and language runtime requirements, while the function change interface supports atomic updates to function definitions. Partition objects model the physical partitioning of tables, tracking partition-level statistics and metadata separately from the base table definition. The factory pattern enables discovery and instantiation of catalog implementations through service provider interfaces, allowing the system to support multiple catalog backends through pluggable implementations.

Usage

Apply the catalog object model when implementing metastore integrations, building query planning systems that need to resolve object references, or creating catalog migration tools. This pattern is essential when you need to support multiple catalog backends with a unified API.

Theoretical Basis

The catalog object model provides a structured type system for database objects:

Identifier Hierarchy

structure Identifier:
    database: string
    objectName: string

    function toFullyQualifiedName() -> string:
        return database + "." + objectName

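    function fromString(qualifiedName, defaultDatabase) -> Identifier:
        parts = qualifiedName.split(".")
        if parts.length == 1:
            // Unqualified name: resolve against the caller's default database
            return Identifier(defaultDatabase, parts[0])
        else if parts.length == 2:
            return Identifier(parts[0], parts[1])
        else:
            throw CatalogException("Expected 'object' or 'database.object': " + qualifiedName)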
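
The parsing rule above can be sketched in Python as follows. This is a minimal illustration, not Paimon's implementation: the `DEFAULT_DATABASE` value and the decision to reject names with more than one dot are assumptions made for the example.

```python
from dataclasses import dataclass

DEFAULT_DATABASE = "default"  # assumed fallback; the source does not name one


@dataclass(frozen=True)
class Identifier:
    database: str
    object_name: str

    def full_name(self) -> str:
        """Return the qualified 'database.object' form."""
        return f"{self.database}.{self.object_name}"

    @classmethod
    def from_string(cls, qualified_name: str,
                    default_database: str = DEFAULT_DATABASE) -> "Identifier":
        parts = qualified_name.split(".")
        if len(parts) == 1 and parts[0]:
            # Unqualified name: fall back to the default database.
            return cls(default_database, parts[0])
        if len(parts) == 2 and all(parts):
            return cls(parts[0], parts[1])
        # Empty segments or extra dots are rejected rather than guessed at.
        raise ValueError(
            f"Expected 'object' or 'database.object', got: {qualified_name!r}"
        )
```

Because the dataclass is frozen, identifiers are hashable and can serve directly as dictionary keys in a catalog lookup table.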

Table Type Classification

enum TableType:
    MANAGED          // Catalog owns data lifecycle
    EXTERNAL         // Data managed externally
    VIEW             // Virtual table defined by query
    MATERIALIZED     // Physical snapshot of view
    FORMAT_TABLE     // External format (Parquet, ORC, etc.)

structure CatalogTableMetadata:
    identifier: Identifier
    tableType: TableType
    schema: Schema
    partitionKeys: list<string>
    properties: map<string, string>
    comment: string
    createTime: Instant
    lastModifiedTime: Instant

View Definition

structure View:
    identifier: Identifier
    query: string               // SQL query text
    schema: ViewSchema          // Result schema
    dialect: string             // SQL dialect (e.g., "flink", "spark")
    comment: string
    properties: map<string, string>

structure ViewSchema:
    fields: list<Field>

    function validateAgainstQuery(query) -> boolean:
        // Verify schema matches query result

enum ViewChange:
    SET_QUERY(newQuery)
    SET_COMMENT(newComment)
    SET_PROPERTY(key, value)
    REMOVE_PROPERTY(key)
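
One way to apply a list of view changes is to fold them over an immutable view value, as in the sketch below. The `(kind, args)` tuple encoding and the trimmed-down `View` fields are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class View:
    query: str
    dialect: str = "flink"
    comment: str = ""
    properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical encoding: each change is a (kind, args) tuple that mirrors
# one ViewChange variant from the pseudocode above.
Change = Tuple[str, tuple]


def apply_view_changes(view: View, changes: List[Change]) -> View:
    """Return a new View with every change applied in order."""
    for kind, args in changes:
        if kind == "SET_QUERY":
            view = replace(view, query=args[0])
        elif kind == "SET_COMMENT":
            view = replace(view, comment=args[0])
        elif kind == "SET_PROPERTY":
            key, value = args
            view = replace(view, properties={**view.properties, key: value})
        elif kind == "REMOVE_PROPERTY":
            remaining = {k: v for k, v in view.properties.items() if k != args[0]}
            view = replace(view, properties=remaining)
        else:
            raise ValueError(f"Unknown view change: {kind}")
    return view
```

Each step builds a fresh `View`, so the caller's original object is never modified, even if a later change in the list is rejected.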

Function Definition

structure Function:
    identifier: Identifier
    className: string           // Implementation class
    language: string            // "java", "python", "scala"
    resources: list<string>     // JAR files or dependencies
    kind: FunctionKind          // SCALAR, TABLE, AGGREGATE
    properties: map<string, string>  // Target of FunctionChange.SET_PROPERTY

enum FunctionKind:
    SCALAR                      // Regular function
    TABLE                       // Table-valued function
    AGGREGATE                   // Aggregation function

structure FunctionDefinition:
    function: Function
    implementation: FunctionImpl

structure FunctionImpl:
    method: string              // Method name to invoke
    parameters: list<DataType>  // Parameter types
    returnType: DataType        // Return type

enum FunctionChange:
    SET_CLASS(newClassName)
    ADD_RESOURCE(resourcePath)
    REMOVE_RESOURCE(resourcePath)
    SET_PROPERTY(key, value)
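
The description calls for atomic updates to function definitions. A minimal all-or-nothing sketch is shown below: changes are applied to working copies, and a new entry is produced only if every change succeeds. `FunctionEntry` and the `(kind, value)` encoding are hypothetical simplifications, and only three of the change kinds are modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionEntry:
    class_name: str
    resources: Tuple[str, ...] = ()


def apply_function_changes(entry: FunctionEntry,
                           changes: List[Tuple[str, str]]) -> FunctionEntry:
    """Apply all changes or none: the input entry is never modified."""
    class_name = entry.class_name
    resources = list(entry.resources)  # working copy
    for kind, value in changes:
        if kind == "SET_CLASS":
            class_name = value
        elif kind == "ADD_RESOURCE":
            if value not in resources:
                resources.append(value)
        elif kind == "REMOVE_RESOURCE":
            if value not in resources:
                raise ValueError(f"Resource not registered: {value}")
            resources.remove(value)
        else:
            raise ValueError(f"Unknown function change: {kind}")
    # Reached only when every change was valid.
    return FunctionEntry(class_name, tuple(resources))
```

A catalog built this way would swap the stored entry for the returned one in a single assignment, so readers never observe a half-applied batch.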

Partition Representation

structure Partition:
    identifier: Identifier      // Table identifier
    partitionSpec: map<string, string>  // Partition key-value pairs
    location: string            // Storage path
    recordCount: long
    fileCount: int
    totalSize: long

    function toPartitionPath(partitionKeys) -> string:
        // Convert partition spec to path format, following the table's
        // partition key order (map iteration order is not guaranteed)
        // Example: {year=2024, month=01} -> "year=2024/month=01"
        return partitionKeys
            .map(key => key + "=" + partitionSpec[key])
            .join("/")

structure PartitionStatistics:
    partition: Partition
    columnStatistics: map<string, ColumnStats>
    lastAnalyzedTime: Instant
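
The path conversion in `toPartitionPath` depends on the order of the partition keys, which a plain map does not guarantee. The sketch below takes the key order as an explicit argument; this signature is an assumption for illustration, and values are not escaped.

```python
from typing import Dict, List


def to_partition_path(partition_spec: Dict[str, str],
                      partition_keys: List[str]) -> str:
    """Render a partition spec as 'k1=v1/k2=v2' in partition-key order."""
    missing = [key for key in partition_keys if key not in partition_spec]
    if missing:
        raise KeyError(f"Partition spec is missing keys: {missing}")
    # Follow the table's declared key order, not the dict's insertion order.
    return "/".join(f"{key}={partition_spec[key]}" for key in partition_keys)
```

For example, `to_partition_path({"month": "01", "year": "2024"}, ["year", "month"])` yields `year=2024/month=01` even though the spec lists `month` first.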

Catalog Operations Interface

interface Catalog:
    // Database operations
    function listDatabases() -> list<string>
    function databaseExists(dbName) -> boolean
    function createDatabase(dbName, properties)
    function dropDatabase(dbName, cascade)

    // Table operations
    function listTables(database) -> list<string>
    function tableExists(identifier) -> boolean
    function getTable(identifier) -> Table
    function createTable(identifier, schema, properties)
    function alterTable(identifier, changes)
    function dropTable(identifier)

    // View operations
    function listViews(database) -> list<string>
    function getView(identifier) -> View
    function createView(identifier, view)
    function alterView(identifier, changes)
    function dropView(identifier)

    // Function operations
    function listFunctions(database) -> list<string>
    function getFunction(identifier) -> Function
    function createFunction(identifier, function)
    function alterFunction(identifier, changes)    // Applies a list of FunctionChange
    function dropFunction(identifier)

    // Partition operations
    function listPartitions(tableIdentifier) -> list<Partition>
    function getPartition(tableIdentifier, partitionSpec) -> Partition
    function createPartition(partition)
    function dropPartition(tableIdentifier, partitionSpec)
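
To make the contract concrete, the following toy in-memory catalog covers only the database and table operations. It is a sketch under simplifying assumptions: identifiers are `(database, table)` tuples, schemas are plain dicts, and errors are built-in Python exceptions rather than a dedicated catalog exception.

```python
from typing import Dict, List, Tuple


class InMemoryCatalog:
    """Toy catalog: database name -> {table name -> schema dict}."""

    def __init__(self) -> None:
        self._databases: Dict[str, Dict[str, dict]] = {}

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"Database already exists: {name}")
        self._databases[name] = {}

    def list_databases(self) -> List[str]:
        return sorted(self._databases)

    def drop_database(self, name: str, cascade: bool = False) -> None:
        tables = self._tables_of(name)
        if tables and not cascade:
            raise ValueError(f"Database is not empty: {name}")
        del self._databases[name]

    def create_table(self, identifier: Tuple[str, str], schema: dict) -> None:
        database, table = identifier
        tables = self._tables_of(database)
        if table in tables:
            raise ValueError(f"Table already exists: {database}.{table}")
        tables[table] = schema

    def table_exists(self, identifier: Tuple[str, str]) -> bool:
        database, table = identifier
        return table in self._databases.get(database, {})

    def list_tables(self, database: str) -> List[str]:
        return sorted(self._tables_of(database))

    def _tables_of(self, database: str) -> Dict[str, dict]:
        if database not in self._databases:
            raise KeyError(f"Database not found: {database}")
        return self._databases[database]
```

A real backend would replace the nested dictionaries with calls to a metastore, but callers would see the same operations and the same failure cases.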

Factory and Discovery Pattern

interface CatalogFactory:
    function identifier() -> string
    function createCatalog(context, options) -> Catalog

structure FactoryUtil:
    function discoverFactory(identifier) -> CatalogFactory:
        // Use service provider interface (SPI) to discover factories
        factories = ServiceLoader.load(CatalogFactory.class)

        for each factory in factories:
            if factory.identifier() == identifier:
                return factory

        throw CatalogException("No factory found for: " + identifier)
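
Python has no direct counterpart to Java's `ServiceLoader`, so the sketch below swaps in an explicit registry populated by a decorator. The lookup-by-identifier logic is the same as in `discoverFactory`; the names `register_factory`, `discover_factory`, and the `"memory"` backend are hypothetical.

```python
from typing import Callable, Dict

# Registry standing in for SPI discovery: identifier -> factory callable.
_FACTORIES: Dict[str, Callable[[dict], object]] = {}


def register_factory(identifier: str):
    """Decorator that registers a catalog factory under an identifier."""
    def decorator(factory: Callable[[dict], object]):
        if identifier in _FACTORIES:
            raise ValueError(f"Duplicate factory identifier: {identifier}")
        _FACTORIES[identifier] = factory
        return factory
    return decorator


def discover_factory(identifier: str) -> Callable[[dict], object]:
    """Return the factory registered for the identifier, or fail loudly."""
    try:
        return _FACTORIES[identifier]
    except KeyError:
        known = ", ".join(sorted(_FACTORIES)) or "<none>"
        raise LookupError(
            f"No factory found for: {identifier} (known: {known})"
        ) from None


@register_factory("memory")
def create_memory_catalog(options: dict) -> dict:
    # Placeholder catalog object; a real factory would build a Catalog.
    return {"type": "memory", "options": options}
```

Listing the known identifiers in the error message makes a misspelled or missing backend much easier to diagnose than a bare "not found".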

structure CatalogEnvironment:
    configuration: Configuration
    classLoader: ClassLoader
    catalog: Catalog                          // Backing catalog used by resolve()
    temporaryObjects: map<Identifier, Object>

    function resolve(identifier) -> Object:
        // Check temporary objects first (session-scoped)
        if temporaryObjects.contains(identifier):
            return temporaryObjects.get(identifier)

        // Fall back to catalog lookup
        return catalog.getTable(identifier)
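
The resolution order in `resolve` means a session-scoped temporary object shadows a persistent object with the same identifier. A small sketch of that behavior follows, with the backing catalog reduced to a dictionary as a simplifying assumption.

```python
from typing import Dict, Tuple

Identifier = Tuple[str, str]  # (database, object name), simplified


class CatalogEnvironment:
    def __init__(self, catalog_tables: Dict[Identifier, object]) -> None:
        self._catalog_tables = dict(catalog_tables)      # persistent objects
        self._temporary: Dict[Identifier, object] = {}   # session-scoped

    def register_temporary(self, identifier: Identifier, obj: object) -> None:
        self._temporary[identifier] = obj

    def drop_temporary(self, identifier: Identifier) -> None:
        self._temporary.pop(identifier, None)

    def resolve(self, identifier: Identifier) -> object:
        # Temporary objects win over catalog objects with the same name.
        if identifier in self._temporary:
            return self._temporary[identifier]
        try:
            return self._catalog_tables[identifier]
        except KeyError:
            raise LookupError(f"Object not found: {identifier}") from None
```

Dropping the temporary object makes the persistent one visible again, which is why the two stores are kept separate instead of merged.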

Instant and Timestamp Handling

structure Instant:
    millisSinceEpoch: long

    function now() -> Instant:
        return Instant(currentTimeMillis())

    function toISO8601() -> string:
        return formatISO8601(millisSinceEpoch)
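
An epoch-millisecond value can be rendered as ISO 8601 with the Python standard library, as sketched below. Formatting in UTC with millisecond precision is an assumption; the source does not specify a time zone or precision.

```python
import time
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def now_millis() -> int:
    """Current time as whole milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def to_iso8601(millis_since_epoch: int) -> str:
    """Format epoch milliseconds as an ISO 8601 string in UTC."""
    # Integer timedelta arithmetic avoids float rounding on large values.
    moment = _EPOCH + timedelta(milliseconds=millis_since_epoch)
    return moment.isoformat(timespec="milliseconds")
```

With these definitions, `to_iso8601(1500)` returns `1970-01-01T00:00:01.500+00:00`.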

Exception Handling

class CatalogException extends Exception:
    identifier: Identifier
    operation: string
    message: string

    function createTableAlreadyExists(identifier):
        return CatalogException(
            identifier,
            "CREATE TABLE",
            "Table already exists: " + identifier
        )

    function createTableNotFound(identifier):
        return CatalogException(
            identifier,
            "GET TABLE",
            "Table not found: " + identifier
        )
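
The same factory-method style translates to Python class methods. This sketch keeps the identifier as a plain string for brevity, which is a simplification of the `Identifier` structure used elsewhere on this page.

```python
class CatalogException(Exception):
    """Catalog failure carrying the object and the operation involved."""

    def __init__(self, identifier: str, operation: str, message: str) -> None:
        super().__init__(message)
        self.identifier = identifier
        self.operation = operation

    @classmethod
    def table_already_exists(cls, identifier: str) -> "CatalogException":
        return cls(identifier, "CREATE TABLE",
                   f"Table already exists: {identifier}")

    @classmethod
    def table_not_found(cls, identifier: str) -> "CatalogException":
        return cls(identifier, "GET TABLE",
                   f"Table not found: {identifier}")
```

Callers can branch on `operation` or `identifier` without parsing the message text, for example to retry a lookup or to skip an existing table.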
