Principle:Apache Paimon Catalog Object Model
| Knowledge Sources | |
|---|---|
| Domains | Catalog, Object_Model |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The catalog object model defines the type system and naming conventions for database objects including tables, views, functions, and partitions, providing a unified abstraction across diverse metastore implementations.
Description
The catalog object model represents diverse database objects in a consistent, extensible way across metastore backends such as Hive Metastore, JDBC-based catalogs, and cloud-native catalog services. It establishes a hierarchical naming structure in which databases are containers and tables, views, and functions are the contained objects. Each object type has a well-defined set of properties, metadata fields, and lifecycle operations that must be supported regardless of the underlying storage mechanism.
Identifiers provide the foundational naming abstraction, combining database and object names into qualified references that uniquely identify catalog objects. This multi-level naming enables namespace isolation and prevents naming conflicts across different databases. Table types distinguish between different kinds of tables such as managed tables, external tables, and format-specific tables, each with different ownership and lifecycle semantics. Views encapsulate saved queries with their own schema definitions, allowing complex query logic to be reused and versioned independently.
Functions extend the catalog with user-defined or system functions that can be invoked in queries. The function definition captures the implementation class, parameters, and language runtime requirements, while the function change interface supports atomic updates to function definitions. Partition objects model the physical partitioning of tables, tracking partition-level statistics and metadata separately from the base table definition. The factory pattern enables discovery and instantiation of catalog implementations through service provider interfaces, allowing the system to support multiple catalog backends through pluggable implementations.
Usage
Apply the catalog object model when implementing metastore integrations, building query planning systems that need to resolve object references, or creating catalog migration tools. It is most useful when a single API must serve several catalog backends.
Theoretical Basis
The catalog object model provides a structured type system for database objects:
Identifier Hierarchy
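structure Identifier:
    database: string
    objectName: string
    function toFullyQualifiedName() -> string:
        return database + "." + objectName
    function fromString(qualifiedName, defaultDatabase) -> Identifier:
        parts = qualifiedName.split(".")
        if parts.length == 1:
            // Unqualified name: fall back to the default database
            return Identifier(defaultDatabase, parts[0])
        else if parts.length == 2:
            return Identifier(parts[0], parts[1])
        else:
            // More than two parts cannot be mapped onto database + object
            throw IllegalArgument("Invalid identifier: " + qualifiedName)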
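The parsing rule above can be sketched in runnable Python. This is a minimal illustration, not Paimon's `Identifier` class: the `DEFAULT_DATABASE` value, the `from_string` signature, and the choice to reject empty or over-qualified names are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: the fallback database is called "default".
DEFAULT_DATABASE = "default"


@dataclass(frozen=True)
class Identifier:
    """Qualified reference to a catalog object (illustrative only)."""

    database: str
    object_name: str

    def full_name(self) -> str:
        return f"{self.database}.{self.object_name}"

    @staticmethod
    def from_string(qualified_name: str,
                    default_database: str = DEFAULT_DATABASE) -> "Identifier":
        parts = qualified_name.split(".")
        # Reject empty segments such as "", "db." or ".table".
        if any(not part for part in parts):
            raise ValueError(f"Invalid identifier: {qualified_name!r}")
        if len(parts) == 1:
            return Identifier(default_database, parts[0])
        if len(parts) == 2:
            return Identifier(parts[0], parts[1])
        raise ValueError(f"Invalid identifier: {qualified_name!r}")
```

Making the dataclass frozen gives value equality and hashing, so identifiers can be used as dictionary keys in a catalog cache.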
Table Type Classification
enum TableType:
    MANAGED       // Catalog owns data lifecycle
    EXTERNAL      // Data managed externally
    VIEW          // Virtual table defined by query
    MATERIALIZED  // Physical snapshot of view
    FORMAT_TABLE  // External format (Parquet, ORC, etc.)
structure CatalogTableMetadata:
    identifier: Identifier
    tableType: TableType
    schema: Schema
    partitionKeys: list<string>
    properties: map<string, string>
    comment: string
    createTime: Instant
    lastModifiedTime: Instant
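One practical consequence of the table type is whether dropping a table should also delete its data. The sketch below models that decision in Python. The `CATALOG_OWNED` set and the `drop_deletes_data` helper are hypothetical, and treating `MATERIALIZED` as catalog-owned is an assumption of this example rather than something the classification above states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TableType(Enum):
    MANAGED = "managed"            # catalog owns the data lifecycle
    EXTERNAL = "external"          # data managed outside the catalog
    VIEW = "view"                  # virtual table defined by a query
    MATERIALIZED = "materialized"  # physical snapshot of a view
    FORMAT_TABLE = "format_table"  # external file format (Parquet, ORC, ...)


# Assumption: only these types have data that the catalog may delete.
CATALOG_OWNED = frozenset({TableType.MANAGED, TableType.MATERIALIZED})


@dataclass
class CatalogTableMetadata:
    database: str
    name: str
    table_type: TableType
    partition_keys: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)
    comment: str = ""

    def drop_deletes_data(self) -> bool:
        """True if dropping the table should also remove its data files."""
        return self.table_type in CATALOG_OWNED
```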
View Definition
structure View:
    identifier: Identifier
    query: string                    // SQL query text
    schema: ViewSchema               // Result schema
    dialect: string                  // SQL dialect (e.g., "flink", "spark")
    comment: string
    properties: map<string, string>
structure ViewSchema:
    fields: list<Field>
    function validateAgainstQuery(query) -> boolean:
        // Verify schema matches query result
enum ViewChange:
    SET_QUERY(newQuery)
    SET_COMMENT(newComment)
    SET_PROPERTY(key, value)
    REMOVE_PROPERTY(key)
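The `ViewChange` variants can be read as small, declarative edits that are folded over a view definition. The following Python sketch shows one way to apply them. The class names and the `apply_changes` helper are hypothetical, and the view is reduced to the fields the changes touch.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class View:
    query: str
    dialect: str = "flink"
    comment: str = ""
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class SetQuery:
    new_query: str


@dataclass(frozen=True)
class SetComment:
    new_comment: str


@dataclass(frozen=True)
class SetProperty:
    key: str
    value: str


@dataclass(frozen=True)
class RemoveProperty:
    key: str


ViewChange = Union[SetQuery, SetComment, SetProperty, RemoveProperty]


def apply_changes(view: View, changes: Iterable[ViewChange]) -> View:
    """Return a new View with every change applied in order."""
    for change in changes:
        if isinstance(change, SetQuery):
            view = replace(view, query=change.new_query)
        elif isinstance(change, SetComment):
            view = replace(view, comment=change.new_comment)
        elif isinstance(change, SetProperty):
            view = replace(view, properties={**view.properties,
                                             change.key: change.value})
        elif isinstance(change, RemoveProperty):
            kept = {k: v for k, v in view.properties.items() if k != change.key}
            view = replace(view, properties=kept)
        else:
            raise TypeError(f"Unsupported change: {change!r}")
    return view
```

Because every step builds a new `View`, the original definition is never mutated, which keeps a failed or abandoned alteration from leaking into the stored object.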
Function Definition
structure Function:
    identifier: Identifier
    className: string          // Implementation class
    language: string           // "java", "python", "scala"
    resources: list<string>    // JAR files or dependencies
    kind: FunctionKind         // SCALAR, TABLE, AGGREGATE
enum FunctionKind:
    SCALAR     // Regular function
    TABLE      // Table-valued function
    AGGREGATE  // Aggregation function
structure FunctionDefinition:
    function: Function
    implementation: FunctionImpl
structure FunctionImpl:
    method: string              // Method name to invoke
    parameters: list<DataType>  // Parameter types
    returnType: DataType        // Return type
enum FunctionChange:
    SET_CLASS(newClassName)
    ADD_RESOURCE(resourcePath)
    REMOVE_RESOURCE(resourcePath)
    SET_PROPERTY(key, value)
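To illustrate what an atomic update of a function definition might look like, the sketch below applies a batch of `FunctionChange` entries to a private copy and returns it only if every change succeeds. The tuple encoding of changes, the `apply_atomically` name, and the rule that removing an unknown resource is an error are assumptions of this example.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Function:
    class_name: str
    language: str = "java"
    resources: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


def apply_atomically(function: Function,
                     changes: Iterable[Tuple[str, ...]]) -> Function:
    """Apply all changes to a copy; the input is untouched if any change fails."""
    draft = copy.deepcopy(function)
    for kind, *args in changes:
        if kind == "SET_CLASS":
            draft.class_name = args[0]
        elif kind == "ADD_RESOURCE":
            if args[0] not in draft.resources:  # adding twice is a no-op
                draft.resources.append(args[0])
        elif kind == "REMOVE_RESOURCE":
            draft.resources.remove(args[0])     # ValueError if absent
        elif kind == "SET_PROPERTY":
            key, value = args
            draft.properties[key] = value
        else:
            raise ValueError(f"Unknown change kind: {kind}")
    return draft
```

A caller would swap the stored definition for the returned draft in a single step, so readers see either the old definition or the fully updated one.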
Partition Representation
structure Partition:
    identifier: Identifier              // Table identifier
    partitionSpec: map<string, string>  // Partition key-value pairs, in partition-key order
    location: string                    // Storage path
    recordCount: long
    fileCount: int
    totalSize: long
    function toPartitionPath() -> string:
        // Convert partition spec to path format
        // Example: {year=2024, month=01} -> "year=2024/month=01"
        // Assumes entries() iterates in the table's partition-key order
        return partitionSpec.entries()
            .map(e => e.key + "=" + e.value)
            .join("/")
structure PartitionStatistics:
    partition: Partition
    columnStatistics: map<string, ColumnStats>
    lastAnalyzedTime: Instant
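The path conversion depends on the order of the partition keys, which a plain map does not guarantee in every language. A small Python sketch makes the ordering explicit by taking the table's partition keys as a separate argument. Both function names are hypothetical, and values containing `/` or `=` are not escaped here, which a real implementation would need to handle.

```python
from typing import Dict, List


def to_partition_path(spec: Dict[str, str], partition_keys: List[str]) -> str:
    """Render a partition spec as key=value segments in partition-key order."""
    missing = [key for key in partition_keys if key not in spec]
    if missing:
        raise KeyError(f"Partition spec is missing keys: {missing}")
    return "/".join(f"{key}={spec[key]}" for key in partition_keys)


def from_partition_path(path: str) -> Dict[str, str]:
    """Parse 'k1=v1/k2=v2' back into a partition spec."""
    spec: Dict[str, str] = {}
    for segment in path.split("/"):
        key, separator, value = segment.partition("=")
        if not separator or not key:
            raise ValueError(f"Malformed partition segment: {segment!r}")
        spec[key] = value
    return spec
```

For example, `to_partition_path({"month": "01", "year": "2024"}, ["year", "month"])` yields `year=2024/month=01` regardless of the order in which the spec was built.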
Catalog Operations Interface
interface Catalog:
    // Database operations
    function listDatabases() -> list<string>
    function databaseExists(dbName) -> boolean
    function createDatabase(dbName, properties)
    function dropDatabase(dbName, cascade)
    // Table operations
    function listTables(database) -> list<string>
    function tableExists(identifier) -> boolean
    function getTable(identifier) -> Table
    function createTable(identifier, schema, properties)
    function alterTable(identifier, changes)
    function dropTable(identifier)
    // View operations
    function listViews(database) -> list<string>
    function getView(identifier) -> View
    function createView(identifier, view)
    function alterView(identifier, changes)
    function dropView(identifier)
    // Function operations
    function listFunctions(database) -> list<string>
    function getFunction(identifier) -> Function
    function createFunction(identifier, function)
    function dropFunction(identifier)
    // Partition operations
    function listPartitions(tableIdentifier) -> list<Partition>
    function getPartition(tableIdentifier, partitionSpec) -> Partition
    function createPartition(partition)
    function dropPartition(tableIdentifier, partitionSpec)
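As a rough illustration of the database and table portion of this interface, here is a toy in-memory catalog in Python. It is a sketch under simplifying assumptions: the class name `InMemoryCatalog` is hypothetical, tables are addressed by a database name plus a table name instead of an `Identifier`, a schema is just a mapping of column name to type name, and views, functions, and partitions are omitted.

```python
from typing import Dict, List


class InMemoryCatalog:
    """Toy catalog backed by nested dicts (not Paimon's API)."""

    def __init__(self) -> None:
        # database name -> table name -> schema (column name -> type name)
        self._databases: Dict[str, Dict[str, Dict[str, str]]] = {}

    # Database operations
    def list_databases(self) -> List[str]:
        return sorted(self._databases)

    def database_exists(self, name: str) -> bool:
        return name in self._databases

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"Database already exists: {name}")
        self._databases[name] = {}

    def drop_database(self, name: str, cascade: bool = False) -> None:
        tables = self._tables_of(name)
        if tables and not cascade:
            raise ValueError(f"Database not empty: {name}")
        del self._databases[name]

    # Table operations
    def list_tables(self, database: str) -> List[str]:
        return sorted(self._tables_of(database))

    def table_exists(self, database: str, table: str) -> bool:
        return table in self._databases.get(database, {})

    def create_table(self, database: str, table: str,
                     schema: Dict[str, str]) -> None:
        tables = self._tables_of(database)
        if table in tables:
            raise ValueError(f"Table already exists: {database}.{table}")
        tables[table] = dict(schema)

    def get_table(self, database: str, table: str) -> Dict[str, str]:
        tables = self._tables_of(database)
        if table not in tables:
            raise KeyError(f"Table not found: {database}.{table}")
        return tables[table]

    def drop_table(self, database: str, table: str) -> None:
        tables = self._tables_of(database)
        if table not in tables:
            raise KeyError(f"Table not found: {database}.{table}")
        del tables[table]

    def _tables_of(self, database: str) -> Dict[str, Dict[str, str]]:
        if database not in self._databases:
            raise KeyError(f"Database not found: {database}")
        return self._databases[database]
```

The `cascade` flag mirrors the `dropDatabase(dbName, cascade)` signature: without it, dropping a non-empty database is refused.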
Factory and Discovery Pattern
interface CatalogFactory:
    function identifier() -> string
    function createCatalog(context, options) -> Catalog
structure FactoryUtil:
    function discoverFactory(identifier) -> CatalogFactory:
        // Use service provider interface (SPI) to discover factories
        factories = ServiceLoader.load(CatalogFactory.class)
        for each factory in factories:
            if factory.identifier() == identifier:
                return factory
        throw CatalogException("No factory found for: " + identifier)
structure CatalogEnvironment:
    configuration: Configuration
    classLoader: ClassLoader
    catalog: Catalog  // Backing catalog used as the fallback in resolve()
    temporaryObjects: map<Identifier, Object>
    function resolve(identifier) -> Object:
        // Check temporary objects first (session-scoped)
        if temporaryObjects.contains(identifier):
            return temporaryObjects.get(identifier)
        // Fall back to catalog lookup
        return catalog.getTable(identifier)
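The discovery step can be approximated in Python with an explicit registry. Note the substitution: Java's `ServiceLoader` finds implementations declared on the classpath, whereas this sketch uses a module-level dictionary filled by a decorator. The names `register_factory`, `discover_factory`, `MemoryCatalog`, and `FileSystemCatalog` are hypothetical.

```python
from typing import Callable, Dict


class Catalog:
    """Minimal base class holding the options it was created with."""

    def __init__(self, options: Dict[str, str]) -> None:
        self.options = dict(options)


# Registry standing in for SPI discovery: identifier -> factory callable.
_FACTORIES: Dict[str, Callable[[Dict[str, str]], Catalog]] = {}


def register_factory(identifier: str):
    """Class decorator that registers a catalog class under an identifier."""
    def decorator(factory):
        if identifier in _FACTORIES:
            raise ValueError(f"Duplicate factory identifier: {identifier}")
        _FACTORIES[identifier] = factory
        return factory
    return decorator


def discover_factory(identifier: str) -> Callable[[Dict[str, str]], Catalog]:
    """Look up the factory for an identifier, listing known ones on failure."""
    try:
        return _FACTORIES[identifier]
    except KeyError:
        known = ", ".join(sorted(_FACTORIES)) or "<none>"
        raise LookupError(
            f"No factory found for: {identifier!r} (available: {known})"
        ) from None


@register_factory("memory")
class MemoryCatalog(Catalog):
    pass


@register_factory("filesystem")
class FileSystemCatalog(Catalog):
    pass
```

A caller selects a backend purely by its string identifier, for instance `discover_factory("filesystem")({"warehouse": "/tmp/warehouse"})`, so new backends can be added without touching the lookup code.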
Instant and Timestamp Handling
structure Instant:
    millisSinceEpoch: long
    function now() -> Instant:
        return Instant(currentTimeMillis())
    function toISO8601() -> string:
        return formatISO8601(millisSinceEpoch)
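A minimal Python version of this value type, assuming timestamps are always rendered in UTC with millisecond precision and a trailing `Z`. The rendering choice is an assumption of the sketch, since the pseudocode leaves `formatISO8601` unspecified.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True, order=True)
class Instant:
    millis_since_epoch: int

    @staticmethod
    def now() -> "Instant":
        # time.time_ns() avoids float rounding on the millisecond boundary.
        return Instant(time.time_ns() // 1_000_000)

    def to_iso8601(self) -> str:
        moment = _EPOCH + timedelta(milliseconds=self.millis_since_epoch)
        # isoformat() emits "+00:00" for UTC; swap it for the "Z" suffix.
        return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

With `order=True`, instants compare by their millisecond value, which is convenient for checks such as `create_time <= last_modified_time`.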
Exception Handling
class CatalogException extends Exception:
    identifier: Identifier
    operation: string
    message: string
    function createTableAlreadyExists(identifier):
        return CatalogException(
            identifier,
            "CREATE TABLE",
            "Table already exists: " + identifier
        )
    function createTableNotFound(identifier):
        return CatalogException(
            identifier,
            "GET TABLE",
            "Table not found: " + identifier
        )
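In Python the same idea is often expressed with one subclass per failure, so callers can catch either a specific error or the common base. The sketch below follows that style. The subclass names and the `get_table` helper are hypothetical, and identifiers are plain strings to keep the example short.

```python
from typing import Dict


class CatalogException(Exception):
    """Base error carrying the affected object and the failed operation."""

    def __init__(self, identifier: str, operation: str, message: str) -> None:
        super().__init__(message)
        self.identifier = identifier
        self.operation = operation


class TableAlreadyExistsError(CatalogException):
    def __init__(self, identifier: str) -> None:
        super().__init__(identifier, "CREATE TABLE",
                         f"Table already exists: {identifier}")


class TableNotFoundError(CatalogException):
    def __init__(self, identifier: str) -> None:
        super().__init__(identifier, "GET TABLE",
                         f"Table not found: {identifier}")


def get_table(tables: Dict[str, dict], identifier: str) -> dict:
    """Look up a table, raising a structured error when it is absent."""
    if identifier not in tables:
        raise TableNotFoundError(identifier)
    return tables[identifier]
```

Keeping `identifier` and `operation` as attributes lets a caller log or retry based on structured data instead of parsing the message text.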
Related Pages
- Implementation:Apache_Paimon_Identifier
- Implementation:Apache_Paimon_Identifier_Python
- Implementation:Apache_Paimon_TableType
- Implementation:Apache_Paimon_CatalogTableType
- Implementation:Apache_Paimon_Instant
- Implementation:Apache_Paimon_LookupStrategy
- Implementation:Apache_Paimon_View
- Implementation:Apache_Paimon_ViewSchema
- Implementation:Apache_Paimon_ViewChange
- Implementation:Apache_Paimon_Function
- Implementation:Apache_Paimon_FunctionDefinition
- Implementation:Apache_Paimon_FunctionImpl
- Implementation:Apache_Paimon_FunctionChange
- Implementation:Apache_Paimon_Factory
- Implementation:Apache_Paimon_FactoryUtil
- Implementation:Apache_Paimon_Partition
- Implementation:Apache_Paimon_PartitionStatistics
- Implementation:Apache_Paimon_CatalogEnvironment
- Implementation:Apache_Paimon_CatalogException