Principle:Apache Paimon Schema Evolution

From Leeroopedia
Revision as of 18:07, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Apache_Paimon_Schema_Evolution.md)

Knowledge Sources
Domains: Schema, Evolution
Last Updated: 2026-02-08 00:00 GMT

Overview

Schema evolution is the practice of changing a table's schema over time while keeping existing data readable. It relies on versioned schema tracking and explicit change operations to maintain backward compatibility.

Description

Schema evolution addresses the fundamental challenge of modifying data structure definitions in long-lived systems where historical data exists under older schemas. As business requirements change, tables need to add new columns, rename existing fields, change data types, or restructure nested objects. The schema evolution principle provides a framework for expressing these changes explicitly while ensuring that existing data remains readable and queries continue to function correctly.

Each schema version is immutably stored with a unique version identifier, creating a complete history of schema changes over the table's lifetime. When a schema change is requested, the system validates that the change is compatible with existing data and doesn't violate constraints like primary keys or partitioning columns. Compatible changes include adding nullable columns, widening numeric types, or renaming fields with proper metadata updates. Incompatible changes like dropping non-nullable columns or narrowing types require explicit data migration or are rejected.

The schema manager coordinates between the abstract schema definition and concrete implementations in storage and query engines. When reading data written under an older schema, the system applies schema projection to map old field positions to new ones, fills default values for added columns, and handles renamed fields transparently. Statistics collected under old schemas are evolved forward to match new schemas, enabling query optimizers to make informed decisions even when data spans multiple schema versions. This approach allows schemas to evolve gradually while maintaining continuous read and write access to the table.

Usage

Apply this principle when building systems that manage long-lived datasets where schema requirements change over time but backward compatibility with existing data is essential. Use explicit schema versioning when you need to audit when and how schemas changed, or when different parts of the system may temporarily operate with different schema versions during rolling upgrades.

Theoretical Basis

Schema evolution implements a versioned schema store with explicit change operations:

Schema Versioning:

  • Each schema has unique identifier: schema_id (monotonically increasing)
  • Schema contains: list of fields, primary keys, partition keys, options, timestamp
  • Historical schemas stored immutably: schema/schema-0, schema/schema-1, ...
  • Current schema pointer identifies latest version
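
The versioning rules above can be illustrated with a minimal in-memory sketch. The names `Field`, `Schema`, and `SchemaStore` are hypothetical and are not Paimon's API. Options and timestamps are omitted, and a dictionary stands in for the schema/schema-N files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Field:
    id: int            # stable identifier that survives renames
    name: str
    type: str
    nullable: bool = True


@dataclass(frozen=True)   # frozen: a stored version cannot be mutated
class Schema:
    schema_id: int
    fields: Tuple[Field, ...]
    primary_keys: Tuple[str, ...] = ()
    partition_keys: Tuple[str, ...] = ()


class SchemaStore:
    """Hypothetical stand-in for the schema/schema-N files."""

    def __init__(self) -> None:
        self._versions: Dict[str, Schema] = {}
        self._latest_id = -1          # pointer to the current schema

    def commit(self, fields: Iterable[Field],
               primary_keys: Iterable[str] = (),
               partition_keys: Iterable[str] = ()) -> Schema:
        # The next ID is always latest + 1, so IDs increase monotonically
        # and an existing version is never overwritten.
        schema = Schema(self._latest_id + 1, tuple(fields),
                        tuple(primary_keys), tuple(partition_keys))
        self._versions[f"schema/schema-{schema.schema_id}"] = schema
        self._latest_id = schema.schema_id
        return schema

    def get(self, schema_id: int) -> Schema:
        return self._versions[f"schema/schema-{schema_id}"]

    def latest(self) -> Schema:
        return self.get(self._latest_id)


store = SchemaStore()
v0 = store.commit([Field(0, "id", "INT", nullable=False)], primary_keys=["id"])
v1 = store.commit(v0.fields + (Field(1, "name", "STRING"),), primary_keys=["id"])
```

After the second commit, `store.latest()` returns version 1 while `store.get(0)` still returns the original single-column schema, which is what makes files written under version 0 interpretable later.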

Schema Change Operations:

  • ADD_COLUMN(name, type, description, position): Insert new field
  • DROP_COLUMN(name): Remove existing field (validate no active readers)
  • RENAME_COLUMN(oldName, newName): Update field name, preserve field ID
  • UPDATE_COLUMN_TYPE(name, newType): Widen or narrow type (validate compatibility)
  • UPDATE_COLUMN_NULLABLE(name, nullable): Change nullability constraint
  • UPDATE_COLUMN_COMMENT(name, comment): Modify field documentation
  • UPDATE_COLUMN_POSITION(name, newPosition): Reorder fields
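
A few of the operations above can be sketched as pure functions that return a new schema and leave the old one untouched. This is an illustration under stated assumptions, not Paimon's implementation: `TableSchema`, `highest_field_id`, and the three functions are hypothetical names, and the only constraint checked on drop is primary-key membership.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Field:
    id: int
    name: str
    type: str
    nullable: bool = True


@dataclass(frozen=True)
class TableSchema:
    fields: Tuple[Field, ...]
    highest_field_id: int            # assumption: tracked so IDs are never reused
    primary_keys: Tuple[str, ...] = ()


def _require(schema: TableSchema, name: str) -> None:
    if all(f.name != name for f in schema.fields):
        raise ValueError(f"unknown column {name!r}")


def add_column(schema: TableSchema, name: str, type_: str) -> TableSchema:
    if any(f.name == name for f in schema.fields):
        raise ValueError(f"column {name!r} already exists")
    new_id = schema.highest_field_id + 1
    # Added columns are nullable so rows written earlier remain valid.
    added = Field(new_id, name, type_, nullable=True)
    return replace(schema, fields=schema.fields + (added,),
                   highest_field_id=new_id)


def rename_column(schema: TableSchema, old: str, new: str) -> TableSchema:
    _require(schema, old)
    if any(f.name == new for f in schema.fields):
        raise ValueError(f"column {new!r} already exists")
    # Only the name changes; the field ID is preserved.
    fields = tuple(replace(f, name=new) if f.name == old else f
                   for f in schema.fields)
    keys = tuple(new if k == old else k for k in schema.primary_keys)
    return replace(schema, fields=fields, primary_keys=keys)


def drop_column(schema: TableSchema, name: str) -> TableSchema:
    _require(schema, name)
    if name in schema.primary_keys:
        raise ValueError(f"cannot drop primary key column {name!r}")
    return replace(schema,
                   fields=tuple(f for f in schema.fields if f.name != name))


s0 = TableSchema((Field(0, "id", "INT", False), Field(1, "name", "STRING")),
                 highest_field_id=1, primary_keys=("id",))
s1 = rename_column(s0, "name", "full_name")    # field ID 1 is kept
s2 = drop_column(s1, "full_name")
s3 = add_column(s2, "full_name", "STRING")     # gets ID 2, not the dropped ID 1
```

Tracking the highest ID ever issued, rather than deriving it from the current field list, is the design choice that keeps a re-added column from being confused with data written for the dropped one.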

Compatibility Validation:

  • Backward compatibility: New schema can read data written with old schema
  • Forward compatibility: Old schema can read data written with new schema
  • Full compatibility: Both backward and forward compatible
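
Each compatibility notion above is a question about whether a reader holding one schema can decode data written under another; full compatibility asks it in both directions. The sketch below makes that concrete. The function names and the widening table are illustrative assumptions and do not reproduce Paimon's actual cast rules.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    id: int
    name: str
    type: str
    nullable: bool = True


# Assumption: a small illustrative widening order.
_INT_ORDER = ["TINYINT", "SMALLINT", "INT", "BIGINT"]


def can_widen(src: str, dst: str) -> bool:
    """True if every value of type src fits in type dst."""
    if src == dst:
        return True
    if src in _INT_ORDER and dst in _INT_ORDER:
        return _INT_ORDER.index(src) < _INT_ORDER.index(dst)
    return (src, dst) == ("FLOAT", "DOUBLE")


def can_read(reader: Tuple[Field, ...], writer: Tuple[Field, ...]) -> bool:
    """Can a reader using `reader` decode rows written under `writer`?"""
    written = {f.id: f for f in writer}
    for f in reader:
        src = written.get(f.id)          # match on field ID, not name
        if src is None:
            # Column absent from the data: it can only be filled with null.
            if not f.nullable:
                return False
        elif not can_widen(src.type, f.type) or (src.nullable and not f.nullable):
            return False
    return True                          # extra writer columns are ignored


def fully_compatible(a: Tuple[Field, ...], b: Tuple[Field, ...]) -> bool:
    return can_read(a, b) and can_read(b, a)


old = (Field(0, "id", "INT", False), Field(1, "amount", "INT"))
added = old + (Field(2, "note", "STRING"),)
widened = (Field(0, "id", "INT", False), Field(1, "amount", "BIGINT"))
```

Here `added` and `old` can read each other's data, so adding a nullable column is compatible in both directions. `widened` can read data written under `old`, but not the reverse, because BIGINT values may not fit in INT.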

Schema Projection Algorithm:

```
function projectRecord(record, oldSchema, newSchema):
  result = emptyRecord()
  for each field in newSchema:
    // Match on field ID rather than name so that renamed columns still resolve
    if oldSchema contains field.id:
      result[field.name] = record[oldSchema.fieldPosition(field.id)]
    else:
      result[field.name] = field.defaultValue()
  return result
```
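
A runnable Python rendering of the projection may help. It is a minimal sketch that matches fields by ID so that a renamed column still resolves to its stored value. The `Field` class, its `default` attribute, and the representation of a record as a positional tuple are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Field:
    id: int
    name: str
    default: Any = None      # value used when the column is absent from old data


def project_record(record: Tuple[Any, ...],
                   old_schema: Sequence[Field],
                   new_schema: Sequence[Field]) -> Dict[str, Any]:
    # Where each field ID sits in a record written under the old schema.
    position_by_id = {f.id: i for i, f in enumerate(old_schema)}
    result: Dict[str, Any] = {}
    for f in new_schema:
        pos = position_by_id.get(f.id)
        # Present in old data: copy the value. Added later: use the default.
        result[f.name] = record[pos] if pos is not None else f.default
    return result


old_schema = [Field(0, "id"), Field(1, "name"), Field(2, "tmp")]
new_schema = [Field(1, "full_name"),                  # renamed and moved first
              Field(0, "id"),
              Field(3, "country", default="unknown")]  # added; "tmp" was dropped

row = project_record((7, "Ada", "x"), old_schema, new_schema)
# row == {"full_name": "Ada", "id": 7, "country": "unknown"}
```

The single example covers a rename, a reorder, a drop, and an add: the dropped column's value is simply never copied.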

Statistics Evolution:

When statistics (min/max/null_count) exist for the old schema:

  • Propagate statistics for unchanged columns
  • Generate default statistics for new columns (null_count = row_count if nullable)
  • Invalidate statistics for dropped or transformed columns
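
The three rules above can be sketched as a single pass over the new schema. The names `ColumnStats` and `evolve_stats` are hypothetical, statistics are keyed by field ID, and `None` stands for "unknown". The sketch conservatively treats any type change as a transformation and assumes added columns are nullable with no default value.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence


@dataclass(frozen=True)
class Field:
    id: int
    name: str
    type: str


@dataclass(frozen=True)
class ColumnStats:
    min: Any
    max: Any
    null_count: Optional[int]     # None means "unknown"


UNKNOWN = ColumnStats(None, None, None)


def evolve_stats(old_stats: Dict[int, ColumnStats],
                 old_schema: Sequence[Field],
                 new_schema: Sequence[Field],
                 row_count: int) -> Dict[int, ColumnStats]:
    old_by_id = {f.id: f for f in old_schema}
    evolved: Dict[int, ColumnStats] = {}
    for f in new_schema:
        old = old_by_id.get(f.id)
        if old is None:
            # Added column: every existing row reads as null.
            evolved[f.id] = ColumnStats(None, None, row_count)
        elif old.type == f.type and f.id in old_stats:
            # Unchanged column (renames included): carry statistics forward.
            evolved[f.id] = old_stats[f.id]
        else:
            # Type changed or nothing was recorded: mark as unknown.
            evolved[f.id] = UNKNOWN
    # Dropped columns are absent from new_schema, so they never appear here.
    return evolved


old_schema = [Field(0, "id", "INT"), Field(1, "amount", "INT"),
              Field(2, "tmp", "STRING")]
new_schema = [Field(0, "id", "INT"), Field(1, "amount", "BIGINT"),
              Field(3, "note", "STRING")]
old_stats = {0: ColumnStats(1, 100, 0), 1: ColumnStats(5, 50, 2),
             2: ColumnStats("a", "z", 0)}

evolved = evolve_stats(old_stats, old_schema, new_schema, row_count=100)
```

In this example the statistics for `id` are carried over unchanged, `amount` becomes unknown because its type changed, `note` gets a null count equal to the row count, and `tmp` disappears.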

This approach ensures that schema changes are tracked explicitly and applied consistently across all components that interact with the table data.

Related Pages

  • Implementation:Apache_Paimon_SchemaChange
  • Implementation:Apache_Paimon_SchemaChange_Python
  • Implementation:Apache_Paimon_Schema
  • Implementation:Apache_Paimon_TableSchema
  • Implementation:Apache_Paimon_TableSchema_Python
  • Implementation:Apache_Paimon_SchemaSerializer
  • Implementation:Apache_Paimon_SchemaManager
  • Implementation:Apache_Paimon_SimpleStatsEvolution
