Principle:Apache Paimon Type System
| Knowledge Sources | |
|---|---|
| Domains | Type_System, Schema |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
An extensible data type hierarchy that provides schema definition, type-safe operations, and serialization support for structured and semi-structured data.
Description
A robust type system forms the foundation of any data processing framework, defining how data is represented, validated, and transformed throughout the system. The type system establishes a hierarchy of data types ranging from primitives (integers, strings, timestamps) to complex nested structures (arrays, maps, rows) with explicit nullability semantics. Each type carries metadata about its structure and constraints, enabling compile-time and runtime validation of operations.
The type system supports schema evolution by allowing types to be compared for compatibility and providing rules for safe type conversions. Type families group related types together (numeric types, temporal types, collection types) enabling generic algorithms that operate on entire families rather than individual types. The visitor pattern enables extensible operations over the type hierarchy without modifying type definitions themselves, allowing new transformations or analyses to be added independently.
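As a rough illustration of how type families allow one rule to cover many types, the sketch below maps type root names to families and defines a single check for the whole numeric family. The names `TypeFamily`, `FAMILIES`, `in_family`, and `can_sum` are hypothetical and chosen for this example; they are not taken from the Paimon API.

```python
from enum import Enum, auto


class TypeFamily(Enum):
    """Hypothetical families that group related type roots."""
    NUMERIC = auto()
    INTEGER_NUMERIC = auto()
    CHARACTER_STRING = auto()
    DATETIME = auto()
    COLLECTION = auto()


# Assumed mapping from a type's root name to the families it belongs to.
FAMILIES = {
    "INT": {TypeFamily.NUMERIC, TypeFamily.INTEGER_NUMERIC},
    "BIGINT": {TypeFamily.NUMERIC, TypeFamily.INTEGER_NUMERIC},
    "DOUBLE": {TypeFamily.NUMERIC},
    "DECIMAL": {TypeFamily.NUMERIC},
    "VARCHAR": {TypeFamily.CHARACTER_STRING},
    "DATE": {TypeFamily.DATETIME},
    "TIMESTAMP": {TypeFamily.DATETIME},
    "ARRAY": {TypeFamily.COLLECTION},
    "MAP": {TypeFamily.COLLECTION},
}


def in_family(root: str, family: TypeFamily) -> bool:
    """Return True if the given type root is a member of the family."""
    return family in FAMILIES.get(root, set())


def can_sum(root: str) -> bool:
    """A generic rule written once for the entire numeric family."""
    return in_family(root, TypeFamily.NUMERIC)
```

Adding a new numeric type in this sketch only requires a new entry in `FAMILIES`; `can_sum` and any other family-level rule pick it up without changes.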
Row types represent structured records with named fields, where each field has its own type and nullability. This enables nested schemas of arbitrary depth while maintaining type safety at each level. The type system distinguishes between logical types (user-facing semantic types) and physical types (storage representation), allowing optimization of storage format while preserving logical semantics. Special types like decimals carry precision and scale parameters, while vector types support embeddings for machine learning applications.
Usage
Apply this principle when building systems that process diverse data formats and require strong type guarantees across serialization boundaries. Use an extensible type hierarchy when new types may be added over time, when schema evolution must be tracked explicitly, or when operations need to be generic across multiple related types. The visitor pattern is appropriate when type-specific logic must be implemented without modifying type definitions.
Theoretical Basis
The type system follows a hierarchical design with abstract type interfaces and concrete type implementations:
Type Hierarchy:
- Root type interface defines common operations: isNullable(), getTypeRoot(), accept(visitor)
- Atomic types: INTEGER, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING, BYTES, DATE, TIMESTAMP
- Parameterized types: DECIMAL(precision, scale), CHAR(length), VARCHAR(length)
- Collection types: ARRAY(elementType), MAP(keyType, valueType)
- Structured types: ROW(fields[name, type, description])
- Special types: VECTOR(dimension, elementType) for embeddings
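The hierarchy above can be sketched with immutable Python dataclasses. This is a minimal model for illustration, not the Paimon implementation: the class names, the field layout, and the placement of the `nullable` flag are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class DataType:
    """Marker base class for the hierarchy (hypothetical, for illustration)."""


@dataclass(frozen=True)
class IntType(DataType):
    nullable: bool = True


@dataclass(frozen=True)
class StringType(DataType):
    nullable: bool = True


@dataclass(frozen=True)
class DecimalType(DataType):
    precision: int
    scale: int
    nullable: bool = True

    def __post_init__(self):
        # Parameterized types validate their own constraints on construction.
        if not 0 <= self.scale <= self.precision:
            raise ValueError("scale must be between 0 and precision")


@dataclass(frozen=True)
class ArrayType(DataType):
    element: DataType
    nullable: bool = True


@dataclass(frozen=True)
class MapType(DataType):
    key: DataType
    value: DataType
    nullable: bool = True


@dataclass(frozen=True)
class DataField:
    name: str
    type: DataType
    description: Optional[str] = None


@dataclass(frozen=True)
class RowType(DataType):
    fields: Tuple[DataField, ...]
    nullable: bool = True


@dataclass(frozen=True)
class VectorType(DataType):
    dimension: int
    element: DataType
    nullable: bool = True


# A nested schema: each field carries its own type and nullability.
order = RowType((
    DataField("id", IntType(nullable=False), "primary key"),
    DataField("amount", DecimalType(10, 2)),
    DataField("tags", ArrayType(StringType())),
))
```

Because the dataclasses are frozen and compared field by field, two type instances are equal exactly when their structure and parameters match, which is the equality rule a type system of this kind relies on.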
Type Operations:
- Equality: Two types are equal if they have the same structure and parameters
- Compatibility: Type A is compatible with type B if every value of A can be read as B without loss (for example, INTEGER read as BIGINT)
- Casting: Explicit conversion between types with potential data loss
- Serialization: Convert type definition to portable format (JSON, binary)
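The operations listed above might look as follows for a deliberately small set of types. The `SimpleType` class, the `_WIDENING` order, and the JSON layout are assumptions for this sketch; a real type system would define its own compatibility rules and serialized form.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimpleType:
    """Hypothetical flat type: a root name plus a nullability flag."""
    root: str
    nullable: bool = True


# Assumed widening order: a type can be read as any type to its right.
_WIDENING = ["TINYINT", "SMALLINT", "INT", "BIGINT"]


def is_compatible(source: SimpleType, target: SimpleType) -> bool:
    """True if every value of `source` can be read as `target` without loss."""
    if source.nullable and not target.nullable:
        return False  # a NULL in the source has no representation in the target
    if source.root == target.root:
        return True
    if source.root in _WIDENING and target.root in _WIDENING:
        return _WIDENING.index(source.root) <= _WIDENING.index(target.root)
    return False


def to_json(data_type: SimpleType) -> str:
    """Serialize a type definition to a portable JSON string."""
    return json.dumps(asdict(data_type), sort_keys=True)


def from_json(text: str) -> SimpleType:
    """Rebuild a type definition from its JSON form."""
    return SimpleType(**json.loads(text))
```

In this sketch, any conversion for which `is_compatible` returns `False` (such as BIGINT to INT) would need an explicit cast, since it can lose data.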
Visitor Pattern for Type Operations:
```
interface TypeVisitor<R>:
    method visit(IntegerType) -> R
    method visit(StringType) -> R
    method visit(ArrayType) -> R
    method visit(RowType) -> R
    ...

abstract class Type:
    method accept(visitor: TypeVisitor<R>) -> R
```
This allows implementing new operations like type validation, schema inference, or format conversion without modifying type classes.
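A runnable version of the visitor idea, written in Python, could look like the sketch below. The class and method names (`visit_int`, `SqlPrinter`, `DepthCounter`, and so on) are illustrative assumptions and do not mirror any specific Paimon class.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

R = TypeVar("R")


class TypeVisitor(ABC, Generic[R]):
    """One method per concrete type; R is the result of the operation."""

    @abstractmethod
    def visit_int(self, t: "IntType") -> R: ...

    @abstractmethod
    def visit_string(self, t: "StringType") -> R: ...

    @abstractmethod
    def visit_array(self, t: "ArrayType") -> R: ...

    @abstractmethod
    def visit_row(self, t: "RowType") -> R: ...


class DataType(ABC):
    @abstractmethod
    def accept(self, visitor: TypeVisitor[R]) -> R: ...


class IntType(DataType):
    def accept(self, visitor):
        return visitor.visit_int(self)


class StringType(DataType):
    def accept(self, visitor):
        return visitor.visit_string(self)


class ArrayType(DataType):
    def __init__(self, element: DataType):
        self.element = element

    def accept(self, visitor):
        return visitor.visit_array(self)


class RowType(DataType):
    def __init__(self, fields: List[Tuple[str, DataType]]):
        self.fields = fields  # (field name, field type) pairs

    def accept(self, visitor):
        return visitor.visit_row(self)


class SqlPrinter(TypeVisitor[str]):
    """Operation 1: render a type as a SQL-like string."""

    def visit_int(self, t):
        return "INT"

    def visit_string(self, t):
        return "STRING"

    def visit_array(self, t):
        return f"ARRAY<{t.element.accept(self)}>"

    def visit_row(self, t):
        inner = ", ".join(f"{name} {ft.accept(self)}" for name, ft in t.fields)
        return f"ROW<{inner}>"


class DepthCounter(TypeVisitor[int]):
    """Operation 2: nesting depth, added without touching the type classes."""

    def visit_int(self, t):
        return 1

    def visit_string(self, t):
        return 1

    def visit_array(self, t):
        return 1 + t.element.accept(self)

    def visit_row(self, t):
        return 1 + max((ft.accept(self) for _, ft in t.fields), default=0)


schema = RowType([("id", IntType()), ("tags", ArrayType(StringType()))])
print(schema.accept(SqlPrinter()))    # ROW<id INT, tags ARRAY<STRING>>
print(schema.accept(DepthCounter()))  # 3
```

The trade-off is the usual one for visitors: adding an operation is a new class, while adding a new type means extending the visitor interface and every existing visitor.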
Row Kind Semantics: To support change data capture (CDC), each row carries a row kind marker:
- INSERT: New row added
- UPDATE_BEFORE: Old version of updated row
- UPDATE_AFTER: New version of updated row
- DELETE: Row removed
This enables the type system to represent both snapshot data and changelog streams uniformly.
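To make the row kind semantics concrete, the sketch below folds a changelog into a keyed snapshot. The `RowKind` enum here is a stand-in written for this example, and the `(kind, key, value)` tuple layout and the `apply_changelog` helper are assumptions rather than part of any library.

```python
from enum import Enum


class RowKind(Enum):
    """Illustrative row kind markers using common changelog short codes."""
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


def apply_changelog(snapshot, changes):
    """Fold (kind, key, value) changes into a copy of a key -> value snapshot."""
    result = dict(snapshot)
    for kind, key, value in changes:
        if kind in (RowKind.INSERT, RowKind.UPDATE_AFTER):
            result[key] = value      # additions carry the new row image
        else:
            result.pop(key, None)    # UPDATE_BEFORE and DELETE retract a row
    return result


changes = [
    (RowKind.INSERT, 1, "a"),
    (RowKind.UPDATE_BEFORE, 1, "a"),
    (RowKind.UPDATE_AFTER, 1, "b"),
    (RowKind.INSERT, 2, "c"),
    (RowKind.DELETE, 2, "c"),
]
print(apply_changelog({}, changes))  # {1: 'b'}
```

Replaying the full changelog over an empty snapshot yields the current table state, which is why the same row representation can serve both snapshot reads and changelog streams.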
Related Pages
- Implementation:Apache_Paimon_DataType
- Implementation:Apache_Paimon_DataTypeRoot
- Implementation:Apache_Paimon_DataTypeFamily
- Implementation:Apache_Paimon_DataTypeVisitor
- Implementation:Apache_Paimon_DataTypeDefaultVisitor
- Implementation:Apache_Paimon_DataTypeChecks
- Implementation:Apache_Paimon_DataTypeCasts
- Implementation:Apache_Paimon_DataTypeJsonParser
- Implementation:Apache_Paimon_DataTypes_Java
- Implementation:Apache_Paimon_DataTypes_Python
- Implementation:Apache_Paimon_DataField
- Implementation:Apache_Paimon_RowType
- Implementation:Apache_Paimon_ArrayType
- Implementation:Apache_Paimon_MapType
- Implementation:Apache_Paimon_DecimalType
- Implementation:Apache_Paimon_VectorType
- Implementation:Apache_Paimon_RowKind
- Implementation:Apache_Paimon_RowKind_Python