Implementation:Apache Paimon SimpleStatsEvolution
| Knowledge Sources | |
|---|---|
| Domains | Schema Evolution, Statistics Management |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
SimpleStatsEvolution is a converter that handles schema evolution for column statistics arrays by projecting and transforming statistics when table schemas change.
Description
The SimpleStatsEvolution class provides functionality to evolve column statistics (min/max values and null counts) when table schemas change over time. It handles projections and transformations needed to maintain statistics consistency across schema versions.
The class supports two types of mappings: dense field mappings, used when statistics are stored only for a specific list of columns (passed as stats_fields), and index mappings, used for general schema evolution. Computed dense index mappings are cached so that repeated calls with the same field list do not recompute them, and access to the cache is thread-safe.
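The cached dense mapping described above can be sketched as follows. This is a minimal illustration and not the pypaimon implementation: the class name DenseIndexMappingCache, the use of a threading.Lock, and the convention that -1 marks a target field absent from the dense list are all assumptions made for the example.

```python
import threading
from typing import Dict, List, Tuple


class DenseIndexMappingCache:
    """Hypothetical helper: maps target-schema fields to positions in a dense stats field list."""

    def __init__(self, field_names: List[str]):
        self._field_names = field_names  # names of the target schema fields, in order
        self._cache: Dict[Tuple[str, ...], List[int]] = {}
        self._lock = threading.Lock()  # assumption: a plain lock guards the cache

    def get(self, dense_fields: List[str]) -> List[int]:
        key = tuple(dense_fields)  # lists are unhashable, so use a tuple as the cache key
        with self._lock:
            mapping = self._cache.get(key)
            if mapping is None:
                position = {name: i for i, name in enumerate(dense_fields)}
                # -1 marks a target field that has no stored statistics
                mapping = [position.get(name, -1) for name in self._field_names]
                self._cache[key] = mapping
            return mapping


cache = DenseIndexMappingCache(["id", "name", "age"])
mapping = cache.get(["id", "age"])  # stats exist only for "id" and "age"
print(mapping)
```

A second call with the same field list returns the cached list instead of rebuilding it, which matters when many data files share the same statistics layout.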
When statistics are evolved, the converter projects min/max values using ProjectedRow and remaps null counts through the same index mapping. For a field that has no counterpart in the stored statistics (for example, a column added after the file was written), it uses the row count as the null count, because every row in that file is null for that column. The implementation optimizes for empty statistics by reusing pre-allocated empty value structures.
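The projection and null-count rules above can be illustrated with plain lists. This is a simplified sketch under stated assumptions: the real class projects rows through ProjectedRow, whereas the hypothetical functions project_values and evolve_null_counts below operate on Python lists, and -1 is taken to mark a field with no source position, as in the usage example on this page.

```python
from typing import Any, List


def project_values(values: List[Any], index_mapping: List[int]) -> List[Any]:
    """Reorder min or max values to the target schema; unmapped fields become None."""
    return [values[i] if i >= 0 else None for i in index_mapping]


def evolve_null_counts(
    null_counts: List[Any], index_mapping: List[int], not_found_value: int
) -> List[Any]:
    """Reorder null counts; unmapped fields get not_found_value (the row count)."""
    return [null_counts[i] if i >= 0 else not_found_value for i in index_mapping]


index_mapping = [0, 1, -1, 2]  # third target field did not exist in the old schema
old_min = [1, "a", 10]
old_nulls = [0, 5, 2]
row_count = 1000

new_min = project_values(old_min, index_mapping)
new_nulls = evolve_null_counts(old_nulls, index_mapping, row_count)
print(new_min)    # [1, 'a', None, 10]
print(new_nulls)  # [0, 5, 1000, 2]
```

Setting the null count of the added column to the row count keeps predicate pushdown conservative: a filter such as "column IS NOT NULL" can safely skip the file.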
Usage
Use SimpleStatsEvolution when implementing schema evolution support, reading data files with older schema versions, or maintaining statistics consistency during ALTER TABLE operations that modify column structure.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/manifest/simple_stats_evolution.py
Signature
class SimpleStatsEvolution:
    """Converter for array of SimpleColStats."""

    def __init__(
        self,
        data_fields: List[DataField],
        index_mapping: Optional[List[int]],
        cast_field_getters: Optional[List[Any]]
    ):
        """Initialize with field definitions and optional index mapping."""

    def evolution(
        self,
        stats: SimpleStats,
        row_count: Optional[int],
        stats_fields: Optional[List[str]]
    ) -> SimpleStats:
        """Evolve statistics with schema evolution mappings."""

    def _get_dense_index_mapping(self, dense_fields: List[str]) -> List[int]:
        """Get dense index mapping with caching."""

    def _project_row(self, row: Any, index_mapping: List[int]) -> Any:
        """Project row based on index mapping using ProjectedRow."""

    def _project_array(self, array: List[Any], index_mapping: List[int]) -> List[Any]:
        """Project array based on index mapping."""

    def _evolve_null_counts(
        self,
        null_counts: List[Any],
        index_mapping: List[int],
        not_found_value: int
    ) -> List[Any]:
        """Evolve null counts with schema evolution mapping."""
Import
from pypaimon.manifest.simple_stats_evolution import SimpleStatsEvolution
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
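| data_fields | List[DataField] | Yes | Target schema field definitions |
| index_mapping | Optional[List[int]] | No (may be None) | Index mapping for schema evolution; -1 marks a field with no source position |
| cast_field_getters | Optional[List[Any]] | No (may be None) | Field cast functions |
| stats | SimpleStats | Yes | Statistics to evolve |
| row_count | Optional[int] | No (may be None) | Row count, used as the null count for fields missing from the stored statistics |
| stats_fields | Optional[List[str]] | No (may be None) | Dense field list naming the columns that have stored statistics |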
Outputs
| Name | Type | Description |
|---|---|---|
| evolved_stats | SimpleStats | Evolved statistics matching target schema |
Usage Examples
from pypaimon.manifest.simple_stats_evolution import SimpleStatsEvolution
from pypaimon.manifest.schema.simple_stats import SimpleStats

# Assumed to exist in the caller's context: table, schema_id,
# old_min, old_max, old_nulls, file_stats, and file.

# Create evolution converter
data_fields = table.schema_manager.get_schema(schema_id).fields
index_mapping = [0, 1, -1, 2]  # -1 indicates new field
stats_evolution = SimpleStatsEvolution(
    data_fields=data_fields,
    index_mapping=index_mapping,
    cast_field_getters=None
)

# Evolve statistics from old schema to new schema
old_stats = SimpleStats(min_values=old_min, max_values=old_max, null_counts=old_nulls)
new_stats = stats_evolution.evolution(
    stats=old_stats,
    row_count=1000,
    stats_fields=None
)
print(f"Evolved min values: {new_stats.min_values}")
print(f"Evolved max values: {new_stats.max_values}")
print(f"Evolved null counts: {new_stats.null_counts}")

# Handle dense field statistics (stats stored only for the listed columns)
dense_stats = stats_evolution.evolution(
    stats=file_stats,
    row_count=file.row_count,
    stats_fields=["col1", "col2", "col3"]
)