

Implementation:Apache Paimon SimpleStatsEvolution

From Leeroopedia


Knowledge Sources
Domains Schema Evolution, Statistics Management
Last Updated 2026-02-08 00:00 GMT

Overview

SimpleStatsEvolution is a converter for arrays of column statistics. When a table's schema changes, it projects and transforms statistics written under an older schema so that they line up with the fields of the target schema.

Description

The SimpleStatsEvolution class provides functionality to evolve column statistics (min/max values and null counts) when table schemas change over time. It handles projections and transformations needed to maintain statistics consistency across schema versions.

The class supports two types of mappings: dense field mappings, for statistics that are stored only for a named subset of columns, and index mappings, for general schema evolution. Computed dense index mappings are cached for performance, and access to the cache is thread-safe.
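The cached dense mapping described above can be illustrated with a minimal sketch. This is not the pypaimon implementation: the class name `DenseIndexMappingCache`, the use of `threading.Lock`, and the choice of -1 as the marker for a field without stored statistics are assumptions made for the example.

```python
import threading
from typing import Dict, List, Tuple


class DenseIndexMappingCache:
    """Hypothetical helper, not part of pypaimon.

    Maps each target schema field to its position in a dense list of
    statistics fields, or to -1 when no statistics were stored for it.
    """

    def __init__(self, field_names: List[str]):
        self._field_names = field_names
        self._cache: Dict[Tuple[str, ...], List[int]] = {}
        self._lock = threading.Lock()  # guards reads and writes of the cache

    def get(self, dense_fields: List[str]) -> List[int]:
        key = tuple(dense_fields)  # lists are unhashable, so key on a tuple
        with self._lock:
            mapping = self._cache.get(key)
            if mapping is None:
                position = {name: i for i, name in enumerate(dense_fields)}
                mapping = [position.get(name, -1) for name in self._field_names]
                self._cache[key] = mapping
            return mapping


cache = DenseIndexMappingCache(["id", "name", "age"])
mapping = cache.get(["id", "age"])  # statistics exist only for id and age
print(mapping)  # [0, -1, 1]
```

A second call with the same field list returns the cached list object instead of recomputing it, which is the point of keying the cache on the dense field names.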

When statistics are evolved, the converter projects min/max values using ProjectedRow and adjusts null counts to match. For a field that is missing from the old statistics after schema evolution, it fills in the row count as the null count: a file written before the field existed holds no values for it, so every row reads as null. The implementation optimizes for empty statistics by reusing pre-allocated empty value structures.
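The projection and null-count rules can be sketched with plain Python lists. This is an illustration only: the real class projects rows through ProjectedRow, and the helper names `project_values` and `evolve_null_counts`, as well as the use of `None` for the min/max of a missing field, are assumptions made for the example.

```python
from typing import Any, List


def project_values(values: List[Any], index_mapping: List[int]) -> List[Any]:
    """Reorder old values to the target schema; missing fields become None."""
    return [values[i] if i >= 0 else None for i in index_mapping]


def evolve_null_counts(
    null_counts: List[int], index_mapping: List[int], not_found_value: int
) -> List[int]:
    """Reorder old null counts; missing fields get not_found_value."""
    return [null_counts[i] if i >= 0 else not_found_value for i in index_mapping]


# Statistics for a file written with three columns.
old_min = [1, "a", 10]
old_max = [9, "z", 99]
old_nulls = [0, 2, 5]

# Target schema has four columns; the third one was added later (-1).
index_mapping = [0, 1, -1, 2]
row_count = 1000

new_min = project_values(old_min, index_mapping)
new_max = project_values(old_max, index_mapping)
new_nulls = evolve_null_counts(old_nulls, index_mapping, row_count)

print(new_min)    # [1, 'a', None, 10]
print(new_max)    # [9, 'z', None, 99]
print(new_nulls)  # [0, 2, 1000, 5]
```

The added column has no min or max, and its null count equals the row count because none of the 1000 rows in the old file carries a value for it.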

Usage

Use SimpleStatsEvolution when implementing schema evolution support, reading data files with older schema versions, or maintaining statistics consistency during ALTER TABLE operations that modify column structure.

Code Reference

Source Location

Signature

class SimpleStatsEvolution:
    """Converter for array of SimpleColStats."""

    def __init__(
        self,
        data_fields: List[DataField],
        index_mapping: Optional[List[int]],
        cast_field_getters: Optional[List[Any]]
    ):
        """Initialize with field definitions and optional index mapping."""

    def evolution(
        self,
        stats: SimpleStats,
        row_count: Optional[int],
        stats_fields: Optional[List[str]]
    ) -> SimpleStats:
        """Evolve statistics with schema evolution mappings."""

    def _get_dense_index_mapping(self, dense_fields: List[str]) -> List[int]:
        """Get dense index mapping with caching."""

    def _project_row(self, row: Any, index_mapping: List[int]) -> Any:
        """Project row based on index mapping using ProjectedRow."""

    def _project_array(self, array: List[Any], index_mapping: List[int]) -> List[Any]:
        """Project array based on index mapping."""

    def _evolve_null_counts(
        self,
        null_counts: List[Any],
        index_mapping: List[int],
        not_found_value: int
    ) -> List[Any]:
        """Evolve null counts with schema evolution mapping."""

Import

from pypaimon.manifest.simple_stats_evolution import SimpleStatsEvolution

I/O Contract

Inputs

Name | Type | Required | Description
data_fields | List[DataField] | Yes | Target schema field definitions
index_mapping | List[int] | No | Optional index mapping for schema evolution
cast_field_getters | List[Any] | No | Optional field cast functions
stats | SimpleStats | Yes | Statistics to evolve
row_count | int | No | Row count for null count evolution
stats_fields | List[str] | No | Optional dense field list

Outputs

Name | Type | Description
evolved_stats | SimpleStats | Evolved statistics matching target schema

Usage Examples

from pypaimon.manifest.simple_stats_evolution import SimpleStatsEvolution
from pypaimon.manifest.schema.simple_stats import SimpleStats

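# Create evolution converter.
# `table` and `schema_id` are assumed to be defined by the caller.
data_fields = table.schema_manager.get_schema(schema_id).fields
index_mapping = [0, 1, -1, 2]  # -1 indicates a new field with no counterpart in the old schema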
stats_evolution = SimpleStatsEvolution(
    data_fields=data_fields,
    index_mapping=index_mapping,
    cast_field_getters=None
)

# Evolve statistics from old schema to new schema.
# `old_min`, `old_max` and `old_nulls` are assumed to hold the statistics read from the old file.
old_stats = SimpleStats(min_values=old_min, max_values=old_max, null_counts=old_nulls)
new_stats = stats_evolution.evolution(
    stats=old_stats,
    row_count=1000,
    stats_fields=None
)

# Handle dense field statistics.
# `file_stats` and `file` are assumed to come from a data file's metadata.
dense_stats = stats_evolution.evolution(
    stats=file_stats,
    row_count=file.row_count,
    stats_fields=["col1", "col2", "col3"]
)

print(f"Evolved min values: {new_stats.min_values}")
print(f"Evolved max values: {new_stats.max_values}")
print(f"Evolved null counts: {new_stats.null_counts}")
