Implementation:Apache Paimon PartitionStatistics

From Leeroopedia


Knowledge Sources
Domains: Data Management, Statistics
Last Updated: 2026-02-08 00:00 GMT

Overview

The PartitionStatistics class captures statistical information about a partition, with support for negative values to indicate data removal.

Description

PartitionStatistics is a serializable class that maintains comprehensive statistics for a table partition in Apache Paimon. It tracks essential metrics including the number of records, total file size in bytes, file count, the timestamp of the last file creation, and the total number of buckets. The partition specification (spec) maps partition key names to their values.

The statistical fields of this class may hold negative values, which indicate that data has been removed from the partition. An instance can therefore describe an incremental change (a delta) to a partition's statistics rather than only an absolute snapshot, which makes the class suitable for tracking changes over time.
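The delta semantics can be sketched in plain Java. This is an illustration only: `StatsDelta` and its `merge` method are hypothetical stand-ins and are not part of Paimon's API. The sketch assumes that counters are combined by addition and that the later file creation time wins; the source does not specify how Paimon itself merges statistics.

```java
import java.util.Map;

// Hypothetical stand-in used only for this sketch; it is NOT the Paimon class.
final class StatsDelta {
    final Map<String, String> spec;
    final long recordCount;
    final long fileSizeInBytes;
    final long fileCount;
    final long lastFileCreationTime;

    StatsDelta(Map<String, String> spec, long recordCount, long fileSizeInBytes,
               long fileCount, long lastFileCreationTime) {
        this.spec = spec;
        this.recordCount = recordCount;
        this.fileSizeInBytes = fileSizeInBytes;
        this.fileCount = fileCount;
        this.lastFileCreationTime = lastFileCreationTime;
    }

    // Assumption: counters are summed and the newer creation time is kept.
    StatsDelta merge(StatsDelta delta) {
        if (!spec.equals(delta.spec)) {
            throw new IllegalArgumentException("Partition specs differ");
        }
        return new StatsDelta(
                spec,
                recordCount + delta.recordCount,
                fileSizeInBytes + delta.fileSizeInBytes,
                fileCount + delta.fileCount,
                Math.max(lastFileCreationTime, delta.lastFileCreationTime));
    }

    public static void main(String[] args) {
        Map<String, String> spec = Map.of("day", "2024-02-08");
        StatsDelta snapshot = new StatsDelta(spec, 150_000L, 750L * 1024 * 1024, 15L, 1_000L);
        // Negative counters describe data removed from the partition.
        StatsDelta removal = new StatsDelta(spec, -1_000L, -10L * 1024 * 1024, -2L, 2_000L);
        StatsDelta net = snapshot.merge(removal);
        System.out.println("records=" + net.recordCount + ", files=" + net.fileCount);
        // prints "records=149000, files=13"
    }
}
```

Rejecting mismatched specs is a design choice of this sketch: adding counters from two different partitions would produce a number that describes neither of them.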

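The class is annotated for JSON serialization with Jackson, so instances can be exchanged as JSON, for example through REST APIs. Because it is declared with @JsonIgnoreProperties(ignoreUnknown = true), unrecognized JSON properties are ignored during deserialization instead of causing an error. It serves as the base class for the more feature-rich Partition class, providing the core statistical functionality.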

Usage

Use PartitionStatistics when you need to track or report basic statistical information about a partition without additional metadata like completion status or audit information. It's particularly useful for reporting partition metrics, calculating storage usage, and understanding partition data distribution.
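As a rough illustration of the reporting use case, the sketch below sums storage across partitions and derives an average file size. `PartitionUsageReport` and its nested `Stats` class are hypothetical stand-ins that mirror only the `fileSizeInBytes()` and `fileCount()` accessors, so the example runs without Paimon on the classpath. Treating a non-positive file count as "no average available" is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class PartitionUsageReport {

    // Hypothetical stand-in mirroring two accessors of PartitionStatistics.
    static final class Stats {
        private final long fileSizeInBytes;
        private final long fileCount;

        Stats(long fileSizeInBytes, long fileCount) {
            this.fileSizeInBytes = fileSizeInBytes;
            this.fileCount = fileCount;
        }

        long fileSizeInBytes() { return fileSizeInBytes; }
        long fileCount() { return fileCount; }
    }

    // Sum of file sizes across all given partitions.
    static long totalBytes(List<Stats> partitions) {
        long sum = 0L;
        for (Stats s : partitions) {
            sum += s.fileSizeInBytes();
        }
        return sum;
    }

    // Average file size in bytes; returns 0 when the file count is zero or
    // negative (for example, a delta), which also avoids division by zero.
    static long averageFileSize(Stats stats) {
        return stats.fileCount() > 0 ? stats.fileSizeInBytes() / stats.fileCount() : 0L;
    }

    public static void main(String[] args) {
        List<Stats> partitions = Arrays.asList(
                new Stats(750L * 1024 * 1024, 15L),
                new Stats(250L * 1024 * 1024, 5L));
        long mb = 1024L * 1024;
        System.out.println("Total MB: " + totalBytes(partitions) / mb);                     // prints "Total MB: 1000"
        System.out.println("Avg file MB: " + averageFileSize(partitions.get(0)) / mb);      // prints "Avg file MB: 50"
    }
}
```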

Code Reference

Source Location

Signature

@JsonIgnoreProperties(ignoreUnknown = true)
@Public
public class PartitionStatistics implements Serializable {
    private static final long serialVersionUID = 1L;

    @JsonProperty(FIELD_SPEC)
    protected final Map<String, String> spec;

    @JsonProperty(FIELD_RECORD_COUNT)
    protected final long recordCount;

    @JsonProperty(FIELD_FILE_SIZE_IN_BYTES)
    protected final long fileSizeInBytes;

    @JsonProperty(FIELD_FILE_COUNT)
    protected final long fileCount;

    @JsonProperty(FIELD_LAST_FILE_CREATION_TIME)
    protected final long lastFileCreationTime;

    @JsonProperty(FIELD_TOTAL_BUCKETS)
    protected final int totalBuckets;
}

Import

import org.apache.paimon.partition.PartitionStatistics;

I/O Contract

Inputs

Name | Type | Required | Description
spec | Map<String, String> | Yes | Partition specification mapping partition keys to values
recordCount | long | Yes | Number of records (may be negative for removed data)
fileSizeInBytes | long | Yes | Total file size in bytes (may be negative for removed data)
fileCount | long | Yes | Number of files (may be negative for removed data)
lastFileCreationTime | long | Yes | Timestamp of the last file creation
totalBuckets | int | Yes | Total number of buckets (defaults to 0 if absent in older versions)
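Because totalBuckets falls back to 0 when the field is absent, a consumer may want to distinguish "not recorded" from a real bucket count. The helper below is a minimal sketch of one way to do that; `BucketCounts` and `knownTotalBuckets` are hypothetical names, and interpreting 0 as "unknown" is an assumption of this sketch rather than documented Paimon behavior.

```java
import java.util.OptionalInt;

final class BucketCounts {

    // Assumption: a value of 0 means the bucket count was not recorded
    // (for example, statistics written by an older version).
    static OptionalInt knownTotalBuckets(int totalBuckets) {
        return totalBuckets == 0 ? OptionalInt.empty() : OptionalInt.of(totalBuckets);
    }

    public static void main(String[] args) {
        OptionalInt recorded = knownTotalBuckets(32);
        OptionalInt missing = knownTotalBuckets(0);
        System.out.println("recorded present: " + recorded.isPresent());
        System.out.println("missing present: " + missing.isPresent());
    }
}
```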

Outputs

Name | Type | Description
spec() | Map<String, String> | Returns the partition specification
recordCount() | long | Returns the number of records
fileSizeInBytes() | long | Returns the total file size in bytes
fileCount() | long | Returns the number of files
lastFileCreationTime() | long | Returns the last file creation timestamp
totalBuckets() | int | Returns the total number of buckets

Usage Examples

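import java.util.HashMap;
import java.util.Map;

import org.apache.paimon.partition.PartitionStatistics;

// Create partition statistics for a daily partition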
Map<String, String> partitionSpec = new HashMap<>();
partitionSpec.put("year", "2024");
partitionSpec.put("month", "02");
partitionSpec.put("day", "08");

PartitionStatistics stats = new PartitionStatistics(
    partitionSpec,
    150000L,             // recordCount
    1024L * 1024 * 750,  // fileSizeInBytes (750 MB)
    15L,                 // fileCount
    System.currentTimeMillis(),  // lastFileCreationTime
    32                   // totalBuckets
);

// Access statistics
System.out.println("Partition spec: " + stats.spec());
System.out.println("Total records: " + stats.recordCount());
System.out.println("Storage size: " +
    stats.fileSizeInBytes() / (1024.0 * 1024.0) + " MB");
System.out.println("Number of files: " + stats.fileCount());
System.out.println("Bucket count: " + stats.totalBuckets());

// Example of negative statistics (data removal)
PartitionStatistics deltaStats = new PartitionStatistics(
    partitionSpec,
    -1000L,              // 1000 records removed
    -1024L * 1024 * 10,  // 10 MB removed
    -2L,                 // 2 files removed
    System.currentTimeMillis(),
    32
);

// Calculate combined statistics
long netRecords = stats.recordCount() + deltaStats.recordCount();
System.out.println("Net records after removal: " + netRecords);
