Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Apache Paimon Range

From Leeroopedia


Knowledge Sources
Domains Data Structures, Global Index
Last Updated 2026-02-08 00:00 GMT

Overview

Range represents an inclusive integer interval with utilities for intersection, exclusion, and merging operations.

Description

Range is a dataclass that represents an inclusive integer interval [from_, to]. It provides a comprehensive set of operations for working with ranges, including overlap detection, intersection calculation, exclusion of subranges, and merging of overlapping or adjacent ranges. The class is designed to support row ID range calculations in the global index system.

The core operations include contains (checking if a value is within the range), overlaps (checking if two ranges intersect), and exclude (removing one or more ranges from the current range, potentially splitting it into multiple non-overlapping pieces). The exclude operation is particularly useful for handling deletion vectors where certain row ID ranges need to be filtered out.

The class provides several static methods for working with lists of ranges. The and_ method computes the intersection of two range lists using a two-pointer algorithm. The merge_sorted_as_possible and sort_and_merge_overlap methods consolidate overlapping or adjacent ranges into minimal representations, useful for optimizing range queries.

All range operations assume integer positions and work correctly with both small and large values. The class enforces the invariant that from_ <= to through validation in __post_init__. This makes it safe to use for row ID ranges that may span billions of rows in large tables.

Usage

Range is used throughout the global index implementation to represent row ID ranges for files, track which rows to scan, and compute intersections when applying predicates or deletion vectors. It is particularly important in the SlicedSplit implementation for calculating exact row ranges to read.

Code Reference

Source Location

Signature

@dataclass
class Range:
    """Represents a range [from_, to] inclusive."""

    from_: int
    to: int

    def __post_init__(self):
        """Validate that from_ <= to."""
        ...

    def contains(self, value: int) -> bool:
        """Check if the range contains the given value."""
        ...

    def overlaps(self, other: 'Range') -> bool:
        """Check if this range overlaps with another range."""
        ...

    def exclude(self, ranges: List['Range']) -> List['Range']:
        """
        Exclude the given ranges from this range.
        Returns a list of non-overlapping ranges.
        """
        ...

    @staticmethod
    def intersect(start1: int, end1: int, start2: int, end2: int) -> bool:
        """Check if two ranges intersect."""
        ...

    def count(self) -> int:
        """Return the number of elements in the range."""
        ...

    @staticmethod
    def and_(ranges1: List['Range'], ranges2: List['Range']) -> List['Range']:
        """Compute the intersection of two lists of ranges."""
        ...

    @staticmethod
    def merge_sorted_as_possible(ranges: List['Range']) -> List['Range']:
        """Merge sorted ranges where possible (adjacent or overlapping)."""
        ...

    @staticmethod
    def sort_and_merge_overlap(ranges: List['Range'], merge: bool = True,
                               adjacent: bool = True) -> List['Range']:
        """Sort ranges and optionally merge overlapping ones."""
        ...

Import

from pypaimon.globalindex.range import Range

I/O Contract

Inputs

Name Type Required Description
from_ int Yes Start of range (inclusive)
to int Yes End of range (inclusive)
ranges List[Range] No List of ranges for operations

Outputs

Name Type Description
Range Range New range instance
contains bool Whether value is in range
overlaps bool Whether ranges overlap
excluded List[Range] Ranges after exclusion
intersection List[Range] Intersection of range lists
merged List[Range] Merged ranges

Usage Examples

from pypaimon.globalindex.range import Range

# Create range
r1 = Range(10, 20)
print(f"Range: [{r1.from_}, {r1.to}]")  # [10, 20]
print(f"Count: {r1.count()}")  # 11

# Containment check
print(r1.contains(15))  # True
print(r1.contains(25))  # False

# Overlap check
r2 = Range(15, 25)
print(r1.overlaps(r2))  # True

# Exclude ranges
r3 = Range(0, 100)
excluded = r3.exclude([Range(20, 30), Range(50, 60)])
print(excluded)  # [Range(0, 19), Range(31, 49), Range(61, 100)]

# Intersection of range lists
file_ranges = [Range(0, 100), Range(200, 300)]
row_ranges = [Range(50, 150), Range(250, 350)]
intersection = Range.and_(file_ranges, row_ranges)
print(intersection)  # [Range(50, 100), Range(250, 300)]

# Merge overlapping ranges
ranges = [Range(0, 10), Range(5, 15), Range(20, 30), Range(29, 40)]
merged = Range.merge_sorted_as_possible(ranges)
print(merged)  # [Range(0, 15), Range(20, 40)]

# Practical example: calculate rows to read with deletion vectors
file_range = Range(1000, 2000)  # File contains rows 1000-2000
deleted_ranges = [Range(1100, 1200), Range(1500, 1600)]
rows_to_read = file_range.exclude(deleted_ranges)
print(f"Read ranges: {rows_to_read}")
# [Range(1000, 1099), Range(1201, 1499), Range(1601, 2000)]

total_rows = sum(r.count() for r in rows_to_read)
print(f"Total rows to read: {total_rows}")  # 799 rows

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment