Implementation:Apache Paimon Range
| Knowledge Sources | |
|---|---|
| Domains | Data Structures, Global Index |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Range represents an inclusive integer interval with utilities for intersection, exclusion, and merging operations.
Description
Range is a dataclass that represents an inclusive integer interval [from_, to]. It provides a comprehensive set of operations for working with ranges, including overlap detection, intersection calculation, exclusion of subranges, and merging of overlapping or adjacent ranges. The class is designed to support row ID range calculations in the global index system.
The core operations include contains (checking if a value is within the range), overlaps (checking if two ranges intersect), and exclude (removing one or more ranges from the current range, potentially splitting it into multiple non-overlapping pieces). The exclude operation is particularly useful for handling deletion vectors where certain row ID ranges need to be filtered out.
The class provides several static methods for working with lists of ranges. The and_ method computes the intersection of two range lists using a two-pointer algorithm. The merge_sorted_as_possible and sort_and_merge_overlap methods consolidate overlapping or adjacent ranges into minimal representations, useful for optimizing range queries.
All range operations assume integer positions and work correctly with both small and large values. The class enforces the invariant that from_ <= to through validation in __post_init__. This makes it safe to use for row ID ranges that may span billions of rows in large tables.
Usage
Range is used throughout the global index implementation to represent row ID ranges for files, track which rows to scan, and compute intersections when applying predicates or deletion vectors. It is particularly important in the SlicedSplit implementation for calculating exact row ranges to read.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-python/pypaimon/globalindex/range.py
Signature
@dataclass
class Range:
"""Represents a range [from_, to] inclusive."""
from_: int
to: int
def __post_init__(self):
"""Validate that from_ <= to."""
...
def contains(self, value: int) -> bool:
"""Check if the range contains the given value."""
...
def overlaps(self, other: 'Range') -> bool:
"""Check if this range overlaps with another range."""
...
def exclude(self, ranges: List['Range']) -> List['Range']:
"""
Exclude the given ranges from this range.
Returns a list of non-overlapping ranges.
"""
...
@staticmethod
def intersect(start1: int, end1: int, start2: int, end2: int) -> bool:
"""Check if two ranges intersect."""
...
def count(self) -> int:
"""Return the number of elements in the range."""
...
@staticmethod
def and_(ranges1: List['Range'], ranges2: List['Range']) -> List['Range']:
"""Compute the intersection of two lists of ranges."""
...
@staticmethod
def merge_sorted_as_possible(ranges: List['Range']) -> List['Range']:
"""Merge sorted ranges where possible (adjacent or overlapping)."""
...
@staticmethod
def sort_and_merge_overlap(ranges: List['Range'], merge: bool = True,
adjacent: bool = True) -> List['Range']:
"""Sort ranges and optionally merge overlapping ones."""
...
Import
from pypaimon.globalindex.range import Range
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| from_ | int | Yes | Start of range (inclusive) |
| to | int | Yes | End of range (inclusive) |
| ranges | List[Range] | No | List of ranges for operations |
Outputs
| Name | Type | Description |
|---|---|---|
| Range | Range | New range instance |
| contains | bool | Whether value is in range |
| overlaps | bool | Whether ranges overlap |
| excluded | List[Range] | Ranges after exclusion |
| intersection | List[Range] | Intersection of range lists |
| merged | List[Range] | Merged ranges |
Usage Examples
from pypaimon.globalindex.range import Range
# Create range
r1 = Range(10, 20)
print(f"Range: [{r1.from_}, {r1.to}]") # [10, 20]
print(f"Count: {r1.count()}") # 11
# Containment check
print(r1.contains(15)) # True
print(r1.contains(25)) # False
# Overlap check
r2 = Range(15, 25)
print(r1.overlaps(r2)) # True
# Exclude ranges
r3 = Range(0, 100)
excluded = r3.exclude([Range(20, 30), Range(50, 60)])
print(excluded) # [Range(0, 19), Range(31, 49), Range(61, 100)]
# Intersection of range lists
file_ranges = [Range(0, 100), Range(200, 300)]
row_ranges = [Range(50, 150), Range(250, 350)]
intersection = Range.and_(file_ranges, row_ranges)
print(intersection) # [Range(50, 100), Range(250, 300)]
# Merge overlapping ranges
ranges = [Range(0, 10), Range(5, 15), Range(20, 30), Range(29, 40)]
merged = Range.merge_sorted_as_possible(ranges)
print(merged) # [Range(0, 15), Range(20, 40)]
# Practical example: calculate rows to read with deletion vectors
file_range = Range(1000, 2000) # File contains rows 1000-2000
deleted_ranges = [Range(1100, 1200), Range(1500, 1600)]
rows_to_read = file_range.exclude(deleted_ranges)
print(f"Read ranges: {rows_to_read}")
# [Range(1000, 1099), Range(1201, 1499), Range(1601, 2000)]
total_rows = sum(r.count() for r in rows_to_read)
print(f"Total rows to read: {total_rows}") # 799 rows