Implementation: Apache Paimon TableRead To Arrow
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Table_Format |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for converting Paimon table splits into PyArrow Tables, pandas DataFrames, streaming batch readers, and row iterators.
Description
TableRead provides to_arrow(), to_pandas(), to_arrow_batch_reader(), and to_iterator() methods for materializing scan plan splits into usable data structures. It creates the appropriate SplitRead implementation (RawFileSplitRead, MergeFileSplitRead, or DataEvolutionSplitRead) based on table type and configuration. Each split is processed independently, with results concatenated into the final output. The to_arrow() method returns None if no data matches the scan plan.
Usage
Use this implementation after obtaining splits from TableScan.plan().splits(). Create a TableRead via read_builder.new_read(), then call the appropriate output method based on your downstream data processing needs.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/read/table_read.py
- Lines: L33-219
Signature
class TableRead:
    def __init__(self, table, predicate: Optional[Predicate],
                 read_type: List[DataField])
    def to_arrow(self, splits: List[Split]) -> Optional[pyarrow.Table]
    def to_pandas(self, splits: List[Split]) -> pandas.DataFrame
    def to_arrow_batch_reader(self, splits: List[Split]) -> pyarrow.ipc.RecordBatchReader
    def to_iterator(self, splits: List[Split]) -> Iterator
Import
from pypaimon.read.table_read import TableRead
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| splits | List[Split] | Yes | List of splits obtained from TableScan.plan().splits() |
Outputs
| Name | Type | Description |
|---|---|---|
| to_arrow return | Optional[pyarrow.Table] | PyArrow Table containing all matching rows, or None if no data matches |
| to_pandas return | pandas.DataFrame | pandas DataFrame containing all matching rows |
| to_arrow_batch_reader return | pyarrow.ipc.RecordBatchReader | Streaming reader that yields RecordBatches one at a time |
| to_iterator return | Iterator | Row-level iterator over matching data |
Usage Examples
Basic Usage
# After scan planning
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
plan = scan.plan()
splits = plan.splits()
# Read as PyArrow Table
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)
print(arrow_table.to_pandas())
# Or read as pandas directly
df = reader.to_pandas(splits)
# Or stream as RecordBatches
batch_reader = reader.to_arrow_batch_reader(splits)
for batch in batch_reader:  # RecordBatchReader is directly iterable
    process(batch)