Implementation:Apache Paimon TableRead To Arrow

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Table_Format
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tool for converting Paimon table splits into PyArrow Tables, pandas DataFrames, and streaming readers.

Description

TableRead provides to_arrow(), to_pandas(), to_arrow_batch_reader(), and to_iterator() methods for materializing scan plan splits into usable data structures. It creates the appropriate SplitRead implementation (RawFileSplitRead, MergeFileSplitRead, or DataEvolutionSplitRead) based on table type and configuration. Each split is processed independently, with results concatenated into the final output. The to_arrow() method returns None if no data matches the scan plan.

Usage

Use this implementation after obtaining splits from TableScan.plan().splits(). Create a TableRead via read_builder.new_read(), then call the appropriate output method based on your downstream data processing needs.
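The sequence above can be wrapped in a small helper. This is a sketch under stated assumptions: the helper name `read_all_as_arrow` is hypothetical, and `table` is assumed to be a Paimon table object that exposes `new_read_builder()` as in the usage example on this page. Only the calls named on this page are used.

```python
from typing import Any, Optional


def read_all_as_arrow(table: Any) -> Optional[Any]:
    """Plan a scan of `table` and materialize its splits via to_arrow().

    Hypothetical convenience wrapper; `table` is assumed to be a Paimon
    table object exposing new_read_builder().
    """
    read_builder = table.new_read_builder()
    # Scan planning produces the splits that the reader consumes.
    splits = read_builder.new_scan().plan().splits()
    # The result may be None when no data matches the scan plan.
    return read_builder.new_read().to_arrow(splits)
```

Calling `read_all_as_arrow(table)` then yields either a PyArrow Table or `None`. Creating the scan and the read from the same read builder is the pattern the usage example on this page follows.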

Code Reference

Source Location

Repository: Apache Paimon
File: paimon-python/pypaimon/read/table_read.py
Lines: L33-219

Signature

class TableRead:
    def __init__(self, table, predicate: Optional[Predicate],
                 read_type: List[DataField]):

    def to_arrow(self, splits: List[Split]) -> Optional[pyarrow.Table]:
    def to_pandas(self, splits: List[Split]) -> pandas.DataFrame:
    def to_arrow_batch_reader(self, splits: List[Split]) -> pyarrow.ipc.RecordBatchReader:
    def to_iterator(self, splits: List[Split]) -> Iterator:

Import

from pypaimon.read.table_read import TableRead

I/O Contract

Inputs

Name	Type	Required	Description
splits	List[Split]	Yes	List of splits obtained from `TableScan.plan().splits()`

Outputs

Name	Type	Description
to_arrow return	Optional[pyarrow.Table]	PyArrow Table containing all matching rows, or `None` if no data matches
to_pandas return	pandas.DataFrame	pandas DataFrame containing all matching rows
to_arrow_batch_reader return	pyarrow.ipc.RecordBatchReader	Streaming reader that yields RecordBatches one at a time
to_iterator return	Iterator	Row-level iterator over matching data

Usage Examples

Basic Usage

# After scan planning
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
plan = scan.plan()
splits = plan.splits()

# Read as PyArrow Table
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)
# to_arrow() returns None when no data matches the scan plan
if arrow_table is not None:
    print(arrow_table.to_pandas())

# Or read as pandas directly
df = reader.to_pandas(splits)

# Or stream as RecordBatches
batch_reader = reader.to_arrow_batch_reader(splits)
# A RecordBatchReader is iterable and stops at the end of the stream
for batch in batch_reader:
    process(batch)  # process() is a placeholder for your own batch handler
