Implementation:Apache Paimon BatchTableWrite Write Arrow
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Table_Format |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for writing PyArrow Tables, PyArrow RecordBatches, and pandas DataFrames to Paimon tables.
Description
The TableWrite class provides three methods for ingesting data into Paimon tables: write_arrow(), write_arrow_batch(), and write_pandas(). Incoming data is validated against the table schema, then partitioned and bucketed by a RowKeyExtractor before being written to the file store. BatchTableWrite extends TableWrite with one-time commit semantics: prepare_commit() is meant to be called once per writer, so each write session produces exactly one set of commit messages.
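The order of operations described above (validate, route each row to a partition and bucket, buffer, then produce commit messages exactly once) can be illustrated with a small stdlib-only toy model. Everything in it is hypothetical: `ToyBatchWrite`, `write_rows`, the dict-shaped rows and commit messages, and the crc32-based bucketing are illustrative stand-ins. It does not reproduce pypaimon's RowKeyExtractor, its hash function, or its file layout.

```python
import zlib
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


class ToyBatchWrite:
    """Hypothetical toy model: validate -> (partition, bucket) -> buffer -> one-time commit.

    Not pypaimon code; the crc32 bucketing is only for illustration.
    """

    def __init__(self, fields: Sequence[str], partition_keys: Sequence[str],
                 bucket_keys: Sequence[str], num_buckets: int):
        self.fields = list(fields)
        self.partition_keys = list(partition_keys)
        self.bucket_keys = list(bucket_keys)
        self.num_buckets = num_buckets
        self.buffers: Dict[Tuple[tuple, int], List[Row]] = {}
        self.committed = False

    def _bucket(self, row: Row) -> int:
        # Deterministic hash of the bucket-key values, reduced modulo the bucket count.
        key = repr(tuple(row[k] for k in self.bucket_keys)).encode()
        return zlib.crc32(key) % self.num_buckets

    def write_rows(self, rows: List[Row]) -> None:
        # 1. Validate every row against the toy schema before buffering anything.
        for row in rows:
            if set(row) != set(self.fields):
                raise ValueError(f"row fields {sorted(row)} do not match schema {sorted(self.fields)}")
        # 2. Route each row to the buffer for its (partition, bucket) pair.
        for row in rows:
            partition = tuple(row[k] for k in self.partition_keys)
            self.buffers.setdefault((partition, self._bucket(row)), []).append(row)

    def prepare_commit(self) -> List[dict]:
        # 3. One-time semantics: a second call is rejected.
        if self.committed:
            raise RuntimeError("prepare_commit() may only be called once")
        self.committed = True
        return [
            {"partition": p, "bucket": b, "row_count": len(rows)}
            for (p, b), rows in sorted(self.buffers.items(), key=lambda kv: kv[0])
        ]
```

Rows that share partition and bucket-key values always land in the same buffer, which is why the toy commit messages are grouped per (partition, bucket) pair.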
Usage
Use this implementation after obtaining a table reference from the catalog. Create a BatchWriteBuilder from the table, then create a writer and call the appropriate write method for your data format. After all data is written, call prepare_commit() to generate the commit messages needed for the atomic commit step.
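The steps above can be sketched as a small helper that relies only on duck typing. `new_batch_write_builder()`, `new_write()`, `write_arrow()`, and `prepare_commit()` come from this page; `new_commit()`, `commit()`, and `close()` are assumptions based on the pypaimon documentation and should be checked against the installed version. `write_and_commit` is a hypothetical helper name.

```python
from typing import Any, List


def write_and_commit(table: Any, data: Any) -> List[Any]:
    """Hypothetical helper: write one PyArrow Table and commit it atomically.

    Assumes the builder exposes new_commit() and that the write and commit
    objects expose close(), as in the pypaimon docs (verify for your version).
    """
    write_builder = table.new_batch_write_builder()
    table_write = write_builder.new_write()
    table_commit = write_builder.new_commit()  # assumed API
    try:
        table_write.write_arrow(data)                   # buffer the data
        commit_messages = table_write.prepare_commit()  # called exactly once
        table_commit.commit(commit_messages)            # atomic commit (assumed API)
        return commit_messages
    finally:
        # Release resources even if a step above fails (assumed API).
        table_write.close()
        table_commit.close()
```

The try/finally keeps cleanup in one place, so a failed write or commit still closes both the writer and the committer.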
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/write/table_write.py
- Lines: L32-121
Signature
```python
class TableWrite:
    def write_arrow(self, table: pa.Table): ...
    def write_arrow_batch(self, data: pa.RecordBatch): ...
    def write_pandas(self, dataframe): ...

class BatchTableWrite(TableWrite):
    def prepare_commit(self) -> List[CommitMessage]: ...
```
Import
```python
from pypaimon.write.table_write import BatchTableWrite
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| table | pa.Table | Yes (for write_arrow) | PyArrow Table to write |
| data | pa.RecordBatch | Yes (for write_arrow_batch) | Single RecordBatch to write |
| dataframe | pandas.DataFrame | Yes (for write_pandas) | DataFrame that is auto-converted to RecordBatch before writing |
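Since the write methods buffer data internally and return nothing, a session can reasonably call them several times before its single `prepare_commit()`. The sketch below streams an iterable of RecordBatches through `write_arrow_batch()`. `write_batches` is a hypothetical helper, and `writer` is assumed to come from `write_builder.new_write()`.

```python
from typing import Any, Iterable, List


def write_batches(writer: Any, batches: Iterable[Any]) -> List[Any]:
    """Hypothetical helper: stream RecordBatches into one write session.

    write_arrow_batch() is called once per batch; prepare_commit() is
    called exactly once at the end for the whole session.
    """
    for batch in batches:
        writer.write_arrow_batch(batch)  # each call buffers one RecordBatch
    return writer.prepare_commit()
```

A large `pa.Table` can be fed in chunks with `table.to_batches(max_chunksize=...)`, which returns a list of `pa.RecordBatch` objects.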
Outputs
| Name | Type | Description |
|---|---|---|
| write_arrow return | None | Data is buffered internally; no return value |
| write_arrow_batch return | None | Data is buffered internally; no return value |
| write_pandas return | None | Data is buffered internally; no return value |
| prepare_commit return | List[CommitMessage] | List of commit messages describing file changes, used by the atomic commit step |
Usage Examples
Basic Usage
```python
import pyarrow as pa

# `table` is a Paimon table obtained from a catalog beforehand (not shown)

# Create write builder and writer
write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()

# Write PyArrow data
data = pa.table({
    'id': [1, 2, 3],
    'name': ['a', 'b', 'c'],
    'value': [1.0, 2.0, 3.0],
})
writer.write_arrow(data)

# Prepare commit messages; pass them to the atomic commit step to make the data visible
commit_messages = writer.prepare_commit()
```