Implementation:Apache Paimon BatchTableWrite Write Pandas
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Columnar_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for writing pandas DataFrames to Paimon tables with multi-batch accumulation.
Description
TableWrite.write_pandas() accepts a pandas DataFrame, converts it to a PyArrow RecordBatch using the table schema, and writes it to the file store. Multiple write_pandas() calls can be made before a single prepare_commit(). The write_pandas method handles schema conversion via PyarrowFieldParser.from_paimon_schema() to ensure type consistency.
The write pipeline operates as follows:
- The pandas DataFrame is converted to a PyArrow RecordBatch using the table's schema for type consistency
- The RecordBatch is passed to the underlying file store writer, which produces data files in the configured format (Lance, Parquet, etc.)
- Multiple writes accumulate data files without committing
- prepare_commit() finalizes all pending data files and returns CommitMessage objects
- commit() atomically publishes all changes as a new snapshot
Usage
Use this implementation for writing pandas DataFrames to Paimon tables, especially when multiple batches need to be accumulated before a single atomic commit.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/write/table_write.py:L60-63
Signature
class TableWrite:
def write_pandas(self, dataframe) -> None:
class BatchTableWrite(TableWrite):
def prepare_commit(self) -> List[CommitMessage]:
Import
from pypaimon.write.table_write import BatchTableWrite
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataframe | pandas.DataFrame | Yes | The pandas DataFrame containing data to write to the table |
Outputs
| Name | Type | Description |
|---|---|---|
| (write_pandas) | None | Data is buffered internally; use prepare_commit() + commit() to finalize |
| (prepare_commit) | List[CommitMessage] | List of commit messages describing the buffered changes, passed to commit() |
Usage Examples
Basic Usage
import pandas as pd
write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()
commit = write_builder.new_commit()
# Write multiple batches
for i in range(3):
df = pd.DataFrame({
'id': range(i*100, (i+1)*100),
'name': [f'item_{j}' for j in range(i*100, (i+1)*100)],
'value': [float(j) * 1.5 for j in range(i*100, (i+1)*100)],
})
writer.write_pandas(df)
# Atomic commit of all batches
commit_messages = writer.prepare_commit()
commit.commit(commit_messages)