Implementation:Apache Paimon BatchTableWrite Write Arrow

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Table_Format
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for writing PyArrow Table, PyArrow RecordBatch, and pandas DataFrame data to Paimon tables.

Description

The TableWrite class provides write_arrow(), write_arrow_batch(), and write_pandas() methods for ingesting data into Paimon tables. Data is validated against the table schema, then partitioned and bucketed using RowKeyExtractor before being written to the file store. BatchTableWrite extends TableWrite with one-time commit semantics, ensuring that each write session produces exactly one set of commit messages via prepare_commit().
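
The validate-then-route step described above can be pictured with a small pure-Python sketch. Everything in it is hypothetical: `validate_columns`, `route_rows`, the `dt` partition column, and the use of `zlib.crc32` as the bucket hash are illustrative stand-ins, not Paimon's RowKeyExtractor or its actual hash function.

```python
import zlib
from collections import defaultdict


def validate_columns(rows, expected_columns):
    """Reject any row whose column names differ from the expected schema."""
    expected = list(expected_columns)
    for row in rows:
        if list(row.keys()) != expected:
            raise ValueError(f"schema mismatch: {list(row.keys())} != {expected}")


def route_rows(rows, partition_keys, bucket_key, num_buckets):
    """Group rows by (partition values, bucket id)."""
    groups = defaultdict(list)
    for row in rows:
        partition = tuple(row[k] for k in partition_keys)
        # Illustrative hash only (assumption); a real table format defines its own.
        bucket = zlib.crc32(str(row[bucket_key]).encode("utf-8")) % num_buckets
        groups[(partition, bucket)].append(row)
    return dict(groups)


rows = [
    {"dt": "2024-01-01", "id": 1, "name": "a"},
    {"dt": "2024-01-01", "id": 2, "name": "b"},
    {"dt": "2024-01-02", "id": 1, "name": "c"},
]
validate_columns(rows, ["dt", "id", "name"])
groups = route_rows(rows, partition_keys=["dt"], bucket_key="id", num_buckets=4)
```

Each key in `groups` identifies one (partition, bucket) destination, which is roughly the unit a writer would buffer and flush separately.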

Usage

Use this implementation after obtaining a table reference from the catalog. Create a BatchWriteBuilder from the table, then create a writer and call the appropriate write method for your data format. After all data is written, call prepare_commit() to generate the commit messages needed for the atomic commit step.
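
The workflow above can be wrapped in a small helper. This is a sketch under stated assumptions: `write_batch` is a hypothetical name, and the `new_commit()`, `commit()`, and `close()` calls are not shown on this page, so they are assumed from the builder pattern and should be checked against the installed pypaimon version.

```python
def write_batch(table, arrow_table):
    """Write one PyArrow table to a Paimon table and commit it as one batch.

    `table` is expected to be a table reference obtained from the catalog.
    """
    write_builder = table.new_batch_write_builder()
    writer = write_builder.new_write()
    committer = write_builder.new_commit()  # assumed API, not shown on this page
    try:
        writer.write_arrow(arrow_table)            # buffered; returns None
        commit_messages = writer.prepare_commit()  # call once per writer
        committer.commit(commit_messages)          # assumed API: the atomic commit step
        return commit_messages
    finally:
        # Assumed API: release writer and committer resources even on failure.
        writer.close()
        committer.close()
```

Creating a fresh builder and writer on every call keeps each invocation to a single `prepare_commit()`, which matches the one-time commit semantics of BatchTableWrite.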

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/write/table_write.py
  • Lines: L32-121

Signature

# Signature stubs only; method bodies are omitted.
class TableWrite:
    def write_arrow(self, table: pa.Table): ...
    def write_arrow_batch(self, data: pa.RecordBatch): ...
    def write_pandas(self, dataframe): ...

class BatchTableWrite(TableWrite):
    def prepare_commit(self) -> List[CommitMessage]: ...

Import

from pypaimon.write.table_write import BatchTableWrite

I/O Contract

Inputs

Name | Type | Required | Description
table | pa.Table | Yes (for write_arrow) | PyArrow Table to write
data | pa.RecordBatch | Yes (for write_arrow_batch) | Single RecordBatch to write
dataframe | pandas.DataFrame | Yes (for write_pandas) | DataFrame that is auto-converted to RecordBatch before writing

Outputs

Name | Type | Description
write_arrow return | None | Data is buffered internally; no return value
write_arrow_batch return | None | Data is buffered internally; no return value
write_pandas return | None | Data is buffered internally; no return value
prepare_commit return | List[CommitMessage] | List of commit messages describing file changes, used by the atomic commit step
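
The contract above (write calls buffer and return None; prepare_commit() yields one set of messages per session) can be illustrated with a toy class. `OneShotWriter` is hypothetical and is not the pypaimon implementation; in particular, raising on a second prepare_commit() call is an assumption about how one-time semantics might be enforced.

```python
class OneShotWriter:
    """Hypothetical stand-in for a batch writer; not the pypaimon class."""

    def __init__(self):
        self._buffer = []
        self._committed = False

    def write(self, rows):
        # Data is only buffered here; nothing is returned.
        self._buffer.extend(rows)

    def prepare_commit(self):
        # Assumed guard: allow exactly one set of commit messages per writer.
        if self._committed:
            raise RuntimeError("prepare_commit() may only be called once per writer")
        self._committed = True
        return [{"rows": len(self._buffer)}]


writer = OneShotWriter()
result = writer.write([1, 2, 3])
messages = writer.prepare_commit()

try:
    writer.prepare_commit()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
```

Under this model, a new write session means creating a new writer rather than reusing one that has already prepared its commit.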

Usage Examples

Basic Usage

import pyarrow as pa

# `table` is a Paimon table reference obtained from the catalog (see Usage).
# Create write builder and writer
write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()

# Write PyArrow data
data = pa.table({
    'id': [1, 2, 3],
    'name': ['a', 'b', 'c'],
    'value': [1.0, 2.0, 3.0],
})
writer.write_arrow(data)

# Prepare commit messages
commit_messages = writer.prepare_commit()

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
