Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Apache Paimon BatchTableWrite Write Pandas

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Columnar_Storage
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for writing pandas DataFrames to Paimon tables with multi-batch accumulation.

Description

TableWrite.write_pandas() accepts a pandas DataFrame, converts it to a PyArrow RecordBatch using the table schema, and writes it to the file store. Multiple write_pandas() calls can be made before a single prepare_commit(). The write_pandas method handles schema conversion via PyarrowFieldParser.from_paimon_schema() to ensure type consistency.

The write pipeline operates as follows:

  1. The pandas DataFrame is converted to a PyArrow RecordBatch using the table's schema for type consistency
  2. The RecordBatch is passed to the underlying file store writer, which produces data files in the configured format (Lance, Parquet, etc.)
  3. Multiple writes accumulate data files without committing
  4. prepare_commit() finalizes all pending data files and returns CommitMessage objects
  5. commit() atomically publishes all changes as a new snapshot

Usage

Use this implementation for writing pandas DataFrames to Paimon tables, especially when multiple batches need to be accumulated before a single atomic commit.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/write/table_write.py:L60-63

Signature

class TableWrite:
    def write_pandas(self, dataframe) -> None:

class BatchTableWrite(TableWrite):
    def prepare_commit(self) -> List[CommitMessage]:

Import

from pypaimon.write.table_write import BatchTableWrite

I/O Contract

Inputs

Name Type Required Description
dataframe pandas.DataFrame Yes The pandas DataFrame containing data to write to the table

Outputs

Name Type Description
(write_pandas) None Data is buffered internally; use prepare_commit() + commit() to finalize
(prepare_commit) List[CommitMessage] List of commit messages describing the buffered changes, passed to commit()

Usage Examples

Basic Usage

import pandas as pd

write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()
commit = write_builder.new_commit()

# Write multiple batches
for i in range(3):
    df = pd.DataFrame({
        'id': range(i*100, (i+1)*100),
        'name': [f'item_{j}' for j in range(i*100, (i+1)*100)],
        'value': [float(j) * 1.5 for j in range(i*100, (i+1)*100)],
    })
    writer.write_pandas(df)

# Atomic commit of all batches
commit_messages = writer.prepare_commit()
commit.commit(commit_messages)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment