Implementation:Apache Paimon FormatTable

From Leeroopedia


Knowledge Sources
Domains Format Tables, Table Abstraction
Last Updated 2026-02-08 00:00 GMT

Overview

FormatTable represents a table stored in a specific file format (Parquet, ORC, CSV, JSON, Text) without Paimon's LSM-tree structure.

Description

The FormatTable class provides a table abstraction over data stored in standard file formats. Unlike a FileStoreTable, which uses Paimon's snapshot-based LSM-tree architecture, a FormatTable reads and writes files directly in their native format, with optional partitioning support.
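
To make the partitioning idea concrete, the sketch below builds a partition directory under a table location. It is only an illustration: the helper partition_dir is hypothetical, the Hive-style key=value layout is an assumption, and so is the reading of the format-table.partition-path-only-value option (seen in the usage example) as switching to value-only directory names.

```python
import posixpath
from typing import Dict


def partition_dir(location: str, partition: Dict[str, str], only_value: bool = False) -> str:
    """Hypothetical helper: build the directory for one partition.

    Assumption: partitions are laid out Hive-style as key=value directories,
    and only_value=True drops the "key=" prefix from each directory name.
    """
    segments = [
        value if only_value else f"{key}={value}"
        for key, value in partition.items()
    ]
    return posixpath.join(location, *segments)


# One partition key, matching the "date" partition key used in the usage example.
hive_style = partition_dir("/path/to/data", {"date": "2024-01-01"})
value_only = partition_dir("/path/to/data", {"date": "2024-01-01"}, only_value=True)
```

Under these assumptions, hive_style ends in date=2024-01-01 and value_only ends in 2024-01-01; the real directory layout is decided by the pypaimon implementation and the table options.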

The class implements the Table interface and supports multiple file formats defined by the Format enum: ORC, PARQUET, CSV, TEXT, and JSON. It maintains table schema, file format, location, and configuration options while providing access to field definitions, partition keys, and table metadata.
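
As a rough illustration of how a string-backed enum such as Format can be parsed, the following self-contained sketch mirrors the signature listed in the Code Reference. The case-insensitive matching and the ValueError for unknown input are assumptions; the actual pypaimon implementation may behave differently.

```python
from enum import Enum


class Format(str, Enum):
    """Stand-in for pypaimon's Format enum (illustrative only)."""
    ORC = "orc"
    PARQUET = "parquet"
    CSV = "csv"
    TEXT = "text"
    JSON = "json"

    @classmethod
    def parse(cls, file_format: str) -> "Format":
        # Assumption: matching ignores case and surrounding whitespace.
        normalized = file_format.strip().lower()
        for member in cls:
            if member.value == normalized:
                return member
        # Assumption: unknown formats are rejected with ValueError.
        raise ValueError(f"Unsupported file format: {file_format!r}")


parsed = Format.parse("Parquet")
# Because the enum mixes in str, members compare equal to their string values.
is_string = isinstance(parsed, str)
```

Mixing in str lets a member be used wherever a plain format string is expected, for example when it is written into table options.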

FormatTable has no primary keys, because it does not support key-based operations, but it does support partition keys for organizing data. It provides factory methods for creating read builders and batch write builders. Stream writes are not supported: new_stream_write_builder raises NotImplementedError.
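
The write-path restriction can be sketched with a small stand-in class. FormatTableSketch and supports_stream_write are hypothetical names used only for illustration; the one behavior taken from the signature on this page is that new_stream_write_builder raises NotImplementedError.

```python
class FormatTableSketch:
    """Hypothetical stand-in, not the pypaimon FormatTable class."""

    def __init__(self, partition_keys):
        self.partition_keys = list(partition_keys)
        self.primary_keys = []  # format tables carry no primary keys

    def new_batch_write_builder(self):
        # Placeholder value; the real method returns a builder object.
        return "batch-write-builder"

    def new_stream_write_builder(self):
        raise NotImplementedError("FormatTable does not support stream writes")


def supports_stream_write(table) -> bool:
    """Probe whether a table can hand out a stream write builder."""
    try:
        table.new_stream_write_builder()
    except NotImplementedError:
        return False
    return True


table = FormatTableSketch(partition_keys=["date"])
stream_ok = supports_stream_write(table)
builder = table.new_batch_write_builder()
```

Code that handles several table types can use a probe like this to choose the batch path for format tables instead of failing partway through a job.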

Usage

Use FormatTable when working with external data in standard formats, when importing data into or exporting data from Paimon, or when you need simple file-based storage without versioning or merge capabilities.

Code Reference

Source Location

Signature

class Format(str, Enum):
    """Supported file formats."""
    ORC = "orc"
    PARQUET = "parquet"
    CSV = "csv"
    TEXT = "text"
    JSON = "json"

    @classmethod
    def parse(cls, file_format: str) -> "Format":
        """Parse file format string."""


class FormatTable(Table):
    """Table stored in a specific file format."""

    def __init__(
        self,
        file_io: FileIO,
        identifier: Identifier,
        table_schema: TableSchema,
        location: str,
        format: Format,
        options: Optional[Dict[str, str]] = None,
        comment: Optional[str] = None,
    ):
        """Initialize with table metadata and format."""

    def name(self) -> str:
        """Get table name."""

    def full_name(self) -> str:
        """Get full table name (database.table)."""

    def location(self) -> str:
        """Get table location."""

    def format(self) -> Format:
        """Get file format."""

    def options(self) -> Dict[str, str]:
        """Get table options."""

    def new_read_builder(self):
        """Create a new read builder."""

    def new_batch_write_builder(self):
        """Create a new batch write builder."""

    def new_stream_write_builder(self):
        """Raise NotImplementedError - stream write not supported."""

Import

from pypaimon.table.format.format_table import FormatTable, Format

I/O Contract

Inputs

Name          Type            Required  Description
file_io       FileIO          Yes       File I/O handler
identifier    Identifier      Yes       Table identifier
table_schema  TableSchema     Yes       Table schema definition
location      str             Yes       Root location for table data
format        Format          Yes       File format (Parquet, ORC, CSV, JSON, Text)
options       Dict[str, str]  No        Table configuration options
comment       str             No        Table comment/description

Outputs

Name      Type         Description
table     FormatTable  Format table instance
name      str          Table name
location  str          Table root location
format    Format       File format enum value

Usage Examples

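# Identifier is used below; this module path is an assumption, adjust it to your pypaimon version
from pypaimon.common.identifier import Identifier
from pypaimon.table.format.format_table import FormatTable, Format
from pypaimon.schema.table_schema import TableSchema
from pypaimon.schema.data_types import DataField, AtomicType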

# Create schema
schema = TableSchema(
    version=3,
    id=1,
    fields=[
        DataField(0, "id", AtomicType("BIGINT")),
        DataField(1, "name", AtomicType("STRING")),
        DataField(2, "date", AtomicType("STRING"))
    ],
    highest_field_id=2,
    partition_keys=["date"],
    primary_keys=[],
    options={}
)

# Create Parquet format table
# `file_io` is assumed to be an already-constructed FileIO instance for the target filesystem
table = FormatTable(
    file_io=file_io,
    identifier=Identifier.create("my_db", "my_table"),
    table_schema=schema,
    location="/path/to/data",
    format=Format.PARQUET,
    options={"format-table.partition-path-only-value": "false"}
)

# Read from format table
read_builder = table.new_read_builder()
read_builder = read_builder.with_projection(["id", "name"])
scan = read_builder.new_scan()
read = read_builder.new_read()

splits = scan.plan().splits()
df = read.to_pandas(splits)

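# Write to format table
write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()
writer.write_pandas(df)
# Assumption: as with other pypaimon batch writes, the returned messages
# are then passed to a commit step (not shown here)
commit_messages = writer.prepare_commit()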

# Parse format from string
fmt = Format.parse("orc")  # Returns Format.ORC

print(f"Table: {table.full_name()}")
print(f"Location: {table.location()}")
print(f"Format: {table.format().value}")
print(f"Partition keys: {table.partition_keys}")
