Implementation:Apache Paimon FormatTable

Knowledge Sources	Apache_Paimon
Domains	Format Tables, Table Abstraction
Last Updated	2026-02-08 00:00 GMT

Overview

FormatTable represents a table stored in a specific file format (Parquet, ORC, CSV, JSON, Text) without Paimon's LSM-tree structure.

Description

The FormatTable class provides a table abstraction over data stored in standard file formats. Unlike a FileStoreTable, which uses Paimon's snapshot-based LSM-tree architecture, a FormatTable reads and writes files directly in their native format, with optional partitioning.

The class implements the Table interface and supports multiple file formats defined by the Format enum: ORC, PARQUET, CSV, TEXT, and JSON. It maintains table schema, file format, location, and configuration options while providing access to field definitions, partition keys, and table metadata.
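As an illustration of how a string-backed enum with a parse helper can behave, here is a minimal standalone sketch. It does not import pypaimon: the name FileFormat is a hypothetical stand-in for Format, and the case-insensitive matching and the ValueError on unknown input are assumptions, not confirmed behavior of Format.parse.

```python
from enum import Enum


class FileFormat(str, Enum):
    """Hypothetical stand-in for the Format enum described above."""

    ORC = "orc"
    PARQUET = "parquet"
    CSV = "csv"
    TEXT = "text"
    JSON = "json"

    @classmethod
    def parse(cls, file_format: str) -> "FileFormat":
        # Assumption: input is trimmed and matched case-insensitively.
        normalized = file_format.strip().lower()
        try:
            return cls(normalized)
        except ValueError:
            supported = ", ".join(member.value for member in cls)
            raise ValueError(
                f"Unsupported file format {file_format!r}; expected one of: {supported}"
            ) from None


fmt = FileFormat.parse("Parquet")
# Because the enum mixes in str, members compare equal to their string values.
is_parquet = fmt == "parquet"
```

Mixing in str lets a member be passed wherever a plain format string is expected, for example when building file extensions or option values.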

FormatTable has no primary keys, since it does not support key-based operations such as merging, but it does support partition keys for organizing data. It provides factory methods for read builders and batch write builders; stream writes are not supported, and new_stream_write_builder raises NotImplementedError.
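Callers that handle several table types may want to check for stream-write support before using it. The sketch below shows one way to do that. MiniFormatTable and supports_stream_write are hypothetical names introduced for illustration only; the real class is not imported here, and the error message is an assumption.

```python
from typing import List


class MiniFormatTable:
    """Hypothetical stand-in that mimics the constraints described above."""

    def __init__(self, partition_keys: List[str]):
        self.partition_keys = list(partition_keys)
        # A format table never has primary keys.
        self.primary_keys: List[str] = []

    def new_stream_write_builder(self):
        # Mirrors the documented behavior: stream write is not supported.
        raise NotImplementedError("Format table does not support stream write")


def supports_stream_write(table) -> bool:
    """Return False if the table rejects stream writes with NotImplementedError."""
    try:
        table.new_stream_write_builder()
    except NotImplementedError:
        return False
    return True


table = MiniFormatTable(partition_keys=["date"])
can_stream = supports_stream_write(table)
```

Probing with a try/except keeps the calling code independent of the concrete table class, at the cost of constructing a builder when the call does succeed.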

Usage

Use FormatTable when working with external data in standard formats, importing/exporting data to/from Paimon, or when you need simple file-based storage without versioning or merge capabilities.

Code Reference

Source Location

Repository: Apache_Paimon
File: paimon-python/pypaimon/table/format/format_table.py

Signature

class Format(str, Enum):
    """Supported file formats."""
    ORC = "orc"
    PARQUET = "parquet"
    CSV = "csv"
    TEXT = "text"
    JSON = "json"

    @classmethod
    def parse(cls, file_format: str) -> "Format":
        """Parse file format string."""


class FormatTable(Table):
    """Table stored in a specific file format."""

    def __init__(
        self,
        file_io: FileIO,
        identifier: Identifier,
        table_schema: TableSchema,
        location: str,
        format: Format,
        options: Optional[Dict[str, str]] = None,
        comment: Optional[str] = None,
    ):
        """Initialize with table metadata and format."""

    def name(self) -> str:
        """Get table name."""

    def full_name(self) -> str:
        """Get full table name (database.table)."""

    def location(self) -> str:
        """Get table location."""

    def format(self) -> Format:
        """Get file format."""

    def options(self) -> Dict[str, str]:
        """Get table options."""

    def new_read_builder(self):
        """Create a new read builder."""

    def new_batch_write_builder(self):
        """Create a new batch write builder."""

    def new_stream_write_builder(self):
        """Raise NotImplementedError - stream write not supported."""

Import

from pypaimon.table.format.format_table import FormatTable, Format

I/O Contract

Inputs

Name	Type	Required	Description
file_io	FileIO	Yes	File I/O handler
identifier	Identifier	Yes	Table identifier
table_schema	TableSchema	Yes	Table schema definition
location	str	Yes	Root location for table data
format	Format	Yes	File format (Parquet, ORC, CSV, JSON, Text)
options	Dict[str, str]	No	Table configuration options
comment	str	No	Table comment/description

Outputs

Name	Type	Description
table	FormatTable	Format table instance
name	str	Table name
location	str	Table root location
format	Format	File format enum value

Usage Examples

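from pypaimon.table.format.format_table import FormatTable, Format
from pypaimon.schema.table_schema import TableSchema
from pypaimon.schema.data_types import DataField, AtomicType
# Identifier import path is assumed; adjust it to your pypaimon version
from pypaimon.common.identifier import Identifier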

# Create schema
schema = TableSchema(
    version=3,
    id=1,
    fields=[
        DataField(0, "id", AtomicType("BIGINT")),
        DataField(1, "name", AtomicType("STRING")),
        DataField(2, "date", AtomicType("STRING"))
    ],
    highest_field_id=2,
    partition_keys=["date"],
    primary_keys=[],
    options={}
)

# Create Parquet format table
# (file_io is assumed to be an existing FileIO instance created elsewhere)
table = FormatTable(
    file_io=file_io,
    identifier=Identifier.create("my_db", "my_table"),
    table_schema=schema,
    location="/path/to/data",
    format=Format.PARQUET,
    options={"format-table.partition-path-only-value": "false"}
)

# Read from format table
read_builder = table.new_read_builder()
read_builder = read_builder.with_projection(["id", "name"])
scan = read_builder.new_scan()
read = read_builder.new_read()

splits = scan.plan().splits()
df = read.to_pandas(splits)

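# Write to format table
write_builder = table.new_batch_write_builder()
writer = write_builder.new_write()
writer.write_pandas(df)
commit_messages = writer.prepare_commit()
# Assumption: as with other pypaimon batch writes, the prepared messages
# are finalized through the builder's commit object
commit = write_builder.new_commit()
commit.commit(commit_messages)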

# Parse format from string
fmt = Format.parse("orc")  # Returns Format.ORC

print(f"Table: {table.full_name()}")
print(f"Location: {table.location()}")
print(f"Format: {table.format().value}")
print(f"Partition keys: {table.partition_keys}")
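The format-table.partition-path-only-value option used in the example above affects how partition directories are named. The following standalone sketch illustrates the two layouts, assuming that "false" yields Hive-style key=value directories and "true" yields directories named by value only. The helper partition_path is hypothetical and is not part of pypaimon; the actual path construction in the library may differ.

```python
from typing import Dict, List


def partition_path(
    partition_keys: List[str],
    row: Dict[str, str],
    only_value: bool = False,
) -> str:
    """Build the relative partition directory for a row (illustrative helper)."""
    parts = []
    for key in partition_keys:
        value = str(row[key])
        # Hive-style uses "key=value"; value-only drops the key name.
        parts.append(value if only_value else f"{key}={value}")
    return "/".join(parts)


row = {"id": "1", "name": "alice", "date": "2024-01-01"}
hive_style = partition_path(["date"], row)                   # "date=2024-01-01"
value_only = partition_path(["date"], row, only_value=True)  # "2024-01-01"
```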
