Implementation:Apache Paimon Schema With Blob Options

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Blob_Storage
Last Updated	2026-02-07 00:00 GMT

Overview

Schema.from_pyarrow_schema() is the concrete API for creating Paimon table schemas with blob descriptor mode enabled.

Description

Schema.from_pyarrow_schema() validates blob-specific requirements when a large_binary() column is detected in the provided PyArrow schema. It enforces the following constraints:

The required options row-tracking.enabled=true and data-evolution.enabled=true must be present.
The blob-field option must specify a valid column name matching the large_binary() column.
The blob-as-descriptor option must be set to true.
primary_keys must be None, because primary keys are not allowed on blob tables.

If any of these constraints is violated, the method raises a validation error at schema creation time, so a misconfigured blob table is rejected before it is created. The validation runs inside the from_pyarrow_schema() class method, which converts a PyArrow schema into a Paimon Schema object.
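The constraints above can be sketched as a standalone pre-flight check. This is an illustration only, not the pypaimon implementation: check_blob_table_config and its parameters are hypothetical names, the caller is assumed to pass the names of the large_binary() columns as plain strings, and treating an empty primary-key list like None is an assumption.

```python
from typing import Dict, List, Optional

# Options that the page says must be set to true for blob tables.
REQUIRED_TRUE_OPTIONS = (
    "row-tracking.enabled",
    "data-evolution.enabled",
    "blob-as-descriptor",
)


def check_blob_table_config(
    large_binary_columns: List[str],
    options: Optional[Dict[str, str]],
    primary_keys: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty list means OK)."""
    problems: List[str] = []
    if not large_binary_columns:
        # No blob column detected, so the blob-specific rules do not apply.
        return problems
    options = options or {}
    for key in REQUIRED_TRUE_OPTIONS:
        if str(options.get(key, "")).lower() != "true":
            problems.append(f"option '{key}' must be 'true'")
    if options.get("blob-field") not in large_binary_columns:
        problems.append("option 'blob-field' must name a large_binary() column")
    # Assumption: an empty list is treated the same as None.
    if primary_keys:
        problems.append("primary keys are not allowed with blob tables")
    return problems
```

Running a check like this before calling Schema.from_pyarrow_schema() reports every violated constraint at once, whereas the library itself is only documented to raise a validation error.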

Usage

Use this method as the first step when creating a blob-enabled Paimon table. It is called once during table creation and produces a validated Schema object that can be passed to catalog.create_table().

Code Reference

Source Location

Repository: Apache Paimon
File: paimon-python/pypaimon/schema/schema.py:L51-88

Signature

Schema.from_pyarrow_schema(
    pa_schema: pa.Schema,
    partition_keys: Optional[List[str]] = None,
    primary_keys: Optional[List[str]] = None,  # Must be None for blob tables
    options: Optional[Dict] = None,
    comment: Optional[str] = None,
) -> Schema

Import

from pypaimon.schema.schema import Schema

I/O Contract

Inputs

Name	Type	Required	Description
pa_schema	pa.Schema	Yes	PyArrow schema that must include a pa.large_binary() column for blob storage
partition_keys	Optional[List[str]]	No	Optional list of partition key column names
primary_keys	Optional[List[str]]	No	Must be None for blob tables, because primary keys are not allowed with blob columns
options	Optional[Dict]	Yes (for blob tables)	Must include blob-field (the name of the large_binary() column), blob-as-descriptor: true, row-tracking.enabled: true, and data-evolution.enabled: true
comment	Optional[str]	No	Optional table comment
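Because the options input always needs the same four entries for a blob table, they can be assembled by a small helper. The sketch below is a convenience under stated assumptions: blob_table_options and its extra parameter are hypothetical names, not part of pypaimon, and option values are written as the string 'true', as in the usage example on this page.

```python
from typing import Dict, Optional


def blob_table_options(
    blob_field: str,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build the options dict this page lists for blob tables."""
    # Start from any caller-supplied options, then apply the required ones last
    # so that they cannot be overridden by accident.
    options: Dict[str, str] = dict(extra or {})
    options.update({
        "blob-field": blob_field,
        "blob-as-descriptor": "true",
        "row-tracking.enabled": "true",
        "data-evolution.enabled": "true",
    })
    return options
```

The resulting dict can be passed as the options argument of Schema.from_pyarrow_schema().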

Outputs

Name	Type	Description
schema	Schema	A validated Paimon Schema object with blob descriptor mode enabled, ready for use with catalog.create_table()

Usage Examples

Basic Usage

import pyarrow as pa
from pypaimon.schema.schema import Schema

pa_schema = pa.schema([
    ('id', pa.int64()),
    ('filename', pa.string()),
    ('content_type', pa.string()),
    ('data', pa.large_binary()),
])

schema = Schema.from_pyarrow_schema(
    pa_schema,
    options={
        'blob-field': 'data',
        'blob-as-descriptor': 'true',
        'row-tracking.enabled': 'true',
        'data-evolution.enabled': 'true',
    }
)

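# `catalog` is assumed to be an already-initialized Paimon catalog
# in which the 'media' database exists.
catalog.create_table('media.files', schema, ignore_if_exists=True)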

Related Pages

Implements Principle

Principle:Apache_Paimon_Blob_Schema_Definition

Requires Environment

Environment:Apache_Paimon_Python_Core_Runtime
