Implementation:Apache Paimon Schema With Blob Options

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Blob_Storage
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for creating Paimon table schemas with blob descriptor mode enabled.

Description

Schema.from_pyarrow_schema() validates blob-specific requirements when a large_binary() column is detected in the provided PyArrow schema. It enforces the following constraints:

  • The required options row-tracking.enabled=true and data-evolution.enabled=true must be present.
  • The blob-field option must specify a valid column name matching the large_binary() column.
  • The blob-as-descriptor option must be set to true.
  • Primary keys must be None -- primary keys are not allowed with blob tables.

If any of these constraints is violated, the method raises a validation error at schema creation time, so a misconfigured blob table is never created. The check runs inside the from_pyarrow_schema() class method itself, as part of converting the PyArrow schema into a Paimon Schema object.
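The constraints above can be sketched in plain Python. This is an illustrative stand-in, not pypaimon's implementation: the function name check_blob_constraints, the string type tags, and the messages are hypothetical, and option values are assumed to be the strings 'true' / 'false' as in this page's usage example.

```python
from typing import Dict, List, Optional

# Options that this page says must be set to true for blob tables.
REQUIRED_TRUE_OPTIONS = (
    "row-tracking.enabled",
    "data-evolution.enabled",
    "blob-as-descriptor",
)


def check_blob_constraints(
    column_types: Dict[str, str],
    options: Optional[Dict[str, str]] = None,
    primary_keys: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the violated constraints (empty if valid)."""
    options = options or {}
    # Columns tagged "large_binary" stand in for pa.large_binary() fields.
    blob_columns = [name for name, typ in column_types.items() if typ == "large_binary"]
    if not blob_columns:
        return []  # no blob column, so the blob rules do not apply

    problems = []
    for key in REQUIRED_TRUE_OPTIONS:
        if str(options.get(key, "")).lower() != "true":
            problems.append(f"{key} must be set to true")
    if options.get("blob-field") not in blob_columns:
        problems.append("blob-field must name the large_binary column")
    if primary_keys is not None:
        problems.append("primary keys are not allowed with blob tables")
    return problems
```

Returning a list of problems rather than raising on the first one is a choice made for this sketch only; the real method is described as raising a validation error.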

Usage

Use this method as the first step when creating a blob-enabled Paimon table. It is called once during table creation and produces a validated Schema object that can be passed to catalog.create_table().

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/schema/schema.py:L51-88

Signature

Schema.from_pyarrow_schema(
    pa_schema: pa.Schema,
    partition_keys: Optional[List[str]] = None,
    primary_keys: Optional[List[str]] = None,  # Must be None for blob tables
    options: Optional[Dict] = None,
    comment: Optional[str] = None,
) -> Schema

Import

from pypaimon.schema.schema import Schema

I/O Contract

Inputs

Name | Type | Required | Description
-----|------|----------|------------
pa_schema | pa.Schema | Yes | PyArrow schema that must include a pa.large_binary() column for blob storage
partition_keys | Optional[List[str]] | No | Optional list of partition key column names
primary_keys | Optional[List[str]] | No | Must be None for blob tables -- primary keys are not allowed with blob columns
options | Optional[Dict] | Yes (for blob tables) | Defaults to None in the signature, but for blob tables it must include blob-field, blob-as-descriptor: true, row-tracking.enabled: true, data-evolution.enabled: true
comment | Optional[str] | No | Optional table comment

Outputs

Name | Type | Description
-----|------|------------
schema | Schema | A validated Paimon Schema object with blob descriptor mode enabled, ready for use with catalog.create_table()

Usage Examples

Basic Usage

import pyarrow as pa
from pypaimon.schema.schema import Schema

pa_schema = pa.schema([
    ('id', pa.int64()),
    ('filename', pa.string()),
    ('content_type', pa.string()),
    ('data', pa.large_binary()),
])

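# primary_keys is left unset (None), as required for blob tables.
schema = Schema.from_pyarrow_schema(
    pa_schema,
    options={
        'blob-field': 'data',              # must match the large_binary() column
        'blob-as-descriptor': 'true',
        'row-tracking.enabled': 'true',
        'data-evolution.enabled': 'true',
    }
)

# `catalog` is assumed to be an existing Paimon catalog instance created elsewhere.
catalog.create_table('media.files', schema, ignore_if_exists=True)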
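
When several blob tables are created, the four required options can be built in one place. The helper below is a hypothetical convenience, not part of pypaimon; it only assembles the option keys and string values listed on this page.

```python
from typing import Dict


def blob_table_options(blob_field: str) -> Dict[str, str]:
    """Hypothetical helper: build the options this page lists as required."""
    return {
        "blob-field": blob_field,            # name of the large_binary() column
        "blob-as-descriptor": "true",
        "row-tracking.enabled": "true",
        "data-evolution.enabled": "true",
    }


# The result would be passed as options= to Schema.from_pyarrow_schema().
options = blob_table_options("data")
```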
