Implementation:Apache Paimon Schema With Blob Options
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Blob_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for creating Paimon table schemas with blob descriptor mode enabled.
Description
Schema.from_pyarrow_schema() validates blob-specific requirements when a large_binary() column is detected in the provided PyArrow schema. It enforces the following constraints:
- The required options row-tracking.enabled=true and data-evolution.enabled=true must be present.
- The blob-field option must specify a valid column name matching the large_binary() column.
- The blob-as-descriptor option must be set to true.
- primary_keys must be None, because blob tables do not support primary keys.
If any of these constraints is violated, the method raises a validation error at schema creation time, so a misconfigured blob table is rejected before it is created. The validation runs inside the from_pyarrow_schema() class method, which converts a PyArrow schema into a Paimon Schema object.
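The four constraints above can be sketched as a standalone check. This is not Paimon's implementation: the function name check_blob_constraints, the (name, type_name) column representation, and the comparison of option values against the string 'true' are all assumptions made for illustration, and the sketch returns a list of messages instead of raising.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Options that the description says must be set to true for blob tables.
REQUIRED_TRUE_OPTIONS = (
    "row-tracking.enabled",
    "data-evolution.enabled",
    "blob-as-descriptor",
)


def is_true(value: object) -> bool:
    # Assumption: option values are strings such as 'true', as in the usage example.
    return str(value).strip().lower() == "true"


def check_blob_constraints(
    columns: Sequence[Tuple[str, str]],
    options: Optional[Dict[str, str]] = None,
    primary_keys: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return a list of violated blob constraints.

    `columns` holds (name, type_name) pairs; a type_name of 'large_binary'
    stands in for a pa.large_binary() column.
    """
    blob_columns = [name for name, type_name in columns if type_name == "large_binary"]
    if not blob_columns:
        return []  # no blob column detected, so no blob-specific checks apply

    options = options or {}
    problems: List[str] = []
    for key in REQUIRED_TRUE_OPTIONS:
        if not is_true(options.get(key)):
            problems.append(f"option '{key}' must be 'true'")
    if options.get("blob-field") not in blob_columns:
        problems.append("option 'blob-field' must name a large_binary column")
    # Follows the wording literally: anything other than None is rejected.
    if primary_keys is not None:
        problems.append("primary_keys must be None for blob tables")
    return problems
```

Calling the helper with a blob column and an empty options dict reports all four option problems at once, which can be easier to act on than failing at the first one.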
Usage
Use this method as the first step when creating a blob-enabled Paimon table. It is called once during table creation and produces a validated Schema object that can be passed to catalog.create_table().
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/schema/schema.py:L51-88
Signature
Schema.from_pyarrow_schema(
    pa_schema: pa.Schema,
    partition_keys: Optional[List[str]] = None,
    primary_keys: Optional[List[str]] = None,  # Must be None for blob tables
    options: Optional[Dict] = None,
    comment: Optional[str] = None,
) -> Schema
Import
from pypaimon.schema.schema import Schema
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pa_schema | pa.Schema | Yes | PyArrow schema that must include a pa.large_binary() column for blob storage |
| partition_keys | Optional[List[str]] | No | Optional list of partition key column names |
| primary_keys | Optional[List[str]] | No | Must be None for blob tables, which do not support primary keys |
| options | Optional[Dict] | Yes (for blob tables) | Must include blob-field (the name of the large_binary() column), blob-as-descriptor: true, row-tracking.enabled: true, and data-evolution.enabled: true |
| comment | Optional[str] | No | Optional table comment |
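Because the options entry carries four mandatory keys, it can help to build the dictionary in one place. The helper below is a minimal sketch, not part of pypaimon: the name blob_table_options and its extra parameter are hypothetical, and the values are written as the string 'true' to match the usage example on this page.

```python
from typing import Dict, Optional


def blob_table_options(
    blob_field: str,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return the options a blob table requires."""
    # Start from any caller-supplied options, then apply the required keys
    # last so they cannot be overridden by mistake.
    options = dict(extra or {})
    options.update({
        "blob-field": blob_field,
        "blob-as-descriptor": "true",
        "row-tracking.enabled": "true",
        "data-evolution.enabled": "true",
    })
    return options
```

The returned dictionary is intended to be passed as the options argument of Schema.from_pyarrow_schema(), with blob_field set to the name of the large_binary() column.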
Outputs
| Name | Type | Description |
|---|---|---|
| schema | Schema | A validated Paimon Schema object with blob descriptor mode enabled, ready for use with catalog.create_table() |
Usage Examples
Basic Usage
import pyarrow as pa
from pypaimon.schema.schema import Schema
pa_schema = pa.schema([
    ('id', pa.int64()),
    ('filename', pa.string()),
    ('content_type', pa.string()),
    ('data', pa.large_binary()),  # blob column named by 'blob-field'
])
schema = Schema.from_pyarrow_schema(
    pa_schema,
    options={
        'blob-field': 'data',
        'blob-as-descriptor': 'true',
        'row-tracking.enabled': 'true',
        'data-evolution.enabled': 'true',
    }
)
# `catalog` is assumed to be an existing Paimon catalog instance created elsewhere.
catalog.create_table('media.files', schema, ignore_if_exists=True)