Implementation:Apache Paimon Schema With Lance Format
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Columnar_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for creating Paimon tables with Lance columnar file format via Schema options.
Description
To create a Lance-format table, set the file format entry in the options dictionary passed to the Schema. CoreOptions.FILE_FORMAT.key() returns the key file.format, and CoreOptions.FILE_FORMAT_LANCE holds the value lance. Schema.from_pyarrow_schema() accepts these options and produces a Lance-configured schema for table creation.
The schema creation process validates the provided PyArrow schema, converts field types to Paimon's internal type system, and stores the Lance format configuration in the table metadata. Once the schema is created with Lance format, all subsequent read and write operations on the table will use the Lance file reader and writer.
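As a minimal sketch of the option handling described above, the hypothetical helper below merges the Lance entry into an existing options dictionary without mutating it. The literal strings 'file.format' and 'lance' stand in for CoreOptions.FILE_FORMAT.key() and CoreOptions.FILE_FORMAT_LANCE so that the snippet runs without pypaimon installed. The name with_lance_format and the sample option in the demo are illustrative assumptions, not part of the Paimon API.

```python
from typing import Dict, Optional

# Stand-ins for CoreOptions.FILE_FORMAT.key() and CoreOptions.FILE_FORMAT_LANCE.
FILE_FORMAT_KEY = "file.format"
FILE_FORMAT_LANCE = "lance"


def with_lance_format(options: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `options` with the file format set to Lance.

    Hypothetical helper: it only builds the dictionary; it does not call Paimon.
    """
    merged = dict(options or {})  # copy so the caller's dictionary is untouched
    merged[FILE_FORMAT_KEY] = FILE_FORMAT_LANCE  # overrides any other format
    return merged


if __name__ == "__main__":
    # The extra option here is only sample input for the demo.
    print(with_lance_format({"bucket": "4"}))
```

The returned dictionary would then be passed as the options argument of Schema.from_pyarrow_schema().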
Usage
Use this implementation when initializing a new Paimon table that should store its data files in Lance format. This is the entry point for all Lance-based analytical workflows in Paimon.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/schema/schema.py:L51-88
- File: paimon-python/pypaimon/common/options/core_options.py:L56 (FILE_FORMAT_LANCE constant)
- File: paimon-python/pypaimon/common/options/core_options.py:L118-123 (FILE_FORMAT option)
Signature
from pypaimon.common.options.core_options import CoreOptions
# Key constants:
# CoreOptions.FILE_FORMAT.key() -> 'file.format'
# CoreOptions.FILE_FORMAT_LANCE -> 'lance'
Schema.from_pyarrow_schema(
    pa_schema: pa.Schema,
    partition_keys: Optional[List[str]] = None,
    primary_keys: Optional[List[str]] = None,
    options: Optional[Dict] = None,  # Include {CoreOptions.FILE_FORMAT.key(): CoreOptions.FILE_FORMAT_LANCE}
    comment: Optional[str] = None,
) -> Schema
Import
from pypaimon.schema.schema import Schema
from pypaimon.common.options.core_options import CoreOptions
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pa_schema | pa.Schema | Yes | PyArrow schema defining the table columns and their types |
| partition_keys | Optional[List[str]] | No | List of column names to use as partition keys |
| primary_keys | Optional[List[str]] | No | List of column names to use as primary keys |
| options | Optional[Dict] | No (Yes for Lance) | Table options; must include the entry file.format: lance to enable Lance format |
| comment | Optional[str] | No | Optional comment describing the table |
Outputs
| Name | Type | Description |
|---|---|---|
| schema | Schema | A Schema instance with Lance format configured, ready for table creation |
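The contract above can be checked on the caller's side before the schema is built. The sketch below is a hypothetical pre-flight helper, not Paimon's own validation: it assumes the column names are available as a plain list (for example from pa_schema.names) and reports key names that do not match a column, as well as options that do not select Lance. The function name check_lance_inputs and the message texts are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def check_lance_inputs(
    column_names: Sequence[str],
    partition_keys: Optional[List[str]] = None,
    primary_keys: Optional[List[str]] = None,
    options: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical caller-side check; it does not reproduce Paimon's validation.
    """
    problems: List[str] = []
    columns = set(column_names)

    # Every partition and primary key should name an existing column.
    for label, keys in (("partition key", partition_keys), ("primary key", primary_keys)):
        for key in keys or []:
            if key not in columns:
                problems.append(f"{label} '{key}' is not a column in the schema")

    # The options must carry the file.format: lance entry.
    if (options or {}).get("file.format") != "lance":
        problems.append("options do not set 'file.format' to 'lance'")

    return problems


if __name__ == "__main__":
    for problem in check_lance_inputs(["id", "name"], partition_keys=["dt"]):
        print(problem)
```

Returning a list of messages instead of raising on the first problem lets a caller report every inconsistency at once.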
Usage Examples
Basic Usage
import pyarrow as pa
from pypaimon.schema.schema import Schema
from pypaimon.common.options.core_options import CoreOptions
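# Define the table columns with PyArrow types.
pa_schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
    ('value', pa.float64()),
    ('category', pa.string()),
])
# Select Lance as the data file format through the schema options.
schema = Schema.from_pyarrow_schema(
    pa_schema,
    options={CoreOptions.FILE_FORMAT.key(): CoreOptions.FILE_FORMAT_LANCE},
)
# `catalog` is assumed to be an already-initialized Paimon catalog
# in which the `analytics` database exists.
catalog.create_table('analytics.lance_table', schema, ignore_if_exists=True)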