Environment:Apache Paimon Python Core Runtime

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Data_Engineering
Last Updated: 2026-02-08 00:00 GMT

Overview

Python 3.6+ runtime environment with PyArrow, Pandas, fastavro, and zstandard as core dependencies for the PyPaimon SDK.

Description

This environment defines the core Python runtime and the mandatory package dependencies required to run any PyPaimon operation. The SDK supports Python 3.6 through 3.11, and the dependency version constraints differ by Python version. Key packages are PyArrow for columnar data handling, Pandas and Polars for DataFrame integration, fastavro for reading manifest files, and zstandard and cramjam for compression. Python 3.6 additionally needs a compatibility patch for the fastavro zstd block reader, which is applied at import time.
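As a minimal sketch of the version range described above, the following stdlib-only helpers check whether an interpreter falls within Python 3.6 through 3.11 and whether the Python 3.6 patches would apply. The names `is_supported_python`, `needs_py36_patches`, `MIN_SUPPORTED`, and `MAX_TESTED` are illustrative assumptions and are not part of PyPaimon.

```python
import sys

# Range taken from this page; the constant names are hypothetical.
MIN_SUPPORTED = (3, 6)
MAX_TESTED = (3, 11)


def is_supported_python(version_info=None):
    """Return True if (major, minor) lies within the tested range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_TESTED


def needs_py36_patches(version_info=None):
    """Return True on Python 3.6, where import-time patches apply."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (3, 6)
```

Calling `is_supported_python()` with no argument inspects the running interpreter; passing a tuple such as `(3, 6, 15)` makes the check easy to exercise in tests.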

Usage

Use this environment for all PyPaimon workflows including table read/write, schema operations, catalog interactions, and data format handling. This is the mandatory base prerequisite for every Implementation in the Apache Paimon Python SDK.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux, macOS, Windows (WSL) | Cross-platform Python package |
| Python | >= 3.6, <= 3.11 | Tested: 3.6, 3.7, 3.8, 3.9, 3.10, 3.11 |
| Disk | 500MB+ | For package installation and temporary files |

Dependencies

Python Packages (Python 3.6)

  • `pyarrow` >= 6, < 7
  • `pandas` >= 1.1, < 2
  • `polars` >= 0.9, < 1
  • `cachetools` >= 4.2, < 6
  • `dataclasses` >= 0.8
  • `fastavro` >= 1.4, < 2
  • `fsspec` >= 2021.10, < 2026
  • `ossfs` >= 2021.8
  • `packaging` >= 21, < 26
  • `pyroaring`
  • `readerwriterlock` >= 1, < 2
  • `zstandard` >= 0.19, < 1

Python Packages (Python 3.8+)

  • `pyarrow` >= 16, < 20
  • `pandas` >= 1.5, < 3 on Python 3.9+; >= 1.3, < 3 on Python 3.7-3.8
  • `polars` >= 1, < 2
  • `cachetools` >= 5, < 6
  • `fastavro` >= 1.4, < 2
  • `fsspec` >= 2023, < 2026
  • `ossfs` >= 2023
  • `packaging` >= 21, < 26
  • `pylance` >= 0.20, < 1 on Python 3.9+; >= 0.10, < 1 on Python 3.8
  • `pyroaring`
  • `readerwriterlock` >= 1, < 2
  • `zstandard` >= 0.19, < 1
  • `cramjam` >= 1.3.0, < 3
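To compare an existing environment against the lists above, a short report of installed versions can help. The sketch below uses the standard-library `importlib.metadata` module, which exists only on Python 3.8+; on 3.6 and 3.7 the function raises instead of guessing. The names `installed_versions` and `CORE_PACKAGES` are hypothetical, and the tuple covers only the four core packages named in the Overview.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.6/3.7 lack this module
    metadata = None

# Illustrative subset: the core packages named in the Overview.
CORE_PACKAGES = ("pyarrow", "pandas", "fastavro", "zstandard")


def installed_versions(names=CORE_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    if metadata is None:
        raise RuntimeError("importlib.metadata is unavailable on this interpreter")
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found
```

Printing `installed_versions()` gives a quick view of what is present; checking each version against the bounds listed above is left to the reader or to a tool such as `pip check`.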

Credentials

No credentials required for the core runtime. Storage-specific credentials are defined in the Environment:Apache_Paimon_Cloud_Storage_Credentials environment.

Quick Install

# Install core package
pip install pypaimon

# Or install the core dependencies explicitly.
# Quoting keeps the shell from treating '>' as output redirection.
pip install "pyarrow>=16" "pandas>=1.5" "fastavro>=1.4" "zstandard>=0.19" \
    pyroaring "packaging>=21" "fsspec>=2023" "polars>=1" \
    "readerwriterlock>=1" "cachetools>=5" "cramjam>=1.3.0"

Code Evidence

Python version check and compatibility patch from `pypaimon/__init__.py:19-23`:

if sys.version_info[:2] == (3, 6):
    try:
        from pypaimon.manifest import fastavro_py36_compat  # noqa: F401
    except ImportError:
        pass

Python version requirement from `setup.py:90`:

python_requires=">=3.6",

PyArrow version detection from `pypaimon/filesystem/pyarrow_file_io.py:45-46`:

self._pyarrow_gte_7 = parse(pyarrow.__version__) >= parse("7.0.0")
self._pyarrow_gte_8 = parse(pyarrow.__version__) >= parse("8.0.0")

Python 3.6 ORC write limitation from `pypaimon/filesystem/local_file_io.py:320`:

if sys.version_info[:2] == (3, 6):
    orc.write_table(data, f, **kwargs)
else:
    orc.write_table(data, f, compression=compression, **kwargs)
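The branch above can also be expressed as a small helper that builds the keyword arguments once, which keeps the version check out of the write path. This is a sketch under the assumption that only the `compression` argument differs between the two branches; `orc_write_kwargs` is a hypothetical name and does not appear in PyPaimon.

```python
import sys


def orc_write_kwargs(compression, version_info=None, **kwargs):
    """Build keyword arguments for an ORC write call.

    On Python 3.6 the compression argument is dropped, mirroring the
    branch shown above; on other versions it is passed through.
    """
    if version_info is None:
        version_info = sys.version_info
    result = dict(kwargs)  # copy so the caller's mapping is not mutated
    if tuple(version_info[:2]) != (3, 6):
        result["compression"] = compression
    return result
```

A caller could then write `orc.write_table(data, f, **orc_write_kwargs(compression, **kwargs))`, with the trade-off that the Python 3.6 special case is less visible at the call site.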

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ImportError: No module named 'dataclasses'` | Python 3.6 missing backport | `pip install "dataclasses>=0.8"` |
| `ImportError: cannot import name 'fastavro_py36_compat'` | Python 3.6 zstd patch not loaded | Install `zstandard>=0.19`; this is non-fatal (caught by try/except) |
| ORC write fails on Python 3.6 | Compression parameter not supported | Upgrade to Python 3.8+ for full ORC compression support |
| `ModuleNotFoundError: No module named 'cramjam'` | Missing compression library on Python 3.7+ | `pip install "cramjam>=1.3.0"` |

Compatibility Notes

  • Python 3.6: Requires the fastavro zstd compatibility patch. ORC writes do not accept the `compression` parameter. The `dataclasses` backport is required. PyArrow is limited to the 6.x series.
  • Python 3.7: Transitional version. Most features work but some optional dependencies (Ray, pylance) may have limited support.
  • Python 3.8+: Full feature support. PyArrow 16+ required for latest API features. `cramjam` required for additional compression codec support.
  • PyArrow < 7.0: OSS endpoint handling differs (bucket prepended to endpoint). Missing `force_virtual_addressing` parameter.
  • PyArrow < 8.0: S3 retry strategy (`AwsStandardS3RetryStrategy`) not available.
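The PyArrow thresholds in the notes above can be summarized as feature flags derived from a version string. The sketch below is a simplified, stdlib-only stand-in for the `packaging.version.parse` comparison shown under Code Evidence: it looks only at the major version number. The function and key names are illustrative assumptions.

```python
def major_version(version_string):
    """Extract the leading major number from a version such as '16.1.0'."""
    return int(version_string.split(".")[0])


def pyarrow_feature_flags(version_string):
    """Summarize the PyArrow-dependent behaviors listed in the notes above."""
    major = major_version(version_string)
    return {
        # Per this page, the parameter is missing below 7.0.
        "force_virtual_addressing": major >= 7,
        # Per this page, AwsStandardS3RetryStrategy is unavailable below 8.0.
        "s3_retry_strategy": major >= 8,
        # Per this page, below 7.0 the bucket is prepended to the OSS endpoint.
        "bucket_prepended_to_endpoint": major < 7,
    }
```

For example, `pyarrow_feature_flags(pyarrow.__version__)` could be evaluated once at startup. Real code should prefer `packaging.version.parse`, which handles pre-release and development version strings that this major-number shortcut ignores.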
