Implementation:Apache Flink Python Setup
| Knowledge Sources | |
|---|---|
| Domains | Python, Packaging |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
setup.py is the setuptools-based build and packaging script for the apache-flink Python package (PyFlink), handling the assembly of Java artifacts, Cython extensions, and Python source into a single installable distribution.
Description
This script orchestrates the complete packaging of the PyFlink distribution. It performs several key functions:
Python Version Check: The script enforces a minimum Python version of 3.9, exiting immediately if an older version is detected.
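A version gate of this kind can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `check_python_version` and the error message are assumptions, and only the 3.9 minimum comes from the description above.

```python
import sys

# Minimum version stated in the description above.
MIN_PYTHON = (3, 9)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not check_python_version():
        # Hypothetical message; the real script's wording may differ.
        print("Python 3.9 or newer is required for PyFlink.", file=sys.stderr)
        sys.exit(-1)
```

Comparing tuples rather than strings avoids ordering mistakes such as "3.10" sorting before "3.9".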
Cython Extension Building: On non-Windows platforms, the script attempts to build Cython extensions for performance-critical modules in the pyflink.fn_execution package. These include coders, streams, aggregation, window aggregation, and Apache Beam integration modules. If Cython is not available but pre-compiled C sources exist, it falls back to building from the C files. On Windows, or when neither Cython nor the C sources are available, no extensions are built.
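The three-way decision described above (Cython sources, pre-generated C sources, or nothing) can be sketched as below. The function names, the `platform.startswith("win")` test, and the example module path are assumptions for illustration; the real script may detect the platform and name its modules differently.

```python
import os
import sys


def cython_is_available():
    """Report whether the Cython package can be imported."""
    try:
        import Cython  # noqa: F401
        return True
    except ImportError:
        return False


def choose_extension_sources(module_paths, cython_available, platform=sys.platform):
    """Pick the source files to build for the given extension modules.

    module_paths are file paths without a suffix. Returns a (mode, sources)
    pair where mode is "cython", "c", or "none".
    """
    if platform.startswith("win"):
        return "none", []  # no extensions on Windows
    if cython_available:
        return "cython", [p + ".pyx" for p in module_paths]
    c_files = [p + ".c" for p in module_paths]
    if all(os.path.isfile(f) for f in c_files):
        return "c", c_files  # fall back to pre-generated C sources
    return "none", []


if __name__ == "__main__":
    # Hypothetical module path, used only to show the call shape.
    modules = ["pyflink/fn_execution/example_fast"]
    print(choose_extension_sources(modules, cython_is_available()))
```

Returning an empty source list in the "none" case lets the caller pass an empty `ext_modules` list to `setup()`, so a pure-Python installation still succeeds.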
Source Tree Detection: The script detects whether it is running within the Flink source tree by checking for the existence of StreamExecutionEnvironment.java in the parent directory. When running from source, it:
- Parses the parent pom.xml to extract the Flink version
- Copies configuration files from the Flink distribution assembly descriptor (bin.xml)
- Copies binary scripts and UDF runner scripts
- Creates symlinks (or copies as fallback) for examples, LICENSE, and README files
- Creates a temporary log directory
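Two of the steps above lend themselves to a short sketch: reading the version from a Maven POM and linking a file with a copy fallback. The helper names `read_pom_version` and `link_or_copy` are hypothetical, and the use of `xml.etree.ElementTree` is an assumption about how such parsing could be done, not a statement of what the script uses.

```python
import os
import shutil
from xml.etree import ElementTree as ET

# Standard namespace of Maven POM files.
POM_NS = {"POM": "http://maven.apache.org/POM/4.0.0"}


def read_pom_version(pom_path):
    """Return the text of the top-level <version> element of a POM file."""
    root = ET.parse(pom_path).getroot()
    element = root.find("POM:version", namespaces=POM_NS)
    if element is None:
        raise ValueError("no top-level <version> element in " + pom_path)
    return element.text


def link_or_copy(src, dst):
    """Symlink src to dst; copy instead when symlinks are not permitted."""
    try:
        os.symlink(src, dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # For example, Windows without the privilege to create symlinks.
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy(src, dst)
        return "copy"
```

Passing absolute paths to `link_or_copy` avoids surprises, because a relative symlink target is resolved relative to the directory containing the link rather than the current working directory.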
Version Management: The version is read dynamically from pyflink/version.py. For development versions (containing "dev"), the apache-flink-libraries dependency is pinned to the exact version. For release versions, it uses a range from the current version up to (but not including) the next patch version.
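The pinning rule can be illustrated with the sketch below. It assumes release versions have the plain form MAJOR.MINOR.PATCH; the helper names and the use of `runpy` to load the version file are illustrative choices, not necessarily what the script does.

```python
import runpy


def read_version(version_file):
    """Execute a version file and return its __version__ value."""
    return runpy.run_path(version_file)["__version__"]


def libraries_requirement(version):
    """Build the apache-flink-libraries requirement for a PyFlink version."""
    if "dev" in version:
        # Development builds must match the libraries package exactly.
        return "apache-flink-libraries==" + version
    parts = version.split(".")
    # Bump the last component to obtain the exclusive upper bound.
    parts[-1] = str(int(parts[-1]) + 1)
    return "apache-flink-libraries>={0},<{1}".format(version, ".".join(parts))
```

For example, under these assumptions a release version of `1.20.1` yields `apache-flink-libraries>=1.20.1,<1.20.2`, while a development version such as `2.1.dev0` yields an exact `==` pin.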
Package Assembly: The script assembles 27+ Python packages covering the core pyflink module, table API, datastream API, function execution, metrics, configuration, logging, examples, and testing utilities.
Dependencies: The install requirements include py4j (==0.10.9.7), python-dateutil, apache-beam (>=2.54.0,<=2.61.0), cloudpickle, avro, pytz, fastavro, requests, protobuf, numpy (>=1.22.4), pandas (>=1.3.0,<2.3), pyarrow (>=5.0.0,<21.0.0), pemja (non-Windows only), httplib2, ruamel.yaml, and the apache-flink-libraries package.
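A requirement list along these lines could be expressed as in the sketch below. Only the version bounds stated on this page are included; entries shown without a specifier may well be pinned in the real script. The function name `build_install_requires` is hypothetical, and the environment marker is one standard way to restrict pemja to non-Windows platforms, assumed here rather than confirmed.

```python
def build_install_requires(libraries_requirement):
    """Return an install_requires list; libraries_requirement is the
    version-matched apache-flink-libraries specifier."""
    return [
        "py4j==0.10.9.7",
        "python-dateutil",
        "apache-beam>=2.54.0,<=2.61.0",
        "cloudpickle",
        "avro",
        "pytz",
        "fastavro",
        "requests",
        "protobuf",
        "numpy>=1.22.4",
        "pandas>=1.3.0,<2.3",
        "pyarrow>=5.0.0,<21.0.0",
        # Environment marker: skipped when installing on Windows.
        'pemja; platform_system != "Windows"',
        "httplib2",
        "ruamel.yaml",
        libraries_requirement,
    ]
```

With a marker, the platform condition is evaluated by the installer at install time, so a single source distribution can serve every platform.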
Usage
This script is used to build and install the PyFlink package. It should be run from the flink-python directory, either within the Flink source tree (for development builds) or from an extracted source distribution (for release installations).
Code Reference
Source Location
- Repository: Apache_Flink
- File: flink-python/setup.py
- Lines: 1-367
Signature
# Module-level functions
def remove_if_exists(file_path)
def copy_files(src_paths, output_directory)
def has_unsupported_tag(file_element)
def extracted_output_files(base_dir, file_path, output_directory)
# Main entry point
setup(
name='apache-flink',
version=VERSION,
packages=PACKAGES,
...
)
Import
# This is a standalone setup script, not typically imported.
# It is invoked via:
# pip install .
# python setup.py install
# python setup.py bdist_wheel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pyflink/version.py | Python file | Yes | Defines the __version__ variable used for package versioning |
| README.md | Markdown file | Yes | Long description content for the package metadata |
| ../pom.xml | XML file | No (source tree only) | Parent Maven POM used to extract the Flink version |
| ../flink-dist/src/main/assemblies/bin.xml | XML file | No (source tree only) | Assembly descriptor defining which configuration and binary files to include |
| Cython .pyx files | Cython source | No | Performance-critical extension source files for fn_execution modules |
Outputs
| Name | Type | Description |
|---|---|---|
| apache-flink package | Python distribution | The installable PyFlink package containing all Python modules, configuration, scripts, and optional Cython extensions |
| deps/ | Temporary directory | Created during source-tree builds to hold the conf, log, examples, and bin files; removed after setup completes |
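The create-then-clean-up lifecycle of the temporary directory can be sketched with a `try`/`finally` block. The helper name `run_with_temp_dir` and the callback style are assumptions made for illustration, not the script's structure.

```python
import os
import shutil


def run_with_temp_dir(temp_dir, build):
    """Create temp_dir, run build(temp_dir), and always remove temp_dir."""
    os.makedirs(temp_dir, exist_ok=True)
    try:
        return build(temp_dir)
    finally:
        # Runs on success and on failure, so no stale files are left behind.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Putting the removal in `finally` means a failed build does not leave a partially populated directory that a later run could mistake for valid input.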
Usage Examples
Basic Usage
# Install PyFlink from the source tree
# (run from the flink-python directory)
# pip install .
# Build a wheel distribution
# python setup.py bdist_wheel
# Install in development mode
# pip install -e .
# The script auto-detects the Flink source tree and assembles
# all necessary artifacts. Key dependencies installed:
# - py4j==0.10.9.7
# - apache-beam>=2.54.0,<=2.61.0
# - pandas>=1.3.0,<2.3
# - pyarrow>=5.0.0,<21.0.0
# - numpy>=1.22.4
# - apache-flink-libraries (version-matched)