Implementation:Apache Flink Python Setup

From Leeroopedia


Knowledge Sources
Domains: Python, Packaging
Last Updated: 2026-02-09 00:00 GMT

Overview

setup.py is the setuptools-based build and packaging script for the apache-flink Python package (PyFlink), handling the assembly of Java artifacts, Cython extensions, and Python source into a single installable distribution.

Description

This script orchestrates the complete packaging of the PyFlink distribution. It performs several key functions:

Python Version Check: The script enforces a minimum Python version of 3.9, exiting immediately if an older version is detected.
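
A minimal sketch of such a version gate follows. The helper names and the error message are illustrative assumptions, not taken from the actual script; only the 3.9 minimum comes from the description above.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum stated in the description above


def is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def ensure_supported_python():
    """Exit immediately when the interpreter is too old (hypothetical helper)."""
    if not is_supported():
        # The message wording is an assumption for illustration only.
        print("PyFlink requires Python 3.9 or newer.", file=sys.stderr)
        sys.exit(-1)


if __name__ == "__main__":
    ensure_supported_python()
```

Running the check before any other import keeps the failure early and the message readable on old interpreters.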

Cython Extension Building: On non-Windows platforms, the script attempts to build Cython extensions for performance-critical modules in the pyflink.fn_execution package. These include coders, streams, aggregation, window aggregation, and Apache Beam integration modules. If Cython is not available but pre-compiled C sources exist, it falls back to building from the C files. On Windows or when neither Cython nor C sources are available, no extensions are built.
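
The fallback order described above can be sketched as a small decision function. This is an illustration under stated assumptions: `choose_extension_sources` and its parameters are hypothetical names, and the real script may structure the logic differently.

```python
import os
import sys


def choose_extension_sources(pyx_paths, platform=sys.platform,
                             cython_available=None, exists=os.path.exists):
    """Return (mode, sources), where mode is "cython", "c" or "none".

    Hypothetical helper mirroring the order described in the text:
    Windows builds nothing, Cython is preferred, and pre-generated
    C files are the fallback.
    """
    if platform.startswith("win"):
        return "none", []
    if cython_available is None:
        try:
            import Cython  # noqa: F401  (only probing for availability)
            cython_available = True
        except ImportError:
            cython_available = False
    if cython_available:
        return "cython", list(pyx_paths)
    # Fall back to the .c files generated next to each .pyx file.
    c_paths = [os.path.splitext(p)[0] + ".c" for p in pyx_paths]
    if c_paths and all(exists(p) for p in c_paths):
        return "c", c_paths
    return "none", []
```

In a real build, the "cython" branch would typically pass the sources through `Cython.Build.cythonize`, while the "c" branch would wrap them in `setuptools.Extension` objects directly.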

Source Tree Detection: The script detects whether it is running within the Flink source tree by checking for the existence of StreamExecutionEnvironment.java in the parent directory. When running from source, it:

  • Parses the parent pom.xml to extract the Flink version
  • Copies configuration files from the Flink distribution assembly descriptor (bin.xml)
  • Copies binary scripts and UDF runner scripts
  • Creates symlinks (or copies as fallback) for examples, LICENSE, and README files
  • Creates a temporary log directory
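
The steps above can be sketched with the standard library. Every function name here is hypothetical, the marker path is passed in by the caller rather than guessed, and the use of `xml.etree.ElementTree` is an assumption; the actual script may parse the POM differently.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Standard Maven POM namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def in_source_tree(marker_path):
    """True when the marker file (e.g. a known .java source) exists."""
    return os.path.isfile(marker_path)


def read_pom_version(pom_text):
    """Extract the project version from POM XML text."""
    root = ET.fromstring(pom_text)
    node = root.find("m:version", POM_NS)  # direct child of <project>
    if node is None:
        node = root.find("m:parent/m:version", POM_NS)  # inherited version
    if node is None or not node.text:
        raise ValueError("no <version> element found in POM")
    return node.text.strip()


def link_or_copy(src, dst):
    """Symlink src to dst, copying instead when symlinks are unavailable."""
    try:
        os.symlink(src, dst)
    except (OSError, NotImplementedError):
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy(src, dst)
```

The copy fallback matters mainly on platforms or file systems where creating symlinks requires extra privileges.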

Version Management: The version is read dynamically from pyflink/version.py. For development versions (containing "dev"), the apache-flink-libraries dependency is pinned to the exact version. For release versions, it uses a range from the current version up to (but not including) the next patch version.
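
As an illustration of the pinning rule, the sketch below derives the apache-flink-libraries requirement string from a version. The function names are hypothetical, and the code assumes release versions have the plain MAJOR.MINOR.PATCH form; other shapes are rejected rather than guessed at.

```python
import re


def read_version(version_py_text):
    """Pull __version__ out of the text of a version.py file."""
    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', version_py_text)
    if match is None:
        raise ValueError("__version__ not found")
    return match.group(1)


def libraries_requirement(version):
    """Build the requirement specifier for apache-flink-libraries."""
    if "dev" in version:
        # Development builds pin the exact version.
        return "apache-flink-libraries=={0}".format(version)
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected MAJOR.MINOR.PATCH, got %r" % version)
    major, minor, patch = parts
    next_patch = "{0}.{1}.{2}".format(major, minor, int(patch) + 1)
    # Releases accept the current version up to, but excluding, the next patch.
    return "apache-flink-libraries>={0},<{1}".format(version, next_patch)
```

For example, `libraries_requirement("1.20.0")` yields `apache-flink-libraries>=1.20.0,<1.20.1`, while a version containing "dev" is pinned with `==`.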

Package Assembly: The script assembles 27+ Python packages covering the core pyflink module, table API, datastream API, function execution, metrics, configuration, logging, examples, and testing utilities.

Dependencies: The install requirements include py4j (==0.10.9.7), python-dateutil, apache-beam (>=2.54.0,<=2.61.0), cloudpickle, avro, pytz, fastavro, requests, protobuf, numpy (>=1.22.4), pandas (>=1.3.0,<2.3), pyarrow (>=5.0.0,<21.0.0), pemja (non-Windows only), httplib2, ruamel.yaml, and the apache-flink-libraries package.

Usage

This script is used to build and install the PyFlink package. It should be run from the flink-python directory, either within the Flink source tree (for development builds) or from an extracted source distribution (for release installations).

Code Reference

Source Location

  • Repository: Apache_Flink
  • File: flink-python/setup.py
  • Lines: 1-367

Signature

# Module-level functions
def remove_if_exists(file_path)
def copy_files(src_paths, output_directory)
def has_unsupported_tag(file_element)
def extracted_output_files(base_dir, file_path, output_directory)

# Main entry point
setup(
    name='apache-flink',
    version=VERSION,
    packages=PACKAGES,
    ...
)

Import

# This is a standalone setup script, not typically imported.
# It is invoked via:
#   pip install .
#   python setup.py install
#   python setup.py bdist_wheel

I/O Contract

Inputs

Name | Type | Required | Description
pyflink/version.py | Python file | Yes | Defines the __version__ variable used for package versioning
README.md | Markdown file | Yes | Long description content for the package metadata
../pom.xml | XML file | No (source tree only) | Parent Maven POM used to extract the Flink version
../flink-dist/src/main/assemblies/bin.xml | XML file | No (source tree only) | Assembly descriptor defining which configuration and binary files to include
Cython .pyx files | Cython source | No | Performance-critical extension source files for fn_execution modules

Outputs

Name | Type | Description
apache-flink package | Python distribution | The installable PyFlink package containing all Python modules, configuration, scripts, and optional Cython extensions
deps/ | Temporary directory | Created during source-tree builds to hold conf, log, examples, and bin files; cleaned up after setup completes
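
One way to guarantee that a staging directory such as deps/ is removed even when the build fails is a context manager. This is a sketch only; `temporary_deps_dir` is a hypothetical name, and the actual script may clean up with a plain try/finally instead.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_deps_dir(parent):
    """Create parent/deps, yield its path, and always remove it afterwards."""
    path = os.path.join(parent, "deps")
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


# Usage: stage files inside the block; the directory is gone afterwards.
with tempfile.TemporaryDirectory() as parent:
    with temporary_deps_dir(parent) as deps:
        os.makedirs(os.path.join(deps, "conf"))
        staged = os.path.isdir(os.path.join(deps, "conf"))
    cleaned = not os.path.exists(deps)
```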

Usage Examples

Basic Usage

# Install PyFlink from the source tree
# (run from the flink-python directory)
pip install .

# Build a wheel distribution
python setup.py bdist_wheel

# Install in development mode
pip install -e .

# The script auto-detects the Flink source tree and assembles
# all necessary artifacts. Key dependencies installed:
#   - py4j==0.10.9.7
#   - apache-beam>=2.54.0,<=2.61.0
#   - pandas>=1.3.0,<2.3
#   - pyarrow>=5.0.0,<21.0.0
#   - numpy>=1.22.4
#   - apache-flink-libraries (version-matched)
