Environment:Datahub_project_Datahub_Python_3_10_Ingestion_Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Metadata_Ingestion, Python |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Python 3.10+ environment with pydantic v2, click CLI framework, and optional source-specific extras for running DataHub metadata ingestion and actions.
Description
This environment provides the Python runtime required by the metadata-ingestion and datahub-actions packages. Python 3.10 is the minimum version enforced in setup.py; 3.10 and 3.11 are actively tested, while Python 3.12+ triggers a runtime warning and is not officially supported. The core stack includes pydantic v2 for configuration validation, click for CLI interactions, PyYAML for recipe parsing, and aiohttp for async HTTP operations. Source-specific connectors (Snowflake, BigQuery, Kafka, etc.) are installed via pip extras.
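A minimal sketch of the core stack in use: PyYAML parses a recipe and a pydantic v2 model validates it. The `RecipeSketch` model below is a simplified illustrative stand-in, not DataHub's actual pipeline configuration model.

```python
# Illustrative only: RecipeSketch is a simplified stand-in, not DataHub's
# actual recipe/pipeline configuration model.
import yaml
from pydantic import BaseModel, Field


class SourceSketch(BaseModel):
    type: str
    config: dict = Field(default_factory=dict)


class RecipeSketch(BaseModel):
    source: SourceSketch
    sink: dict | None = None  # the real pipeline resolves sink defaults


RECIPE_YAML = """
source:
  type: snowflake
  config:
    account_id: example_account
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
"""

recipe = RecipeSketch.model_validate(yaml.safe_load(RECIPE_YAML))
print(recipe.source.type)  # -> snowflake
```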
Usage
Use this environment for any CLI Metadata Ingestion, Python SDK Metadata Emission, or Actions Framework workflow. It is the mandatory prerequisite for running the DataHub CLI (`datahub ingest`), programmatic emitters (`DataHubRestEmitter`), and the actions daemon (`datahub-actions`).
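A minimal programmatic emission sketch using the SDK classes named above; the GMS URL, token, and dataset name are placeholder values, not required settings.

```python
# Sketch: emit a single aspect via the REST emitter. The URL, token, and
# dataset name below are placeholders.
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DataHubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DataHubRestEmitter(
    gms_server="http://localhost:8080",  # or read DATAHUB_GMS_URL
    token=None,  # set if the GMS requires authentication
)

mcp = MetadataChangeProposalWrapper(
    entityUrn=builder.make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
    aspect=DatasetPropertiesClass(description="Emitted from the Python SDK"),
)
emitter.emit(mcp)
```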
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows (WSL2) | Linux recommended for production |
| Python | 3.10 or 3.11 | 3.12+ triggers warning; 3.10 minimum enforced in setup.py |
| Disk | 2GB+ | Varies with extras installed (full `[all]` can be large) |
Dependencies
System Packages
- `python3-dev` (Linux) or Xcode CLI tools (macOS)
- `python3-venv` (Linux, for virtual environment creation)
- `openldap-dev` (only if using LDAP source connector)
Python Packages (Core)
- `pydantic` >= 2.4.0, < 3.0.0
- `pydantic_core` != 2.41.3, < 3.0.0 (excludes a buggy release)
- `click` >= 7.1.2, != 8.2.0, < 9.0.0
- `PyYAML` < 7.0.0
- `aiohttp` < 4
- `avro` >= 1.11.3, < 1.13
- `requests` (via aiohttp/urllib3)
- `python-dateutil` >= 2.8.0, < 3.0.0
- `setuptools` < 82.0.0
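A sanity-check sketch for the core pins listed above, assuming the `packaging` library is available in the environment (it usually ships alongside pip):

```python
# Sketch: compare installed core dependencies against the pinned ranges above.
# Assumes the `packaging` library is importable.
from importlib.metadata import PackageNotFoundError, version
from packaging.specifiers import SpecifierSet

CORE_CONSTRAINTS = {
    "pydantic": ">=2.4.0,<3.0.0",
    "pydantic_core": "!=2.41.3,<3.0.0",
    "click": ">=7.1.2,!=8.2.0,<9.0.0",
    "PyYAML": "<7.0.0",
    "aiohttp": "<4",
    "avro": ">=1.11.3,<1.13",
    "python-dateutil": ">=2.8.0,<3.0.0",
    "setuptools": "<82.0.0",
}

for name, spec in CORE_CONSTRAINTS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    ok = installed in SpecifierSet(spec)
    print(f"{name} {installed}: {'ok' if ok else 'outside expected range ' + spec}")
```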
Python Packages (Key Extras)
- [kafka]: `confluent-kafka` >= 2.10.1, < 2.13.0; `fastavro` >= 1.2.0
- [snowflake]: `snowflake-connector-python` >= 3.4.0; `pandas` < 3.0.0
- [bigquery]: `google-cloud-bigquery` < 4.0.0; `google-cloud-datacatalog` >= 1.5.0
- [databricks]: `databricks-sdk` >= 0.30.0; `pyspark` ~= 3.5.6
- [s3]: `pyspark` ~= 3.5.6 (use `[s3-slim]` to avoid PySpark)
- [iceberg]: `pyiceberg` >= 0.9.0, <= 0.10.0; `pydantic` < 2.12
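A small sketch for detecting which optional connectors are present; the module names below are the usual import names for each extra and are listed here as assumptions for illustration:

```python
# Sketch: report which optional connector modules can be imported. The
# import names below are assumptions for illustration.
from importlib.util import find_spec

EXTRA_MODULES = {
    "kafka": "confluent_kafka",
    "snowflake": "snowflake.connector",
    "bigquery": "google.cloud.bigquery",
    "databricks": "databricks.sdk",
    "iceberg": "pyiceberg",
}


def is_importable(module: str) -> bool:
    try:
        return find_spec(module) is not None
    except ModuleNotFoundError:  # parent package of a dotted name is missing
        return False


for extra, module in EXTRA_MODULES.items():
    print(f"[{extra}] {module}: {'available' if is_importable(module) else 'missing'}")
```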
Credentials
The following environment variables configure authentication and connectivity; `DATAHUB_GMS_URL` and `DATAHUB_GMS_TOKEN` are the primary settings, while the deprecated host/port/protocol variables serve only as fallbacks:
- `DATAHUB_GMS_URL`: Complete GMS server URL (e.g., `http://localhost:8080`)
- `DATAHUB_GMS_TOKEN`: Authentication token for GMS API access
- `DATAHUB_GMS_HOST`: GMS host (deprecated fallback for DATAHUB_GMS_URL)
- `DATAHUB_GMS_PORT`: GMS port number (deprecated fallback)
- `DATAHUB_GMS_PROTOCOL`: Protocol for GMS connection, `http` or `https` (default: `http`)
- `DATAHUB_USERNAME`: Username for generating access tokens
- `DATAHUB_PASSWORD`: Password for generating access tokens
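A sketch of how client code could resolve these variables, preferring `DATAHUB_GMS_URL` and falling back to the deprecated host/port/protocol variables. This mirrors the documented precedence but is a simplified illustration, not the CLI's actual resolution logic; the localhost/8080 defaults are assumptions.

```python
# Sketch: resolve the GMS endpoint from the environment. Simplified
# illustration of the documented precedence; defaults are assumptions.
import os

from datahub.emitter.rest_emitter import DataHubRestEmitter


def resolve_gms_url() -> str:
    url = os.environ.get("DATAHUB_GMS_URL")
    if url:
        return url
    protocol = os.environ.get("DATAHUB_GMS_PROTOCOL", "http")
    host = os.environ.get("DATAHUB_GMS_HOST", "localhost")
    port = os.environ.get("DATAHUB_GMS_PORT", "8080")
    return f"{protocol}://{host}:{port}"


emitter = DataHubRestEmitter(
    gms_server=resolve_gms_url(),
    token=os.environ.get("DATAHUB_GMS_TOKEN"),
)
```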
Quick Install
```bash
# Install core CLI
pip install 'acryl-datahub'

# Install with specific source extras
pip install 'acryl-datahub[snowflake,bigquery,kafka]'

# Install actions framework
pip install 'acryl-datahub-actions'

# Install actions with Kafka source
pip install 'acryl-datahub-actions[kafka]'
```
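A quick post-install check from Python, assuming the `datahub` CLI entry point is on PATH after installation:

```python
# Post-install sanity check. Assumes the `datahub` CLI is on PATH.
import subprocess

import datahub

print(datahub.__version__)  # package version visible to Python
subprocess.run(["datahub", "version"], check=True)  # CLI entry point works
```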
Code Evidence
Python version warning from `entrypoints.py:55-60`:
```python
if sys.version_info >= (3, 12):
    click.secho(
        "Python versions above 3.11 are not actively tested with yet. Please use Python 3.11 for now.",
        fg="red",
        err=True,
    )
```
Python minimum version from `setup.py:1147`:
```python
python_requires=">=3.10"
```
Pydantic core exclusion from `setup.py:23`:
```python
# https://github.com/pydantic/pydantic-core/issues/1841
"pydantic_core!=2.41.3,<3.0.0",
```
Click version exclusion from `setup.py:39`:
"click>=7.1.2,!=8.2.0,<9.0.0",
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Python versions above 3.11 are not actively tested` | Python 3.12+ detected at CLI startup | Downgrade to Python 3.10 or 3.11 |
| `ImportError: No module named 'datahub'` | Package not installed in active venv | Run `pip install acryl-datahub` |
| `confluent_kafka` build failure | Missing librdkafka system library | Install `librdkafka-dev` (apt) or `librdkafka` (brew) |
| `pydantic.errors.PydanticUserError` | Pydantic v1 API used with v2 | Ensure pydantic >= 2.4.0 is installed |
Compatibility Notes
- Nix/Immutable filesystems: Set `DATAHUB_VENV_USE_COPIES=true` if venv creation fails due to symlink restrictions (see the sketch after this list).
- Windows: Not officially supported; use WSL2 for development.
- PyIceberg extras: Requires `pydantic < 2.12` which may conflict with other extras.
- PySpark extras: The `[s3]` extra includes PySpark 3.5.6; use `[s3-slim]` for a lightweight alternative without PySpark.
- numpy constraint: Several extras (feast, cassandra) require `numpy < 2` due to binary incompatibility.
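For the Nix/immutable-filesystem note above, a generic sketch of the underlying workaround: build the virtual environment with file copies instead of symlinks. DataHub's own tooling reads `DATAHUB_VENV_USE_COPIES`; the standard-library equivalent shown here is illustrative only, and the path is a placeholder.

```python
# Generic illustration of the symlink-restriction workaround: create a venv
# with file copies instead of symlinks. The path below is a placeholder.
import venv

builder = venv.EnvBuilder(with_pip=True, symlinks=False)
builder.create("/tmp/datahub-venv")
```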
Related Pages
- Implementation:Datahub_project_Datahub_Pip_Install_Acryl_Datahub
- Implementation:Datahub_project_Datahub_Pip_Install_Datahub_SDK
- Implementation:Datahub_project_Datahub_Pip_Install_Datahub_Actions
- Implementation:Datahub_project_Datahub_PipelineConfig
- Implementation:Datahub_project_Datahub_DatahubClientConfig
- Implementation:Datahub_project_Datahub_Ingest_CLI_Run
- Implementation:Datahub_project_Datahub_DataHubRestEmitter_Init
- Implementation:Datahub_project_Datahub_Mce_Builder_URN_Helpers
- Implementation:Datahub_project_Datahub_MetadataChangeProposalWrapper_Init
- Implementation:Datahub_project_Datahub_Emitter_Emit
- Implementation:Datahub_project_Datahub_Actions_PipelineConfig
- Implementation:Datahub_project_Datahub_FilterTransformer_Transform
- Implementation:Datahub_project_Datahub_Action_Act_Interface
- Implementation:Datahub_project_Datahub_Actions_CLI_Run