Implementation:Datahub project Datahub Pip Install Acryl Datahub
| Field | Value |
|---|---|
| Implementation Name | Pip Install Acryl Datahub |
| Overview | Concrete tool for installing the DataHub CLI and Python SDK via pip, including optional connector extras. |
| Type | External Tool Doc |
| Implements | Datahub_project_Datahub_CLI_Package_Installation |
| Status | Active |
| Domains | Data_Integration, Metadata_Management |
| Source | DataHub Repository -- metadata-ingestion/setup.py (lines 954-1108 for entry points, lines 1111-1149 for package setup) |
| Last Updated | 2026-02-10 |
| Knowledge Sources | DataHub Repository |
Description
The acryl-datahub package is the primary distribution artifact for the DataHub CLI and Python ingestion framework. It is installed via pip and provides the datahub command-line tool along with the full SDK for programmatic metadata emission.
The package uses a modular extras system where each data source connector, sink, or feature is declared as an optional dependency group. This allows users to install only the libraries required for their specific use case.
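The extras mechanism is standard setuptools behavior. The sketch below illustrates the pattern only: the connector names match real extras, but the dependency lists shown are hypothetical placeholders, not the actual pinned requirements from setup.py.

```python
# Illustrative sketch of the extras_require pattern (hypothetical deps,
# not the real requirement pins from metadata-ingestion/setup.py).
extras_require = {
    # each optional connector pulls in only its own client libraries
    "snowflake": ["snowflake-connector-python"],
    "bigquery": ["google-cloud-bigquery"],
    "mysql": ["pymysql"],
}

# pip install 'acryl-datahub[snowflake,mysql]' resolves to the base
# requirements plus the union of the selected extras:
selected = ["snowflake", "mysql"]
resolved = sorted(set().union(*(extras_require[e] for e in selected)))
print(resolved)
```

This is why installing without extras yields a slim package: connector client libraries are only pulled in when their extra is named.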
Installation
Basic Installation
pip install acryl-datahub
Installation with a Connector Extra
# Install with Snowflake connector
pip install 'acryl-datahub[snowflake]'
# Install with BigQuery connector
pip install 'acryl-datahub[bigquery]'
# Install with multiple connectors
pip install 'acryl-datahub[snowflake,bigquery,mysql]'
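When scripting installations, the bracketed extras spec can be built programmatically. The helper below is a small illustrative sketch (pip_install_spec is not part of DataHub); the single quotes in the commands above serve the same purpose in the shell, protecting the square brackets from glob expansion in shells like zsh.

```python
def pip_install_spec(package: str, extras: list[str]) -> str:
    """Build a pip requirement string such as
    acryl-datahub[bigquery,mysql,snowflake]. Extras are sorted so the
    spec is deterministic regardless of input order."""
    if not extras:
        return package
    return f"{package}[{','.join(sorted(extras))}]"

print(pip_install_spec("acryl-datahub", ["snowflake", "bigquery", "mysql"]))
```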
Entry Point
The CLI is registered as a console script entry point in setup.py:
entry_points = {
    "console_scripts": ["datahub = datahub.entrypoints:main"],
    ...
}
After installation, the datahub command is available on the system PATH.
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| connector extra | string | Name of the optional connector to install (e.g., snowflake, bigquery, mysql, kafka, postgres, hive, looker, tableau, dbt, etc.) |
I/O Contract
Inputs
- Python >= 3.10 environment
- pip package manager
- Network access to PyPI (or a configured private index)
Outputs
- datahub CLI command available on PATH
- Python package datahub importable in Python code
- All plugin entry points registered for the installed extras
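The second output, SDK importability, can be verified cheaply with the standard library. The sketch below checks whether the datahub package can be found without actually importing it (the function name is illustrative):

```python
import importlib.util

def datahub_sdk_available() -> bool:
    """Check whether the datahub package (installed by acryl-datahub)
    is importable, without triggering the import itself."""
    return importlib.util.find_spec("datahub") is not None

print(datahub_sdk_available())
```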
Available Connector Extras (Partial List)
| Extra Name | Source Class |
|---|---|
| snowflake | datahub.ingestion.source.snowflake.snowflake_v2:SnowflakeV2Source |
| bigquery | datahub.ingestion.source.bigquery_v2.bigquery:BigqueryV2Source |
| mysql | datahub.ingestion.source.sql.mysql:MySQLSource |
| postgres | datahub.ingestion.source.sql.postgres:PostgresSource |
| kafka | datahub.ingestion.source.kafka.kafka:KafkaSource |
| hive | datahub.ingestion.source.sql.hive.hive_source:HiveSource |
| looker | datahub.ingestion.source.looker.looker_source:LookerDashboardSource |
| tableau | datahub.ingestion.source.tableau.tableau:TableauSource |
| dbt | datahub.ingestion.source.dbt.dbt_core:DBTCoreSource |
| redshift | datahub.ingestion.source.redshift.redshift:RedshiftSource |
Available Sink Plugins
| Sink Name | Sink Class |
|---|---|
| datahub-rest | datahub.ingestion.sink.datahub_rest:DatahubRestSink |
| datahub-kafka | datahub.ingestion.sink.datahub_kafka:DatahubKafkaSink |
| file | datahub.ingestion.sink.file:FileSink |
| console | datahub.ingestion.sink.console:ConsoleSink |
Package Metadata
From setup.py (lines 1111-1149):
setuptools.setup(
    name=package_metadata["__package_name__"],
    version=_version,
    url="https://docs.datahub.com/",
    license="Apache-2.0",
    description="A CLI to work with DataHub metadata",
    python_requires=">=3.10",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    ...
)
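Once installed, these same metadata fields can be read back through the standard library. A hedged sketch (the describe helper is illustrative) that degrades gracefully when the package is not installed:

```python
from importlib.metadata import PackageNotFoundError, metadata

def describe(dist: str = "acryl-datahub") -> str:
    """Read an installed distribution's metadata -- the fields set in
    setuptools.setup above -- or report that it is absent."""
    try:
        md = metadata(dist)
        return f"{md['Name']} {md['Version']}: {md['Summary']}"
    except PackageNotFoundError:
        return f"{dist} is not installed"

print(describe())
```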
Usage Examples
Verify Installation
datahub version
Check Available Plugins
datahub check plugins
Related Pages
- Implements: Datahub_project_Datahub_CLI_Package_Installation
- Related: Datahub_project_Datahub_PipelineConfig
- Related: Datahub_project_Datahub_Ingest_CLI_Run
- Environment: Environment:Datahub_project_Datahub_Python_3_10_Ingestion_Environment
- Heuristic: Heuristic:Datahub_project_Datahub_Gradle_Formatting_Over_Direct_Tools