Implementation:Datahub project Datahub Pip Install Acryl Datahub

From Leeroopedia


Field Value
Implementation Name Pip Install Acryl Datahub
Overview Concrete tool for installing the DataHub CLI and Python SDK via pip, including optional connector extras.
Type External Tool Doc
Implements Datahub_project_Datahub_CLI_Package_Installation
Status Active
Domains Data_Integration, Metadata_Management
Source DataHub Repository -- metadata-ingestion/setup.py (lines 954-1108 for entry points, lines 1111-1149 for package setup)
Last Updated 2026-02-10
Knowledge Sources DataHub Repository

Description

The acryl-datahub package is the primary distribution artifact for the DataHub CLI and Python ingestion framework. It is installed via pip and provides the datahub command-line tool along with the full SDK for programmatic metadata emission.

The package uses a modular extras system where each data source connector, sink, or feature is declared as an optional dependency group. This allows users to install only the libraries required for their specific use case.
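The extras syntax composes mechanically: the base package name followed by a bracketed, comma-separated list of extra names. As an illustration only (the helper below is not part of the DataHub codebase), the install command for any set of extras can be sketched as:

```python
def pip_install_command(extras=()):
    """Build the pip invocation for acryl-datahub with optional extras.

    `extras` is an iterable of connector extra names, e.g. ["snowflake"].
    """
    spec = "acryl-datahub"
    if extras:
        spec += "[" + ",".join(extras) + "]"
    return ["pip", "install", spec]
```

For example, `pip_install_command(["snowflake", "bigquery"])` yields the same requirement specifier shown in the installation commands below; quoting the bracketed specifier in a shell avoids glob expansion.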

Installation

Basic Installation

pip install acryl-datahub

Installation with a Connector Extra

# Install with Snowflake connector
pip install 'acryl-datahub[snowflake]'

# Install with BigQuery connector
pip install 'acryl-datahub[bigquery]'

# Install with multiple connectors
pip install 'acryl-datahub[snowflake,bigquery,mysql]'

Entry Point

The CLI is registered as a console script entry point in setup.py:

entry_points = {
    "console_scripts": ["datahub = datahub.entrypoints:main"],
    ...
}

After installation, the datahub command is available on the system PATH.
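Console-script entry points like this one are recorded in the installed package's metadata, so they can be enumerated with the standard library. A minimal sketch (assuming nothing beyond the standard library; the `datahub` entry will only appear in environments where acryl-datahub is installed):

```python
from importlib.metadata import entry_points

eps = entry_points()
# Select console scripts, handling both the dict-style API (Python <= 3.9)
# and the selectable API (Python >= 3.10).
if hasattr(eps, "select"):
    scripts = eps.select(group="console_scripts")
else:
    scripts = eps.get("console_scripts", [])

# Collect the registered command names; "datahub" appears here once
# acryl-datahub is installed.
names = sorted({ep.name for ep in scripts})
```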

Key Parameters

Parameter Type Description
connector extra string Name of the optional connector to install (e.g., snowflake, bigquery, mysql, kafka, postgres, hive, looker, tableau, dbt, etc.)

I/O Contract

Inputs

  • Python >= 3.10 environment
  • pip package manager
  • Network access to PyPI (or a configured private index)
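The Python version requirement mirrors the `python_requires=">=3.10"` constraint in setup.py. A pre-flight check before installation can be sketched as follows (the helper name is illustrative, not part of the package):

```python
import sys

def meets_python_requirement(version_info, minimum=(3, 10)):
    """Return True if the interpreter satisfies the package's python_requires.

    Compares only (major, minor), matching how ">=3.10" is evaluated.
    """
    return tuple(version_info[:2]) >= minimum

# Usage: check the running interpreter before invoking pip.
compatible = meets_python_requirement(sys.version_info)
```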

Outputs

  • datahub CLI command available on PATH
  • Python package datahub importable in Python code
  • All plugin entry points registered for the installed extras

Available Connector Extras (Partial List)

Extra Name Source Class
snowflake datahub.ingestion.source.snowflake.snowflake_v2:SnowflakeV2Source
bigquery datahub.ingestion.source.bigquery_v2.bigquery:BigqueryV2Source
mysql datahub.ingestion.source.sql.mysql:MySQLSource
postgres datahub.ingestion.source.sql.postgres:PostgresSource
kafka datahub.ingestion.source.kafka.kafka:KafkaSource
hive datahub.ingestion.source.sql.hive.hive_source:HiveSource
looker datahub.ingestion.source.looker.looker_source:LookerDashboardSource
tableau datahub.ingestion.source.tableau.tableau:TableauSource
dbt datahub.ingestion.source.dbt.dbt_core:DBTCoreSource
redshift datahub.ingestion.source.redshift.redshift:RedshiftSource

Available Sink Plugins

Sink Name Sink Class
datahub-rest datahub.ingestion.sink.datahub_rest:DatahubRestSink
datahub-kafka datahub.ingestion.sink.datahub_kafka:DatahubKafkaSink
file datahub.ingestion.sink.file:FileSink
console datahub.ingestion.sink.console:ConsoleSink
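The source and sink plugin names above are what an ingestion recipe references. A minimal recipe sketch combining the mysql source with the datahub-rest sink (all hostnames and credentials below are placeholder values):

```yaml
# recipe.yml -- minimal ingestion recipe (illustrative values only)
source:
  type: mysql
  config:
    host_port: localhost:3306   # placeholder
    username: datahub           # placeholder
    password: example           # placeholder
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080   # placeholder GMS endpoint
```

Running such a recipe requires the corresponding extra (here, `pip install 'acryl-datahub[mysql]'`) and is invoked with `datahub ingest -c recipe.yml`.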

Package Metadata

From setup.py (lines 1111-1149):

setuptools.setup(
    name=package_metadata["__package_name__"],
    version=_version,
    url="https://docs.datahub.com/",
    license="Apache-2.0",
    description="A CLI to work with DataHub metadata",
    python_requires=">=3.10",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    ...
)

Usage Examples

Verify Installation

datahub version

Check Available Plugins

datahub check plugins
