Principle:Datahub project Datahub CLI Package Installation
| Field | Value |
|---|---|
| Principle Name | CLI Package Installation |
| Overview | The practice of installing DataHub's command-line interface tooling to gain access to metadata ingestion, Docker management, and administrative commands. |
| Status | Active |
| Domains | Data_Integration, Metadata_Management |
| Related Implementations | Datahub_project_Datahub_Pip_Install_Acryl_Datahub |
| Last Updated | 2026-02-10 |
| Knowledge Sources | DataHub Repository |
Description
CLI package installation provisions the datahub console script together with all associated plugin entry points (sources, sinks, transformers, reporters). A modular extras system lets users select only the connectors they need, minimizing the dependency footprint. The package is published on PyPI as acryl-datahub and exposes a single top-level console script entry point: datahub = datahub.entrypoints:main.
The package uses Python's setuptools with a plugin registry architecture. Entry points are declared for several plugin categories:
- datahub.ingestion.source.plugins -- Source connectors for extracting metadata (over 60 connectors including Snowflake, BigQuery, MySQL, Kafka, and more)
- datahub.ingestion.transformer.plugins -- Transformers for modifying metadata in-flight (ownership, tags, domains, terms, etc.)
- datahub.ingestion.sink.plugins -- Sinks for writing metadata (datahub-rest, datahub-kafka, file, console)
- datahub.ingestion.reporting_provider.plugins -- Reporting providers for ingestion run summaries
- datahub.ingestion.checkpointing_provider.plugins -- State checkpointing for stateful ingestion
Usage
Install the DataHub CLI when:
- Setting up a new environment for metadata ingestion
- Deploying DataHub locally for development or testing
- Integrating DataHub CLI into CI/CD pipelines for automated metadata synchronization
- Building custom ingestion scripts that leverage the DataHub Python SDK
Theoretical Basis
Package management with extras/optional dependencies follows the Python packaging convention of declaring optional feature groups (PEP 508). This allows a single package to serve multiple use cases without forcing all dependencies on every user. Each connector extra (e.g., snowflake, bigquery, mysql) declares its own dependency set, so users install only what they need. This reduces the risk of dependency conflicts and keeps the base installation lightweight.
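The extras pattern can be sketched in setuptools style as follows; the dependency names below are illustrative examples, not DataHub's real dependency lists:

```python
# Illustrative extras_require declaration (setuptools-style sketch).
# The packages listed are examples only, not DataHub's actual pins.
base_requirements = ["click", "PyYAML", "pydantic", "avro"]

extras_require = {
    "snowflake": ["snowflake-sqlalchemy"],
    "bigquery": ["google-cloud-bigquery"],
    "mysql": ["pymysql"],
}

# A convenience "all" extra is simply the union of every connector's deps
extras_require["all"] = sorted({dep for deps in extras_require.values()
                                for dep in deps})

print(extras_require["all"])
```

With this layout, pip install acryl-datahub pulls only base_requirements, while pip install "acryl-datahub[mysql]" adds that connector's dependency set on top.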
The plugin registry pattern uses Python's entry_points mechanism, which allows third-party packages to register additional sources, sinks, and transformers without modifying the core package.
Constraints
- Requires Python >= 3.10
- Some connector extras have native library dependencies (e.g., confluent-kafka requires librdkafka)
- The base installation includes framework dependencies (click, PyYAML, pydantic, avro, etc.) but no connector-specific libraries
Related Pages
- Implemented by: Datahub_project_Datahub_Pip_Install_Acryl_Datahub
Implementation:Datahub_project_Datahub_Pip_Install_Acryl_Datahub