
Principle:Datahub project Datahub CLI Package Installation

From Leeroopedia


  • Principle Name: CLI Package Installation
  • Overview: The practice of installing DataHub's command-line interface tooling to gain access to metadata ingestion, Docker management, and administrative commands.
  • Status: Active
  • Domains: Data_Integration, Metadata_Management
  • Related Implementations: Datahub_project_Datahub_Pip_Install_Acryl_Datahub
  • Last Updated: 2026-02-10
  • Knowledge Sources: DataHub Repository

Description

CLI package installation provisions the datahub console script and all associated plugin entry points (sources, sinks, transformers, reporters). The modular extras system allows selecting only the connectors needed, minimizing dependency footprint. The package is published as acryl-datahub on PyPI and exposes a single top-level console script entry point: datahub = datahub.entrypoints:main.
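The console script named above corresponds to a standard Python entry-point declaration. As a sketch only (the actual acryl-datahub build configuration may declare this via setup.py rather than pyproject.toml), the equivalent stanza would look like:

```toml
# Hypothetical pyproject.toml fragment illustrating the console-script
# entry point quoted in the text; the real build files may differ.
[project]
name = "acryl-datahub"

[project.scripts]
datahub = "datahub.entrypoints:main"
```

Installing the package then places a `datahub` executable on the PATH that dispatches to `datahub.entrypoints:main`.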

The package uses Python's setuptools with a plugin registry architecture. Entry points are declared for several plugin categories:

  • datahub.ingestion.source.plugins -- Source connectors for extracting metadata (over 60 connectors including Snowflake, BigQuery, MySQL, Kafka, and more)
  • datahub.ingestion.transformer.plugins -- Transformers for modifying metadata in-flight (ownership, tags, domains, terms, etc.)
  • datahub.ingestion.sink.plugins -- Sinks for writing metadata (datahub-rest, datahub-kafka, file, console)
  • datahub.ingestion.reporting_provider.plugins -- Reporting providers for ingestion run summaries
  • datahub.ingestion.checkpointing_provider.plugins -- State checkpointing for stateful ingestion

Usage

Install the DataHub CLI when:

  • Setting up a new environment for metadata ingestion
  • Deploying DataHub locally for development or testing
  • Integrating DataHub CLI into CI/CD pipelines for automated metadata synchronization
  • Building custom ingestion scripts that leverage the DataHub Python SDK
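For the CI/CD and ingestion use cases above, the CLI is typically driven by a YAML recipe. A minimal illustrative example, assuming the mysql extra is installed (`pip install 'acryl-datahub[mysql]'`) and a DataHub instance is reachable at the placeholder URL:

```yaml
# recipe.yml -- illustrative only; host, credentials, and server URL
# are placeholders, not real defaults.
source:
  type: mysql
  config:
    host_port: "localhost:3306"
    username: "datahub_reader"
    password: "${MYSQL_PASSWORD}"  # resolved from the environment

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

The recipe is executed with `datahub ingest -c recipe.yml`.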

Theoretical Basis

Package management with extras/optional dependencies follows the Python packaging convention of declaring optional feature groups (PEP 508). This allows a single package to serve multiple use cases without forcing all dependencies on every user. Each connector extra (e.g., snowflake, bigquery, mysql) declares its own dependency set, so users install only what they need. This avoids dependency conflicts and keeps the base installation lightweight.
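As a sketch of how such extras are declared (the extra names mirror the connectors mentioned above, but the actual dependency sets and pins in acryl-datahub will differ):

```toml
# Hypothetical optional-dependency groups; package choices are illustrative.
[project.optional-dependencies]
snowflake = ["snowflake-connector-python"]
bigquery = ["google-cloud-bigquery"]
mysql = ["pymysql"]
```

Users then opt in per connector, e.g. `pip install 'acryl-datahub[snowflake,bigquery]'`, and the base install stays free of connector-specific libraries.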

The plugin registry pattern uses Python's entry_points mechanism, which allows third-party packages to register additional sources, sinks, and transformers without modifying the core package.
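A third-party package hooks into this registry by declaring an entry point under the same group. A hedged sketch for a hypothetical package (the package, module, and class names here are invented for illustration):

```toml
# pyproject.toml of a hypothetical third-party plugin package.
[project]
name = "my-datahub-plugin"

# Registering under the same group makes the connector discoverable
# by the datahub CLI as `my_source`, with no changes to acryl-datahub.
[project.entry-points."datahub.ingestion.source.plugins"]
my_source = "my_datahub_plugin.source:MySourceClass"
```

Once installed, the connector appears alongside the built-in sources in entry-point discovery and can be referenced by name in ingestion recipes.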

Constraints

  • Requires Python >= 3.10
  • Some connector extras have native library dependencies (e.g., confluent-kafka requires librdkafka)
  • The base installation includes framework dependencies (click, PyYAML, pydantic, avro, etc.) but no connector-specific libraries

Related Pages

Implementation:Datahub_project_Datahub_Pip_Install_Acryl_Datahub
