Principle:Datahub project Datahub CLI Installation For Docker
| Field | Value |
|---|---|
| Principle Name | CLI Installation For Docker |
| Namespace | Datahub_project_Datahub |
| Workflow | Docker_Quickstart_Deployment |
| Type | Principle |
| Last Updated | 2026-02-10 |
| Source Repository | datahub-project/datahub |
| Domains | Deployment, Docker, Metadata_Management |
Overview
This principle covers installing the DataHub CLI for Docker-based deployment management. The package is the same one used for general CLI installation, but the focus here is the docker subcommand group (quickstart, nuke, ingest-sample-data, check). No connector extras are needed for Docker commands.
Description
The DataHub CLI is distributed as the acryl-datahub Python package via PyPI. For Docker-based deployment management, the base installation without any extras is sufficient: the Docker commands rely only on the framework dependencies (Click, Docker SDK, PyYAML, etc.), which are included in the default installation.
The CLI follows a single entry point pattern where one executable (datahub) provides multiple subcommand groups for different operational contexts:
- datahub docker quickstart -- Launch the DataHub stack
- datahub docker nuke -- Destroy the DataHub stack
- datahub docker ingest-sample-data -- Load demonstration data
- datahub docker check -- Verify container health
The docker subcommand group is registered via the entrypoints.py module at line 362 (datahub.add_command(docker)), which imports the docker Click group from datahub.cli.docker_cli.
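The routing idea behind the single entry point can be sketched with the standard library. This is only an illustration: it uses argparse as a stand-in for Click, and the build_parser and route helpers are hypothetical names that do not exist in the DataHub code base. It resolves a command line to a (group, command) pair and does not start or stop any containers.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a root parser with a nested `docker` subcommand group."""
    root = argparse.ArgumentParser(prog="datahub")
    groups = root.add_subparsers(dest="group", required=True)

    # Attach the `docker` group to the root parser. In the real CLI the
    # equivalent step is datahub.add_command(docker) on a Click group;
    # argparse is used here only as a stdlib stand-in.
    docker = groups.add_parser("docker", help="Docker deployment commands")
    docker_cmds = docker.add_subparsers(dest="command", required=True)
    for name, help_text in [
        ("quickstart", "Launch the DataHub stack"),
        ("nuke", "Destroy the DataHub stack"),
        ("ingest-sample-data", "Load demonstration data"),
        ("check", "Verify container health"),
    ]:
        docker_cmds.add_parser(name, help=help_text)
    return root


def route(argv: List[str]) -> Tuple[str, str]:
    """Resolve an argument list to its (group, command) pair."""
    args = build_parser().parse_args(argv)
    return args.group, args.command
```

Calling route(["docker", "quickstart"]) yields the pair ("docker", "quickstart"), which a dispatcher could then map to a handler function.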
The package requires Python >= 3.10 and installs the following key framework dependencies relevant to Docker operations:
- click -- CLI framework
- docker -- Docker SDK for Python (container management)
- PyYAML -- Compose file parsing
- requests / requests_file -- Downloading compose files
- expandvars -- Environment variable expansion in compose files
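A quick way to confirm these prerequisites in a given environment is to check the interpreter version and whether each dependency is importable. The sketch below is a minimal example using only the standard library. The helper names are hypothetical, and the import names in DOCKER_CLI_MODULES are assumed from the distribution names listed above (PyYAML is imported as yaml).

```python
import importlib.util
import sys
from typing import Iterable, List, Sequence

# Assumed import names for the dependencies listed above.
DOCKER_CLI_MODULES = [
    "click",
    "docker",
    "yaml",  # provided by the PyYAML distribution
    "requests",
    "requests_file",
    "expandvars",
]


def python_is_supported(
    version_info: Sequence[int] = sys.version_info,
    minimum: tuple = (3, 10),
) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

After pip install acryl-datahub, missing_modules(DOCKER_CLI_MODULES) would be expected to return an empty list if the import names assumed here are correct.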
Usage
Apply this principle when setting up a local DataHub development or evaluation environment using Docker. The installation is a single pip command, with no extras required for Docker-only usage.
# Install the base package (sufficient for docker commands)
pip install acryl-datahub
# Verify installation
datahub version
# Now docker commands are available
datahub docker quickstart
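The same verification can be done from a script, for example in a setup check that runs before datahub docker quickstart. This is a minimal sketch with hypothetical helper names. It relies only on importlib.metadata and shutil.which, and assumes the package installs an executable named datahub on the PATH, as the commands above imply.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "acryl-datahub") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def cli_on_path(executable: str = "datahub") -> bool:
    """Return True if the executable can be located on the PATH."""
    return shutil.which(executable) is not None
```

If installed_version() returns a version string but cli_on_path() is False, the likely cause is that the environment's scripts directory is not on the PATH.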
For ingestion from external data sources, additional extras would be needed (e.g., pip install 'acryl-datahub[mysql,snowflake]'), but these are not required for Docker deployment management.
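When installation is scripted, the difference between the base install and an install with extras comes down to the requirement string passed to pip. The helper below is a hypothetical sketch that builds the argument list for either case. It only constructs the command and does not run it.

```python
import sys
from typing import List, Sequence


def pip_install_args(package: str, extras: Sequence[str] = ()) -> List[str]:
    """Build a `python -m pip install` argument list, with optional extras."""
    # Extras are appended in brackets, e.g. acryl-datahub[mysql,snowflake].
    requirement = f"{package}[{','.join(extras)}]" if extras else package
    # Using sys.executable targets the interpreter running this script.
    return [sys.executable, "-m", "pip", "install", requirement]
```

The resulting list can be passed to subprocess.run(args, check=True). For Docker-only usage the extras argument is simply left empty.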
Theoretical Basis
This principle follows the single entry point pattern -- one CLI tool provides multiple subcommand groups for different operational contexts. This reduces cognitive overhead for users by consolidating all DataHub operations under a single command namespace rather than requiring separate tools for deployment, ingestion, and administration.
The pattern also enables shared infrastructure (configuration, telemetry, logging) across all subcommands while keeping individual command groups self-contained in their dependencies.
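One way to picture shared infrastructure with self-contained command groups is a context object that is built once and handed to whichever handler is selected. The sketch below is illustrative only: SharedContext, the deploy and ingest handlers, and run are hypothetical names, not part of the DataHub CLI.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedContext:
    """Hypothetical container for infrastructure shared by all subcommands."""

    debug: bool = False
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("cli")
    )


Handler = Callable[[SharedContext], str]


def deploy(ctx: SharedContext) -> str:
    """Stand-in for a deployment command; it only uses the shared context."""
    ctx.logger.info("deploy called")
    return "deploy:debug" if ctx.debug else "deploy"


def ingest(ctx: SharedContext) -> str:
    """Stand-in for an ingestion command; independent of deploy."""
    ctx.logger.info("ingest called")
    return "ingest:debug" if ctx.debug else "ingest"


HANDLERS: Dict[str, Handler] = {"deploy": deploy, "ingest": ingest}


def run(command: str, debug: bool = False) -> str:
    """Configure shared state once, then dispatch to the chosen handler."""
    ctx = SharedContext(debug=debug)
    ctx.logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return HANDLERS[command](ctx)
```

Each handler depends only on the context it receives, so adding a command group means adding one entry to HANDLERS without touching the others.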
Knowledge Sources
- DataHub GitHub Repository
- DataHub Official Documentation
- Source files: metadata-ingestion/setup.py, metadata-ingestion/src/datahub/entrypoints.py
Related Pages
- Implemented by: Datahub_project_Datahub_Pip_Install_Datahub_Docker