Principle:Datahub project Datahub Docker Quickstart Launch
| Field | Value |
|---|---|
| Principle Name | Docker Quickstart Launch |
| Namespace | Datahub_project_Datahub |
| Workflow | Docker_Quickstart_Deployment |
| Type | Principle |
| Last Updated | 2026-02-10 |
| Source Repository | datahub-project/datahub |
| Domains | Deployment, Docker, Metadata_Management |
Overview
Docker Quickstart Launch is the orchestrated process of downloading, configuring, and launching all DataHub service containers via Docker Compose. It pulls compose files from GitHub releases, resolves version mappings, downloads container images, and starts the full DataHub stack. Health polling confirms that all services are ready before success is reported.
Description
Docker Quickstart Launch is the primary mechanism for deploying a complete DataHub instance locally. The process follows a multi-phase orchestration:
Phase 1: Version Resolution
The quickstart system uses a version mapping configuration (QuickstartVersionMappingConfig) that maps CLI version identifiers (like "default" or "stable") to specific Docker image tags and Git references for compose files. This decouples the CLI version from the deployed stack version.
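The lookup described above can be sketched as a small table plus a fallback. This is a minimal illustration, not the actual QuickstartVersionMappingConfig: the ResolvedVersion type, the VERSION_MAP entries, and the rule that unmapped identifiers are treated as literal release tags are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResolvedVersion:
    """Hypothetical result type: what a version identifier resolves to."""
    docker_tag: str           # image tag to pull
    composefile_git_ref: str  # Git ref the compose file is fetched from


# Placeholder entries for illustration only; the real values come from
# DataHub's version mapping configuration.
VERSION_MAP: Dict[str, ResolvedVersion] = {
    "default": ResolvedVersion(docker_tag="head", composefile_git_ref="master"),
    "stable": ResolvedVersion(docker_tag="v1.0.0", composefile_git_ref="v1.0.0"),
}


def resolve_version(requested: Optional[str]) -> ResolvedVersion:
    """Map a user-facing identifier to an image tag and a compose-file ref."""
    key = requested or "default"
    if key in VERSION_MAP:
        return VERSION_MAP[key]
    # Assumption: an unmapped identifier is used as a literal release tag.
    return ResolvedVersion(docker_tag=key, composefile_git_ref=key)
```

Keeping the image tag and the compose-file ref as separate fields is what lets the deployed stack version move independently of the CLI version.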
Phase 2: Preflight Checks
Before any containers are started, the system validates Docker daemon availability, Docker Compose v2 installation, minimum memory (4.3 GB), and minimum disk space (13 GB). See Datahub_project_Datahub_Docker_Prerequisites_Validation.
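As a rough sketch of the resource portion of these checks, the thresholds can be compared against measured values and any shortfalls collected. The function names and the choice of 1024^3 bytes per GB are assumptions for illustration; the Docker daemon and Compose v2 checks are omitted, and the real implementation may measure memory differently (for example, from the Docker engine rather than the host).

```python
import shutil
from typing import List

MIN_MEMORY_GB = 4.3   # threshold stated above
MIN_DISK_GB = 13.0    # threshold stated above
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def preflight_problems(memory_bytes: int, free_disk_bytes: int) -> List[str]:
    """Return a human-readable message for each unmet resource requirement."""
    problems: List[str] = []
    memory_gb = memory_bytes / BYTES_PER_GB
    disk_gb = free_disk_bytes / BYTES_PER_GB
    if memory_gb < MIN_MEMORY_GB:
        problems.append(
            f"memory: {memory_gb:.1f} GB available, {MIN_MEMORY_GB} GB required"
        )
    if disk_gb < MIN_DISK_GB:
        problems.append(
            f"disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB required"
        )
    return problems


def free_disk_bytes(path: str = "/") -> int:
    """Free space on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free
```

Returning a list of problems instead of raising on the first failure lets the caller report every shortfall in one pass.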
Phase 3: Compose File Acquisition
The Docker Compose file is downloaded from the DataHub GitHub repository based on the resolved version tag. The file is stored locally at ~/.datahub/quickstart/docker-compose.yml. Users can also provide custom compose files via the -f flag.
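The choice between the downloaded file and user-supplied files might look like the following sketch. The helper name compose_files_to_use is hypothetical, and the download step itself is left out because the exact URL layout is not given here.

```python
from pathlib import Path
from typing import List, Sequence

# Default location stated above.
DEFAULT_COMPOSE_PATH = Path.home() / ".datahub" / "quickstart" / "docker-compose.yml"


def compose_files_to_use(custom_files: Sequence[str] = ()) -> List[Path]:
    """Prefer compose files passed via -f; otherwise use the default path.

    Assumption: when custom files are given, nothing is downloaded.
    """
    if custom_files:
        return [Path(f).expanduser() for f in custom_files]
    return [DEFAULT_COMPOSE_PATH]
```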
Phase 4: Upgrade Compatibility Check
If an existing DataHub deployment is detected, the system checks whether it can be upgraded in place. Legacy quickstart installations (those using Zookeeper, indicating pre-profile compose format) require a manual migration via datahub docker nuke.
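The decision can be pictured as a three-way classification over the containers already present. This is only a sketch: detecting a legacy installation by looking for "zookeeper" in container names is an assumption about how the check might work, and the function names and return values are hypothetical.

```python
from typing import Iterable


def is_legacy_quickstart(container_names: Iterable[str]) -> bool:
    """Assumed heuristic: a Zookeeper container marks a pre-profile deployment."""
    return any("zookeeper" in name.lower() for name in container_names)


def upgrade_action(container_names: Iterable[str]) -> str:
    """Classify the existing deployment before launching."""
    names = list(container_names)
    if not names:
        return "fresh-install"
    if is_legacy_quickstart(names):
        # The user must run `datahub docker nuke` first, as noted above.
        return "manual-migration"
    return "in-place-upgrade"
```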
Phase 5: Image Pull and Container Launch
Docker images are pulled from Docker Hub (unless --no-pull-images is specified), followed by docker compose up -d --remove-orphans. The compose project name defaults to "datahub" (configurable via DATAHUB_COMPOSE_PROJECT_NAME environment variable).
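The command sequence for this phase can be sketched as a function that builds argument lists without running them. The helper names are hypothetical; the docker compose flags used (-f, -p, pull, up -d --remove-orphans) are standard Compose v2 options, but the exact arguments the CLI passes may differ.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_PROJECT_NAME = "datahub"


def compose_base_command(
    compose_file: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Build the shared `docker compose` prefix, honoring the project-name override."""
    env = os.environ if env is None else env
    project = env.get("DATAHUB_COMPOSE_PROJECT_NAME", DEFAULT_PROJECT_NAME)
    return ["docker", "compose", "-f", compose_file, "-p", project]


def launch_commands(
    compose_file: str,
    pull_images: bool = True,
    env: Optional[Mapping[str, str]] = None,
) -> List[List[str]]:
    """Return the commands to run, in order: optional pull, then up."""
    base = compose_base_command(compose_file, env)
    commands: List[List[str]] = []
    if pull_images:  # skipped when --no-pull-images is given
        commands.append(base + ["pull"])
    commands.append(base + ["up", "-d", "--remove-orphans"])
    return commands
```

Each returned list can be passed to subprocess.run; building the lists separately from executing them keeps the logic testable without a Docker daemon.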
Phase 6: Health Polling
After launching, the system polls container health every 2 seconds for up to 10 minutes. If containers exit or fail health checks, docker compose up is retried. On timeout, logs are dumped to a temporary file for debugging.
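A polling loop with these parameters might be structured as follows. This is a simplified sketch: the status strings, the check_health and relaunch callbacks, and the injectable clock and sleep parameters are all assumptions for illustration, and log dumping on timeout is left to the caller.

```python
import time
from typing import Callable


def wait_until_healthy(
    check_health: Callable[[], str],
    relaunch: Callable[[], None],
    interval_s: float = 2.0,     # poll every 2 seconds
    timeout_s: float = 600.0,    # give up after 10 minutes
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the stack is healthy; return False on timeout.

    `check_health` is assumed to return "healthy", "starting", or "failed".
    `relaunch` is assumed to re-run `docker compose up`.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = check_health()
        if status == "healthy":
            return True
        if status == "failed":
            # Containers exited or failed their health checks: retry the launch.
            relaunch()
        sleep(interval_s)
    return False
```

A False return is the point at which the caller would dump container logs to a temporary file. Passing clock and sleep as parameters allows the loop to be exercised in tests without real waiting.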
Services in the Stack
The DataHub quickstart stack includes:
- datahub-gms -- Generalized Metadata Service (backend API)
- datahub-frontend-react -- Web UI (accessible at http://localhost:9002)
- mysql -- Primary metadata store
- search (Elasticsearch) -- Search and discovery index
- broker (Kafka) -- Event streaming
- schema-registry -- Kafka schema management
Usage
Use the quickstart when deploying DataHub locally for development, evaluation, or testing. Typical invocations:
# Default launch (latest version)
datahub docker quickstart
# Launch a specific version
datahub docker quickstart --version v0.14.0
# Launch the latest stable version
datahub docker quickstart --version stable
# Launch without pulling images (use locally cached)
datahub docker quickstart --no-pull-images
# Launch with custom port mappings
datahub docker quickstart --mysql-port 3307 --kafka-broker-port 9093
# Launch with a custom compose file
datahub docker quickstart -f /path/to/custom-compose.yml
Theoretical Basis
This principle follows the container orchestration pattern -- Docker Compose defines a multi-service topology declaratively in a YAML file. The quickstart command wraps this with several additional concerns:
- Version resolution -- Mapping user-friendly version identifiers to specific container tags
- Health polling -- Active monitoring of container readiness beyond simple process liveness
- Retry logic -- Automatic re-invocation of docker compose up when containers need restarting
- Error reporting -- Structured diagnostics with log capture for debugging failures
This approach allows a complex multi-service deployment to be initiated with a single command while providing appropriate feedback and error handling throughout the process.
Knowledge Sources
- DataHub GitHub Repository
- DataHub Official Documentation
- Source file: metadata-ingestion/src/datahub/cli/docker_cli.py
Related Pages
- Implemented by: Datahub_project_Datahub_Docker_CLI_Quickstart