Principle:Datahub project Datahub Docker Quickstart Launch

From Leeroopedia


Field               Value
Principle Name      Docker Quickstart Launch
Namespace           Datahub_project_Datahub
Workflow            Docker_Quickstart_Deployment
Type                Principle
Last Updated        2026-02-10
Source Repository   datahub-project/datahub
Domains             Deployment, Docker, Metadata_Management

Overview

Docker Quickstart Launch is the orchestrated process of downloading, configuring, and launching all DataHub service containers via Docker Compose. It resolves version mappings, fetches the compose file from GitHub, downloads container images, and starts the full DataHub stack. Health polling then confirms that all services are ready before success is reported.

Description

Docker Quickstart Launch is the primary mechanism for deploying a complete DataHub instance locally. The process runs in six phases:

Phase 1: Version Resolution

The quickstart system uses a version mapping configuration (QuickstartVersionMappingConfig) that maps CLI version identifiers (like "default" or "stable") to specific Docker image tags and Git references for compose files. This decouples the CLI version from the deployed stack version.
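The lookup described above can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the `StackVersion` class, the `resolve_version` function, and the tag and ref values in `EXAMPLE_MAPPING` are hypothetical, and the fallback for unmapped identifiers is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StackVersion:
    """Hypothetical record of what a version identifier resolves to."""
    docker_tag: str           # tag applied to the container images
    composefile_git_ref: str  # Git reference the compose file is fetched from


# Illustrative values only; the real entries come from the mapping config.
EXAMPLE_MAPPING = {
    "default": StackVersion("head", "master"),
    "stable": StackVersion("v0.14.0", "v0.14.0"),
}


def resolve_version(requested: Optional[str],
                    mapping: Mapping[str, StackVersion]) -> StackVersion:
    """Map a user-facing identifier to an image tag and a Git reference."""
    key = requested or "default"
    if key in mapping:
        return mapping[key]
    # Assumption: an unmapped identifier is used literally as tag and ref.
    return StackVersion(docker_tag=key, composefile_git_ref=key)
```

Keeping the mapping as data rather than code is what lets the deployed stack version change without a new CLI release.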

Phase 2: Preflight Checks

Before any containers are started, the system validates Docker daemon availability, Docker Compose v2 installation, minimum memory (4.3 GB), and minimum disk space (13 GB). See Datahub_project_Datahub_Docker_Prerequisites_Validation.
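The resource thresholds can be expressed as a small check. The sketch below covers only the memory and disk limits stated above; the Docker daemon and Compose v2 checks are omitted, the function names are hypothetical, and the use of binary gigabytes is an assumption.

```python
import shutil
from typing import List

MIN_MEMORY_GB = 4.3   # minimum memory stated above
MIN_DISK_GB = 13.0    # minimum disk space stated above
BYTES_PER_GB = 1024 ** 3  # assumption: thresholds are binary gigabytes


def resource_problems(memory_bytes: int, free_disk_bytes: int) -> List[str]:
    """Return one message per unmet threshold; an empty list means OK."""
    problems = []
    if memory_bytes < MIN_MEMORY_GB * BYTES_PER_GB:
        problems.append(f"need at least {MIN_MEMORY_GB} GB of memory")
    if free_disk_bytes < MIN_DISK_GB * BYTES_PER_GB:
        problems.append(f"need at least {MIN_DISK_GB} GB of free disk space")
    return problems


def free_disk_bytes(path: str = "/") -> int:
    """Free space on the filesystem holding `path` (standard library call)."""
    return shutil.disk_usage(path).free
```

Collecting every failure before reporting lets the user fix all resource problems in one pass instead of discovering them one at a time.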

Phase 3: Compose File Acquisition

The Docker Compose file is downloaded from the DataHub GitHub repository based on the resolved version tag. The file is stored locally at ~/.datahub/quickstart/docker-compose.yml. Users can also provide custom compose files via the -f flag.
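The choice between the downloaded file and user-supplied files might look like the sketch below. It assumes that custom files passed with -f replace the default path entirely; `select_compose_files` is a hypothetical helper, and the download step itself is not shown.

```python
from pathlib import Path
from typing import List, Sequence

# Local path given above for the downloaded compose file.
DEFAULT_COMPOSE_PATH = Path.home() / ".datahub" / "quickstart" / "docker-compose.yml"


def select_compose_files(custom_files: Sequence[str]) -> List[Path]:
    """Pick the compose files to run.

    Assumption: files passed with -f take precedence, so nothing is
    downloaded when at least one custom file is given.
    """
    if custom_files:
        return [Path(f).expanduser() for f in custom_files]
    return [DEFAULT_COMPOSE_PATH]
```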

Phase 4: Upgrade Compatibility Check

If an existing DataHub deployment is detected, the system checks whether it can be upgraded in place. Legacy quickstart installations (those using Zookeeper, indicating pre-profile compose format) require a manual migration via datahub docker nuke.
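As a rough illustration of the legacy check, the sketch below flags a deployment as legacy when any container name mentions Zookeeper. The function name and the name-matching rule are assumptions; the actual detection logic may inspect other container attributes.

```python
from typing import Iterable


def is_legacy_quickstart(container_names: Iterable[str]) -> bool:
    """Treat a deployment as legacy if it runs a Zookeeper container.

    Assumption: matching on the container name is enough for illustration.
    """
    return any("zookeeper" in name.lower() for name in container_names)
```

When this returns true, an in-place upgrade is not attempted and the user is directed to datahub docker nuke instead.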

Phase 5: Image Pull and Container Launch

Docker images are pulled from Docker Hub (unless --no-pull-images is specified), and the stack is then started with docker compose up -d --remove-orphans. The compose project name defaults to "datahub" and can be changed via the DATAHUB_COMPOSE_PROJECT_NAME environment variable.
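The command sequence for this phase can be sketched as follows. The helper names are hypothetical and the code only builds argument lists; it does not run Docker. The `docker compose` flags used (-p, -f, pull, up -d --remove-orphans) are standard Compose v2 options.

```python
import os
from typing import List, Mapping, Sequence


def compose_base(compose_files: Sequence[str],
                 env: Mapping[str, str] = os.environ) -> List[str]:
    """Base `docker compose` invocation with project name and files."""
    project = env.get("DATAHUB_COMPOSE_PROJECT_NAME", "datahub")
    cmd = ["docker", "compose", "-p", project]
    for path in compose_files:
        cmd += ["-f", path]
    return cmd


def launch_commands(compose_files: Sequence[str],
                    pull_images: bool = True,
                    env: Mapping[str, str] = os.environ) -> List[List[str]]:
    """Commands to run in order: optional pull, then detached up."""
    base = compose_base(compose_files, env)
    commands = []
    if pull_images:  # skipped when --no-pull-images is given
        commands.append(base + ["pull"])
    commands.append(base + ["up", "-d", "--remove-orphans"])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` to execute it.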

Phase 6: Health Polling

After launching, the system polls container health every 2 seconds for up to 10 minutes. If containers exit or fail health checks, docker compose up is retried. On timeout, logs are dumped to a temporary file for debugging.
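A polling loop with the interval and timeout given above might look like this. It is a sketch under stated assumptions: `check` and `restart` are hypothetical callbacks standing in for the container health query and the docker compose up retry, the three status strings are invented for illustration, and log dumping on timeout is left to the caller.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], str],
                       restart: Callable[[], None],
                       interval_s: float = 2.0,
                       timeout_s: float = 600.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the stack is healthy or the timeout expires.

    `check` returns "healthy", "starting", or "failed" (hypothetical values).
    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = check()
        if status == "healthy":
            return True
        if status == "failed":
            restart()  # stands in for re-running `docker compose up`
        sleep(interval_s)
    return False
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.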

Services in the Stack

The DataHub quickstart stack includes:

  • datahub-gms -- Generalized Metadata Service (backend API)
  • datahub-frontend-react -- Web UI (accessible at http://localhost:9002)
  • mysql -- Primary metadata store
  • search (Elasticsearch) -- Search and discovery index
  • broker (Kafka) -- Event streaming
  • schema-registry -- Kafka schema management

Usage

Apply this principle when deploying DataHub locally for development, evaluation, or testing.

# Default launch (latest version)
datahub docker quickstart

# Launch a specific version
datahub docker quickstart --version v0.14.0

# Launch the latest stable version
datahub docker quickstart --version stable

# Launch without pulling images (use locally cached)
datahub docker quickstart --no-pull-images

# Launch with custom port mappings
datahub docker quickstart --mysql-port 3307 --kafka-broker-port 9093

# Launch with a custom compose file
datahub docker quickstart -f /path/to/custom-compose.yml

Theoretical Basis

This principle follows the container orchestration pattern -- Docker Compose defines a multi-service topology declaratively in a YAML file. The quickstart command wraps this with several additional concerns:

  • Version resolution -- Mapping user-friendly version identifiers to specific container tags
  • Health polling -- Active monitoring of container readiness beyond simple process liveness
  • Retry logic -- Automatic re-invocation of docker compose up when containers need restarting
  • Error reporting -- Structured diagnostics with log capture for debugging failures

This approach allows a complex multi-service deployment to be initiated with a single command while providing appropriate feedback and error handling throughout the process.

Related Pages

Implementation:Datahub_project_Datahub_Docker_CLI_Quickstart