Workflow:Datahub project Datahub Docker Quickstart Deployment

From Leeroopedia


Knowledge Sources
Domains: DevOps, Deployment, Docker
Last Updated: 2026-02-09 12:00 GMT

Overview

End-to-end process for deploying a local DataHub instance using Docker Compose for development, testing, and evaluation.

Description

This workflow covers the quickstart deployment of DataHub, which launches the full platform stack (14 containers) using Docker Compose. The stack includes the frontend UI, metadata service (GMS), Kafka, MySQL, Elasticsearch, and supporting services. This is the fastest way to get a running DataHub instance for local development, experimentation, or demos. The deployment supports data backup/restore, version pinning, and sample data ingestion.

Usage

Execute this workflow when you need a local DataHub instance for development, testing, demos, or evaluation. This is not recommended for production deployments; use Kubernetes with Helm charts for production environments.

Execution Steps

Step 1: Install Prerequisites

Ensure Docker, Docker Compose v2, and Python 3.10+ are installed on the host machine. Docker must have sufficient resources allocated: at least 2 CPUs and 8 GB of RAM are recommended.

Key considerations:

  • Docker Compose v2 is required (v1 is deprecated)
  • Allocate sufficient memory in Docker Desktop settings
  • Ports 9002 (frontend) and 8080 (GMS), along with the other ports the stack binds, must be free on the host
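
The port and Python checks above can be scripted before launching anything. The following is a minimal sketch, not part of DataHub itself: the helper names (port_is_free, busy_ports, python_is_supported) are hypothetical, and only the two ports named in this step are listed, since the remaining ports are not specified here.

```python
import socket
import sys

# Ports named in this step. The full stack binds more ports, which are
# not listed in this workflow and are therefore left out of the sketch.
REQUIRED_PORTS = {"frontend": 9002, "gms": 8080}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=None):
    """Return the subset of the given ports that already have a listener."""
    ports = REQUIRED_PORTS if ports is None else ports
    return {name: port for name, port in ports.items() if not port_is_free(port)}


def python_is_supported(version_info=sys.version_info):
    """Check the Python 3.10+ requirement stated in this step."""
    return tuple(version_info[:2]) >= (3, 10)
```

Calling busy_ports() with no arguments returns an empty dict when both ports are free. A non-empty result names the service whose port is already taken, so the conflicting process can be stopped before Step 3.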

Step 2: Install DataHub CLI

Install the DataHub CLI tool via pip. The CLI provides the docker quickstart commands that orchestrate the deployment.

Key considerations:

  • Install via pip install acryl-datahub
  • Verify with datahub version
  • The CLI manages Docker Compose orchestration
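
As a rough illustration of the verification bullet above, the sketch below looks for the datahub executable on PATH and captures the output of datahub version. The function name cli_version and the INSTALL_COMMAND constant are assumptions made for this example; the exact text printed by the CLI is not specified in this workflow, so the sketch returns it unparsed.

```python
import shutil
import subprocess

# Install command from this step, written in the "python3 -m pip" form so
# the package lands in the interpreter that will actually run the CLI.
INSTALL_COMMAND = ["python3", "-m", "pip", "install", "acryl-datahub"]


def cli_version(executable="datahub"):
    """Return the raw output of `<executable> version`, or None if not installed."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    # Some CLIs print version details to stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()
```

A None result means the CLI is not on PATH, in which case INSTALL_COMMAND can be passed to subprocess.run and the check repeated.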

Step 3: Launch DataHub Stack

Execute the quickstart command to pull Docker images and start all services. The CLI downloads pre-built images from Docker Hub and launches the complete stack.

Key considerations:

  • First launch downloads several GB of Docker images
  • All 14 containers must reach healthy status
  • The process may take several minutes on first run
  • A specific release can be pinned by passing the --version flag to the quickstart command
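
The launch and version pinning described above can be wrapped in a small helper. This is a sketch under stated assumptions: it relies on the datahub docker quickstart command and its --version option, the function names are hypothetical, and the runner parameter exists only so the command can be inspected without starting containers.

```python
import subprocess

BASE_COMMAND = ["datahub", "docker", "quickstart"]


def quickstart_command(version=None):
    """Build the quickstart command, pinning a release when one is given."""
    command = list(BASE_COMMAND)  # copy so the constant is never mutated
    if version:
        command += ["--version", version]
    return command


def launch(version=None, runner=subprocess.run):
    """Run quickstart; check=True raises if the CLI exits with an error."""
    return runner(quickstart_command(version), check=True)
```

Calling launch() starts the default release, while launch("<tag>") pins the given release tag. The first run blocks while images download, so it is best run from a terminal where the CLI's progress output is visible.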

Step 4: Access and Verify

Open the DataHub UI in a browser and log in with default credentials. Verify that the platform is operational by browsing the catalog.
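
Before opening a browser, it can help to poll the frontend until it answers. The sketch below assumes the UI is reachable at http://localhost:9002 (the frontend port named in Step 1); the function names, retry count, and delay are illustrative choices rather than values taken from DataHub.

```python
import time
import urllib.error
import urllib.request

DEFAULT_UI_URL = "http://localhost:9002"  # assumed local address of the frontend


def ui_is_up(url=DEFAULT_UI_URL, timeout=3.0):
    """Return True if the server at url answers any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status still proves the server is listening.
        return True
    except (urllib.error.URLError, OSError):
        return False


def wait_for_ui(url=DEFAULT_UI_URL, attempts=30, delay=10.0,
                probe=ui_is_up, sleep=time.sleep):
    """Poll until the probe succeeds or the attempts are used up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A True result only shows that the frontend is serving HTTP. Logging in and browsing the catalog remains the real check that the metadata service and search index behind it are working.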

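Key considerations:

  • The UI is served on the frontend port, 9002 (http://localhost:9002 on a default local setup)
  • The quickstart's default login is username datahub with password datahub
  • If the page does not load, confirm that every container has reached healthy status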

Step 5: Load Sample Data

Optionally ingest sample metadata to explore DataHub features. The sample data includes example datasets, dashboards, charts, and lineage relationships.

Key considerations:

  • Sample data demonstrates key DataHub features
  • Useful for demos and getting familiar with the UI
  • Can be skipped if ingesting real metadata
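
Because this step is optional, a script can gate it behind a flag. The sketch below assumes the CLI's datahub docker ingest-sample-data subcommand; the function name and the enabled parameter are hypothetical, and deployments with authentication enabled may need extra options that are not covered here.

```python
import subprocess

SAMPLE_DATA_COMMAND = ["datahub", "docker", "ingest-sample-data"]


def ingest_sample_data(enabled=True, runner=subprocess.run):
    """Ingest the bundled sample metadata unless the step is switched off."""
    if not enabled:
        return None  # skipped, e.g. when real metadata will be ingested instead
    return runner(SAMPLE_DATA_COMMAND, check=True)
```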

Step 6: Manage Deployment Lifecycle

Use CLI commands to stop, restart, backup data, restore from backup, or completely reset the DataHub instance.

Key considerations:

  • Stop preserves data in Docker volumes
  • Nuke removes all data and volumes for a clean restart
  • Backup creates a portable archive of all stored metadata
  • Restore loads previously backed-up data
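
The lifecycle actions listed above map onto a handful of CLI invocations. The table below is a sketch that assumes the quickstart command's --stop, --backup, and --restore options and the separate datahub docker nuke subcommand; the confirm guard on the destructive action is an illustrative design choice, not CLI behavior.

```python
LIFECYCLE_COMMANDS = {
    "start": ["datahub", "docker", "quickstart"],  # also restarts a stopped stack
    "stop": ["datahub", "docker", "quickstart", "--stop"],
    "backup": ["datahub", "docker", "quickstart", "--backup"],
    "restore": ["datahub", "docker", "quickstart", "--restore"],
    "nuke": ["datahub", "docker", "nuke"],
}

# Actions that delete stored metadata and therefore need explicit consent.
DESTRUCTIVE_ACTIONS = {"nuke"}


def lifecycle_command(action, confirm=False):
    """Return the CLI command for an action, guarding destructive ones."""
    if action not in LIFECYCLE_COMMANDS:
        raise ValueError(f"unknown lifecycle action: {action!r}")
    if action in DESTRUCTIVE_ACTIONS and not confirm:
        raise PermissionError(f"{action!r} deletes data; pass confirm=True")
    return list(LIFECYCLE_COMMANDS[action])
```

The returned list can be handed to subprocess.run. Taking a backup before a nuke keeps the reset reversible, since nuke removes the volumes that stop would have preserved.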
