Workflow:Datahub project Datahub Docker Quickstart Deployment
| Knowledge Sources | |
|---|---|
| Domains | DevOps, Deployment, Docker |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
End-to-end process for deploying a local DataHub instance using Docker Compose for development, testing, and evaluation.
Description
This workflow covers the quickstart deployment of DataHub, which launches the full platform stack (14 containers) using Docker Compose. The stack includes the frontend UI, metadata service (GMS), Kafka, MySQL, Elasticsearch, and supporting services. This is the fastest way to get a running DataHub instance for local development, experimentation, or demos. The deployment supports data backup/restore, version pinning, and sample data ingestion.
Usage
Execute this workflow when you need a local DataHub instance for development, testing, demos, or evaluation. This is not recommended for production deployments; use Kubernetes with Helm charts for production environments.
Execution Steps
Step 1: Install Prerequisites
Ensure Docker, Docker Compose v2, and Python 3.10+ are installed on the host machine. Docker must have sufficient resources allocated (at least 2 CPUs and 8 GB of RAM recommended).
Key considerations:
- Docker Compose v2 is required (v1 is deprecated)
- Allocate sufficient memory in Docker Desktop settings
- Ports 9002 (frontend) and 8080 (GMS), along with the other ports the stack binds, must be free on the host
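The prerequisite checks above could be scripted along these lines. This is a minimal sketch, not part of DataHub: the helper names (`python_ok`, `port_is_free`, `compose_v2_available`) are made up for illustration, and only the two ports named in this workflow are checked, although the full stack binds more.

```python
import shutil
import socket
import subprocess
import sys

# Ports named in this workflow; the full stack binds others not listed here.
REQUIRED_PORTS = {9002: "frontend", 8080: "GMS"}
MIN_PYTHON = (3, 10)


def python_ok(version=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepts the connection.
        return sock.connect_ex((host, port)) != 0


def compose_v2_available():
    """Return True if the `docker compose` (v2 plugin) subcommand runs."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "compose", "version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0


def busy_ports(ports=REQUIRED_PORTS):
    """Return the subset of ports that already have a listener."""
    return {port: name for port, name in ports.items() if not port_is_free(port)}
```

Calling `python_ok()`, `compose_v2_available()`, and `busy_ports()` before launching gives a quick go/no-go signal. The sketch does not inspect Docker's CPU or memory allocation, which still needs to be checked in the Docker Desktop settings.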
Step 2: Install DataHub CLI
Install the DataHub CLI tool via pip. The CLI provides the docker quickstart commands that orchestrate the deployment.
Key considerations:
- Install the package with pip: pip install acryl-datahub
- Verify the installation by running datahub version
- The CLI manages Docker Compose orchestration
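As a rough illustration of the install-and-verify step, the sketch below runs pip and the CLI through the current interpreter, so both land in the same environment. The function names are hypothetical. The `--upgrade` flag and the `python -m datahub` invocation are assumptions added for convenience, not requirements stated in this workflow.

```python
import subprocess
import sys

PACKAGE = "acryl-datahub"


def pip_install_command(package=PACKAGE, upgrade=True):
    """Build the pip command for the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        # Assumption: upgrading to the latest CLI is acceptable here.
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def version_command():
    """Build the verification command (equivalent to `datahub version`)."""
    return [sys.executable, "-m", "datahub", "version"]


def install_and_verify():
    """Install the CLI, then return the text printed by the version check."""
    subprocess.run(pip_install_command(), check=True)
    result = subprocess.run(
        version_command(), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Invoking the module with `sys.executable` avoids the case where the `datahub` entry point is installed but not yet on `PATH`.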
Step 3: Launch DataHub Stack
Execute the quickstart command to pull Docker images and start all services. The CLI downloads pre-built images from Docker Hub and launches the complete stack.
Key considerations:
- First launch downloads several GB of Docker images
- All 14 containers must reach healthy status
- The process may take several minutes on first run
- A specific release can be pinned by passing the --version flag to the quickstart command
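The launch step, with optional version pinning, might be wrapped as follows. This is a sketch under the assumption that the installed CLI exposes `datahub docker quickstart` with a `--version` option. The helper names and the `v1.0.0` tag in the usage note are illustrative only.

```python
import subprocess


def quickstart_command(version=None):
    """Build the quickstart command, optionally pinned to a release tag."""
    cmd = ["datahub", "docker", "quickstart"]
    if version:
        # Pin the images to one release instead of the CLI's default.
        cmd += ["--version", version]
    return cmd


def launch(version=None):
    """Start the stack; raises CalledProcessError if the CLI exits non-zero."""
    # The first run pulls several GB of images, so this call can block
    # for several minutes.
    subprocess.run(quickstart_command(version), check=True)
```

For example, `launch("v1.0.0")` would request that release tag, while `launch()` leaves the version choice to the CLI.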
Step 4: Access and Verify
Open the DataHub UI in a browser and log in with default credentials. Verify that the platform is operational by browsing the catalog.
Key considerations:
- Frontend is available at http://localhost:9002
- Default credentials are datahub/datahub
- GMS REST API is available at http://localhost:8080
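Verification can also be done from a script by polling the two endpoints listed above. The sketch below is one possible approach using only the standard library. The `/health` path on GMS is an assumption, so substitute whichever endpoint your version exposes. The helper names are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request

FRONTEND_URL = "http://localhost:9002"
GMS_HEALTH_URL = "http://localhost:8080/health"  # assumed health path

# Local endpoints should not be routed through any configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url, timeout=3.0):
    """Return True if the URL answers with a 2xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_up(url, attempts=30, delay=5.0):
    """Poll the URL until it responds or the attempts run out."""
    for attempt in range(attempts):
        if is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call is `wait_until_up(GMS_HEALTH_URL) and wait_until_up(FRONTEND_URL)`. A successful response only shows that the services answer HTTP requests. Logging in with the default credentials and browsing the catalog remains the real functional check.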
Step 5: Load Sample Data
Optionally ingest sample metadata to explore DataHub features. The sample data includes example datasets, dashboards, charts, and lineage relationships.
Key considerations:
- Sample data demonstrates key DataHub features
- Useful for demos and getting familiar with the UI
- Can be skipped if ingesting real metadata
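Sample data ingestion can be triggered from a script as well. The sketch below assumes the CLI's `datahub docker ingest-sample-data` subcommand and its optional `--token` argument (needed only when metadata service authentication is enabled). Confirm both with `datahub docker --help` for your installed version. The function names and the injectable `runner` parameter are illustrative.

```python
import subprocess


def sample_data_command(token=None):
    """Build the sample-data ingestion command."""
    cmd = ["datahub", "docker", "ingest-sample-data"]
    if token:
        # Only needed when metadata service authentication is enabled.
        cmd += ["--token", token]
    return cmd


def ingest_sample_data(token=None, runner=subprocess.run):
    """Run the ingestion; `runner` is injectable so the call can be dry-run."""
    return runner(sample_data_command(token), check=True)
```

Passing a stub as `runner` makes it possible to log or inspect the command without touching a running instance.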
Step 6: Manage Deployment Lifecycle
Use CLI commands to stop, restart, back up, restore from a backup, or completely reset the DataHub instance.
Key considerations:
- Stop preserves data in Docker volumes
- Nuke removes all data and volumes for a clean restart
- Backup creates a portable archive of all stored metadata
- Restore loads previously backed-up data
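The lifecycle operations above map onto a handful of CLI invocations, sketched here as a small command builder. The flags (`--stop`, `--backup`, `--backup-file`, `--restore`, `--restore-file`) and the `nuke` subcommand are assumptions based on the quickstart CLI as commonly documented, so check `datahub docker quickstart --help` before relying on them. The `lifecycle_command` helper and the example backup path are hypothetical.

```python
import subprocess

BASE = ["datahub", "docker"]


def lifecycle_command(action, file=None):
    """Map a lifecycle action to the CLI command that performs it."""
    if action == "stop":
        # Stops containers; data stays in Docker volumes.
        return BASE + ["quickstart", "--stop"]
    if action == "restart":
        # Running quickstart again brings a stopped stack back up.
        return BASE + ["quickstart"]
    if action == "nuke":
        # Removes containers and volumes; all stored metadata is lost.
        return BASE + ["nuke"]
    if action == "backup":
        cmd = BASE + ["quickstart", "--backup"]
        return cmd + ["--backup-file", file] if file else cmd
    if action == "restore":
        cmd = BASE + ["quickstart", "--restore"]
        return cmd + ["--restore-file", file] if file else cmd
    raise ValueError(f"unknown lifecycle action: {action!r}")


def run_lifecycle(action, file=None):
    """Execute the action; raises CalledProcessError on a non-zero exit."""
    subprocess.run(lifecycle_command(action, file), check=True)
```

For instance, `run_lifecycle("backup", "/tmp/datahub-backup.sql")` followed later by `run_lifecycle("restore", "/tmp/datahub-backup.sql")` would round-trip the stored metadata. Taking a backup before any `nuke` is a sensible habit, since that action is irreversible.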