Environment:Datahub project Datahub Docker Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deployment |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Docker and Docker Compose runtime environment required for deploying DataHub locally via the quickstart workflow.
Description
This environment provides the container runtime needed to run the full DataHub stack locally. It uses Docker Compose V2 to orchestrate multiple services including MySQL/PostgreSQL, Elasticsearch/OpenSearch, Apache Kafka (via Confluent), the GMS backend, and the React frontend. The CLI command datahub docker quickstart performs automated preflight checks for memory and disk space before launching.
Usage
Use this environment for any local deployment of DataHub via Docker Compose, including quickstart, development, and smoke testing. It is the mandatory prerequisite for running the Docker_CLI_Quickstart and Docker_Health_Check_Pattern implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows (via Docker Desktop) | WSL2 recommended on Windows |
| CPU | Minimum 2 cores | Tested & confirmed minimum |
| RAM | Minimum 8 GB system RAM | Docker engine must be allocated at least 4.3 GB |
| Disk | Minimum 13 GB free space | SSD recommended for Elasticsearch performance |
| Docker Engine | Docker Desktop or Docker for Linux | Docker Desktop 4.13.0+ has a known socket symlink issue (see Common Errors) |
| Docker Compose | Version 2.0+ (2.20+ recommended) | V1 is deprecated and not supported |
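The Compose version requirement in the table can be checked programmatically. The following is a minimal sketch, not part of the DataHub CLI: it parses a version string such as the one printed by docker compose version --short and compares it with the 2.0 minimum and the 2.20 recommendation. The names parse_compose_version and classify_compose_version are hypothetical.

```python
import re
from typing import Tuple

MIN_COMPOSE = (2, 0)           # minimum from the requirements table
RECOMMENDED_COMPOSE = (2, 20)  # recommended from the requirements table


def parse_compose_version(raw: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v2.24.5' or '2.20.2-desktop.1'."""
    match = re.search(r"(\d+)\.(\d+)", raw)
    if match is None:
        raise ValueError(f"Unrecognized Compose version string: {raw!r}")
    return int(match.group(1)), int(match.group(2))


def classify_compose_version(raw: str) -> str:
    """Return 'unsupported', 'supported', or 'recommended' for a version string."""
    version = parse_compose_version(raw)
    if version < MIN_COMPOSE:
        return "unsupported"
    if version < RECOMMENDED_COMPOSE:
        return "supported"
    return "recommended"
```

In practice the raw string could come from running the docker compose version --short command through subprocess; the sketch takes it as an argument so it can be exercised without Docker installed.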
Dependencies
System Packages
- docker (Docker Engine)
- docker-compose V2 plugin (bundled with Docker Desktop)
Container Images (pulled automatically)
- mysql:8 or postgres:16
- elasticsearch:7.10 or opensearchproject/opensearch:2.x
- confluentinc/cp-kafka, confluentinc/cp-schema-registry, confluentinc/cp-zookeeper
- linkedin/datahub-gms
- linkedin/datahub-frontend-react
- linkedin/datahub-actions
Credentials
No credentials are required for the default quickstart deployment. For production or cloud deployments:
DATAHUB_GMS_TOKEN: Authentication token for GMS API access (if auth is enabled)
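When authentication is enabled, a client has to send the token with each GMS request. A minimal sketch, assuming the token is exported in the DATAHUB_GMS_TOKEN environment variable and passed as a standard bearer token; the helper name build_gms_headers is hypothetical and not part of the DataHub client.

```python
import os
from typing import Mapping, Optional


def build_gms_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return HTTP headers for GMS requests, adding a bearer token when one is set."""
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    token = env.get("DATAHUB_GMS_TOKEN")
    if token:
        # Assumption: the token is sent in the standard Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

With the default quickstart deployment the variable is unset and no Authorization header is added.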
Quick Install
# Install Docker Desktop (macOS/Windows) or Docker Engine (Linux)
# Then verify:
docker --version
docker compose version
# Ensure Docker has at least 4.3 GB memory and 13 GB disk space allocated
# Launch DataHub:
pip install acryl-datahub
datahub docker quickstart
Code Evidence
Memory preflight check from metadata-ingestion/src/datahub/cli/docker_check.py:17-18:
# Docker seems to under-report memory allocated, so we also need a bit of buffer to account for it.
MIN_MEMORY_NEEDED = 4.3 # GB
MIN_DISK_SPACE_NEEDED = 13 # GB
Memory validation from metadata-ingestion/src/datahub/cli/docker_check.py:102-110:
def run_quickstart_preflight_checks(client: docker.DockerClient) -> None:
    total_mem_configured = int(client.info()["MemTotal"])
    if memory_in_gb(total_mem_configured) < MIN_MEMORY_NEEDED:
        raise DockerLowMemoryError(
            f"Total Docker memory configured {memory_in_gb(total_mem_configured):.2f}GB "
            f"is below the minimum threshold {MIN_MEMORY_NEEDED}GB. "
            "You can increase the memory allocated to Docker in the Docker settings."
        )
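The excerpt relies on a memory_in_gb helper and an exception type that are not shown on this page. The following self-contained sketch illustrates the same threshold logic without a Docker daemon. The byte-to-GB conversion (dividing by 1024^3) and the stand-in names LowMemoryError and check_memory are assumptions, not the upstream implementation.

```python
MIN_MEMORY_NEEDED = 4.3  # GB, matching the constant quoted from docker_check.py


class LowMemoryError(Exception):
    """Stand-in for the CLI's DockerLowMemoryError (hypothetical definition)."""


def memory_in_gb(mem_bytes: int) -> float:
    # Assumption: binary gigabytes; the upstream helper may convert differently.
    return mem_bytes / (1024 ** 3)


def check_memory(docker_info: dict) -> float:
    """Validate the MemTotal field of a docker-info-style dict and return it in GB."""
    total_gb = memory_in_gb(int(docker_info["MemTotal"]))
    if total_gb < MIN_MEMORY_NEEDED:
        raise LowMemoryError(
            f"Total Docker memory configured {total_gb:.2f}GB "
            f"is below the minimum threshold {MIN_MEMORY_NEEDED}GB."
        )
    return total_gb
```

Under this conversion, a Docker allocation of exactly 4 GB fails the check, which is consistent with the 4.3 GB figure in the requirements table.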
Docker Desktop 4.13.0 socket workaround from metadata-ingestion/src/datahub/cli/docker_check.py:70-76:
# Docker Desktop 4.13.0 broke the docker.sock symlink.
# See https://github.com/docker/docker-py/issues/3059.
maybe_sock_path = os.path.expanduser("~/.docker/run/docker.sock")
if os.path.exists(maybe_sock_path):
    client = docker.DockerClient(base_url=f"unix://{maybe_sock_path}")
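The path selection can be exercised in isolation. This is a minimal sketch using only the standard library; resolve_docker_base_url is a hypothetical helper that takes the home directory as a parameter so the logic can be tested without Docker installed. It simply prefers the per-user socket whenever that file exists, which may differ from the order in which the CLI actually attempts connections, and the /var/run/docker.sock default is the conventional location rather than something taken from the excerpt.

```python
import os
from typing import Optional

DEFAULT_SOCK = "/var/run/docker.sock"  # conventional system-wide socket path


def resolve_docker_base_url(home: Optional[str] = None) -> str:
    """Return a unix:// URL, preferring the per-user Docker Desktop socket if present."""
    home = os.path.expanduser("~") if home is None else home
    maybe_sock_path = os.path.join(home, ".docker", "run", "docker.sock")
    if os.path.exists(maybe_sock_path):
        return f"unix://{maybe_sock_path}"
    return f"unix://{DEFAULT_SOCK}"
```

The returned string has the same shape as the base_url argument passed to docker.DockerClient in the excerpt.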
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| Docker doesn't seem to be running. Did you start it? | Docker daemon not started | Start Docker Desktop or run sudo systemctl start docker |
| Total Docker memory configured X.XXGB is below the minimum threshold 4.3GB | Insufficient Docker memory allocation | Increase Docker's memory allocation above the 4.3 GB threshold (8 GB recommended) in Docker Desktop settings |
| Total Docker disk space available X.XXGB is below the minimum threshold 13GB | Insufficient Docker disk space | Free disk space or increase Docker disk allocation |
| Docker socket connection failure on macOS | Docker Desktop 4.13.0+ broke the /var/run/docker.sock symlink | The CLI automatically falls back to ~/.docker/run/docker.sock; update Docker Desktop |
Compatibility Notes
- macOS (Apple Silicon): Supported via Docker Desktop with Rosetta emulation. Use the --arch arm64 flag if images support it natively.
- Windows: Requires Docker Desktop with WSL2 backend. Native Windows containers are not supported.
- Linux: Docker Engine + Docker Compose V2 plugin. No Docker Desktop required.
- Elasticsearch memory: The container is limited to 1 GB RAM via the Docker Compose mem_limit setting. This may need to be increased for large deployments.
Related Pages
- Implementation:Datahub_project_Datahub_Docker_CLI_Quickstart
- Implementation:Datahub_project_Datahub_Docker_CLI_Check
- Implementation:Datahub_project_Datahub_Docker_Health_Check_Pattern
- Implementation:Datahub_project_Datahub_Docker_CLI_Ingest_Sample_Data
- Implementation:Datahub_project_Datahub_DataHub_Frontend_Access
- Implementation:Datahub_project_Datahub_DataHub_UI_Lineage_Verification