Environment:Datahub project Datahub Docker Quickstart Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Docker, Deployment
Last Updated: 2026-02-10 00:00 GMT

Overview

A Docker Desktop or Docker Engine environment with Docker Compose v2 and at least 8 GB RAM, 2 CPUs, and 13 GB of free disk space, used to run the full DataHub stack locally.

Description

This environment defines the hardware and software prerequisites for deploying DataHub with the Docker Quickstart method. The stack includes GMS (backend), the frontend, Elasticsearch, MySQL (or PostgreSQL), Kafka (with Zookeeper and Schema Registry), and optionally Neo4j. Docker Compose v2 is required; the CLI rejects v1 with an explicit error message. Before launch, the CLI runs automated preflight checks for memory (Docker must report at least 4.3 GB, a threshold that allows for Docker under-reporting its allocation) and disk space (at least 13 GB).

Usage

Use this environment for Docker Quickstart Deployment workflows, including `datahub docker quickstart`, `datahub docker nuke`, and local development stacks. This is the standard way to run DataHub locally for development, testing, and evaluation.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows (Docker Desktop) | Any OS supporting Docker Desktop or Docker Engine |
| Docker | Docker Compose v2 or later | v1 explicitly rejected; detected via `docker compose version` |
| RAM | 8 GB minimum (Docker Desktop setting) | Preflight checks validate 4.3 GB available (Docker under-reports) |
| CPU | 2 CPUs minimum | Allocated to Docker Desktop |
| Swap | 2 GB minimum | Docker Desktop swap allocation |
| Disk | 13 GB available | Checked before quickstart launch |

Dependencies

System Packages

  • Docker Desktop (macOS/Windows) or Docker Engine + Docker Compose v2 (Linux)
  • `python3` >= 3.10 (for `datahub` CLI that orchestrates Docker)
  • `pip` (to install `acryl-datahub[docker]`)

Container Images

Default container image versions used by the quickstart:

| Service | Image | Default Version |
|---|---|---|
| MySQL | mysql | 8.2 |
| Elasticsearch | elasticsearch | 7.16.1 |
| Neo4j | neo4j | 4.4.9-community |
| Kafka Broker | confluentinc/cp-kafka | 7.9.2 |
| Schema Registry | confluentinc/cp-schema-registry | 7.9.2 |
| Zookeeper | confluentinc/cp-zookeeper | 7.9.2 |

Credentials

The following environment variables can be set for customization:

  • `DATAHUB_VERSION`: DataHub image version tag to deploy
  • `DATAHUB_COMPOSE_PROJECT_NAME`: Docker Compose project name (default: `datahub`)
  • `METADATA_SERVICE_AUTH_ENABLED`: Enable GMS authentication (default: `false`)
  • `DATAHUB_MAPPED_MYSQL_PORT`: Override MySQL port mapping
  • `DATAHUB_MAPPED_KAFKA_BROKER_PORT`: Override Kafka broker port mapping
  • `DATAHUB_MAPPED_ELASTIC_PORT`: Override Elasticsearch port mapping

Quick Install

# Install DataHub CLI with Docker support
pip install 'acryl-datahub[docker]'

# Launch the full stack
datahub docker quickstart

# Check stack health
datahub docker check

# Tear down and remove all data
datahub docker nuke

Code Evidence

Memory and disk space constants from `docker_check.py:16-18`:

# Docker seems to under-report memory allocated, so we also need a bit of buffer to account for it.
MIN_MEMORY_NEEDED = 4.3  # GB
MIN_DISK_SPACE_NEEDED = 13  # GB
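
As a rough sketch of how these two thresholds could be applied, the snippet below compares byte counts against them. The function `preflight_problems`, the use of binary gigabytes, and the message wording are assumptions for illustration; the actual logic in `docker_check.py` is not reproduced on this page.

```python
import shutil

MIN_MEMORY_NEEDED = 4.3  # GB, as quoted above
MIN_DISK_SPACE_NEEDED = 13  # GB, as quoted above
BYTES_PER_GB = 1024 ** 3  # assumption: thresholds are in binary gigabytes


def preflight_problems(docker_memory_bytes: int, free_disk_bytes: int) -> list:
    """Return human-readable problems; an empty list means both checks pass.

    Hypothetical helper, not the CLI's implementation.
    """
    problems = []
    mem_gb = docker_memory_bytes / BYTES_PER_GB
    disk_gb = free_disk_bytes / BYTES_PER_GB
    if mem_gb < MIN_MEMORY_NEEDED:
        problems.append(
            f"memory: {mem_gb:.1f} GB reported, need {MIN_MEMORY_NEEDED} GB"
        )
    if disk_gb < MIN_DISK_SPACE_NEEDED:
        problems.append(
            f"disk: {disk_gb:.1f} GB free, need {MIN_DISK_SPACE_NEEDED} GB"
        )
    return problems


if __name__ == "__main__":
    # Host free space is only a stand-in; Docker may see a different volume.
    free = shutil.disk_usage("/").free
    # 8 GB is a placeholder for whatever memory figure Docker reports.
    print(preflight_problems(8 * BYTES_PER_GB, free))
```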

Docker Compose v2 requirement check from `docker_cli.py:220-252`:

# Attempts docker compose version --short (v2)
# Falls back to docker-compose version --short (v1)
# Raises DockerComposeVersionError if only v1 found:
# "You have docker-compose v1 ({compose_version}) installed,
#  but we require Docker Compose v2 or later."
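
The detection flow described in the comments above can be sketched as follows. This is an illustration under stated assumptions, not the code in `docker_cli.py`: `ComposeV1Error` is a hypothetical stand-in for `DockerComposeVersionError`, the `run` parameter exists only so the logic can be exercised without Docker installed, and deciding "v1" by the leading major version number is an assumption.

```python
import subprocess
from typing import Callable, List, Optional


class ComposeV1Error(RuntimeError):
    """Hypothetical stand-in for the DockerComposeVersionError named above."""


def _run(cmd: List[str]) -> Optional[str]:
    """Return stripped stdout, or None if the command is missing or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def detect_compose_version(
    run: Callable[[List[str]], Optional[str]] = _run,
) -> str:
    """Prefer the v2 plugin; fall back to the standalone binary."""
    version = run(["docker", "compose", "version", "--short"])
    if version:
        return version
    version = run(["docker-compose", "version", "--short"])
    if not version:
        raise RuntimeError("Docker Compose was not found")
    # Assumption: a leading major version of 1 identifies Compose v1.
    major = version.lstrip("v").split(".")[0]
    if major == "1":
        raise ComposeV1Error(
            f"You have docker-compose v1 ({version}) installed, "
            "but we require Docker Compose v2 or later."
        )
    return version
```

Calling `detect_compose_version()` with no arguments runs the real commands; passing a fake `run` function makes the branching testable in isolation.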

Quickstart timeouts from `docker_cli.py:51-53`:

# Max wait time: 10 minutes
# Docker up timeout: 100 seconds per attempt
# Status check interval: 2 seconds
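
A polling loop built on the overall wait limit and the check interval might look like the sketch below. The function name `wait_until_healthy` and the injectable `clock` and `sleep` parameters are assumptions made for illustration, and the 100-second per-attempt `docker compose up` timeout is deliberately not modeled here.

```python
import time
from typing import Callable

MAX_WAIT_SECONDS = 10 * 60  # "Max wait time: 10 minutes"
CHECK_INTERVAL_SECONDS = 2  # "Status check interval: 2 seconds"


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    max_wait: float = MAX_WAIT_SECONDS,
    interval: float = CHECK_INTERVAL_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy until it returns True or max_wait seconds elapse.

    Hypothetical helper; returns True on success and False on timeout.
    """
    deadline = clock() + max_wait
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Using a monotonic clock avoids false timeouts if the system wall clock is adjusted while the stack is starting.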

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `You have docker-compose v1 installed, but we require Docker Compose v2 or later` | Docker Compose v1 detected | Upgrade to Docker Compose v2 (included in Docker Desktop 4.x+) |
| `DockerLowMemoryError` | Less than 4.3 GB memory available to Docker | Increase Docker Desktop memory allocation to 8 GB+ |
| `DockerLowDiskSpaceError` | Less than 13 GB free disk space | Free up disk space or increase Docker disk allocation |
| `DockerNotRunningError` | Docker daemon not running | Start Docker Desktop or `systemctl start docker` |
| GMS health check timeout | GMS takes >90s to start | Ensure sufficient RAM; check for port conflicts on 8080 |

Compatibility Notes

  • Default ports: Frontend=9002, GMS=8080, Elasticsearch=9200, Neo4j=7474/7687, Schema Registry=8081, Kafka=9092, Zookeeper=2181.
  • Database backends: MySQL (default), PostgreSQL, MariaDB, Cassandra all supported via compose overrides.
  • ARM64/M1 Macs: Use the `.m1.yml` override files for ARM-compatible images.
  • Elasticsearch memory: Limited to 1GB within the container; Java heap configured separately.
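
Because port conflicts on the default ports are a common cause of startup failures, a quick check before launching can help. The sketch below tries to connect to each default port on localhost and reports the ones that already have a listener; the helper names `port_in_use` and `busy_ports` are hypothetical, and this is not a check that the `datahub` CLI is documented to perform.

```python
import socket

# Default ports as listed above (Neo4j: 7474 for HTTP, 7687 for Bolt).
DEFAULT_PORTS = {
    "frontend": 9002,
    "gms": 8080,
    "elasticsearch": 9200,
    "neo4j-http": 7474,
    "neo4j-bolt": 7687,
    "schema-registry": 8081,
    "kafka": 9092,
    "zookeeper": 2181,
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: dict = DEFAULT_PORTS) -> dict:
    """Return the subset of ports that already have a listener."""
    return {name: port for name, port in ports.items() if port_in_use(port)}


if __name__ == "__main__":
    # Run before starting the stack: any entry printed is a likely conflict.
    print(busy_ports())
```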
