Principle:Datahub project Datahub Docker Prerequisites Validation
| Field | Value |
|---|---|
| Principle Name | Docker Prerequisites Validation |
| Namespace | Datahub_project_Datahub |
| Workflow | Docker_Quickstart_Deployment |
| Type | Principle |
| Last Updated | 2026-02-10 |
| Source Repository | datahub-project/datahub |
| Domains | Deployment, Docker, Metadata_Management |
Overview
Docker Prerequisites Validation verifies that a host environment meets the minimum requirements for running the DataHub Docker stack. It checks Docker daemon availability, Docker Compose v2 installation, minimum memory (4.3 GB), and minimum disk space (13 GB), so that deployments do not fail partway through because of insufficient resources.
Description
Docker Prerequisites Validation is a preflight check that runs before any container orchestration begins. The DataHub quickstart stack requires multiple services (GMS, Frontend, Kafka, Elasticsearch, MySQL, Schema Registry, Zookeeper) running simultaneously, which imposes non-trivial resource requirements on the host machine.
The validation performs four sequential checks:
- Docker Daemon Availability -- Verifies that the Docker daemon is running and reachable. The implementation attempts to connect to the Docker socket, falling back to `~/.docker/run/docker.sock` for Docker Desktop 4.13.0+ compatibility. A ping is sent to confirm communication.
- Docker Compose v2 Installation -- Verifies that Docker Compose v2 (or later) is installed, either as the `docker compose` plugin or as the standalone `docker-compose` binary. Docker Compose v1 is explicitly rejected with an error message.
- Memory Check -- Queries the Docker daemon for total configured memory and verifies it meets the minimum threshold of 4.3 GB. The threshold includes a buffer because Docker tends to under-report allocated memory.
- Disk Space Check -- Runs a lightweight Alpine container to measure available disk space within the Docker runtime environment and verifies it meets the minimum threshold of 13 GB.
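As an illustration of the Compose check, the following minimal sketch extracts the major version from a version string and rejects v1. The function names and the regular expression are hypothetical and are not taken from `docker_check.py`; the sample strings in the usage note only resemble typical `docker compose version` and `docker-compose version` output.

```python
import re
from typing import Optional

# Matches "version v2.24.5" as well as "version 1.29.2" (the "v" prefix is optional).
_VERSION_RE = re.compile(r"version\s+v?(\d+)\.(\d+)\.(\d+)")


def compose_major_version(version_output: str) -> Optional[int]:
    """Return the major version found in the text, or None if none is present."""
    match = _VERSION_RE.search(version_output)
    return int(match.group(1)) if match else None


def is_supported_compose(version_output: str) -> bool:
    """Accept Compose v2 or later; reject v1 and unparseable output."""
    major = compose_major_version(version_output)
    return major is not None and major >= 2
```

Under these assumptions, `is_supported_compose("Docker Compose version v2.24.5")` returns `True`, while `is_supported_compose("docker-compose version 1.29.2, build 5becea4c")` returns `False`.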
If any check fails, a descriptive exception is raised that tells the user exactly what to fix (e.g., increase Docker memory allocation, install Docker Compose v2).
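The check sequence and its error reporting can be sketched roughly as follows. This is not the code in `docker_check.py`: `HostFacts`, `PreflightError`, and `run_preflight` are hypothetical names, the values are passed in rather than queried from Docker, and the bytes-to-GB conversion (binary gigabytes) is an assumption. Only the 4.3 GB and 13 GB thresholds come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MEMORY_GB = 4.3       # threshold stated in the description
MIN_DISK_GB = 13.0        # threshold stated in the description
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


class PreflightError(Exception):
    """Hypothetical exception carrying a message that says what to fix."""


@dataclass
class HostFacts:
    """Hypothetical snapshot of values a real check would obtain from Docker."""
    daemon_reachable: bool
    compose_major_version: Optional[int]  # None means Compose was not found
    memory_bytes: int
    disk_free_bytes: int


def run_preflight(facts: HostFacts) -> None:
    """Run the four checks in order and raise on the first one that fails."""
    if not facts.daemon_reachable:
        raise PreflightError("Docker daemon is not reachable. Start Docker and retry.")
    if facts.compose_major_version is None:
        raise PreflightError("Docker Compose was not found. Install Docker Compose v2.")
    if facts.compose_major_version < 2:
        raise PreflightError("Docker Compose v1 is not supported. Upgrade to v2.")
    memory_gb = facts.memory_bytes / BYTES_PER_GB
    if memory_gb < MIN_MEMORY_GB:
        raise PreflightError(
            f"Docker has {memory_gb:.1f} GB of memory but {MIN_MEMORY_GB} GB is "
            "required. Increase the Docker memory allocation."
        )
    disk_gb = facts.disk_free_bytes / BYTES_PER_GB
    if disk_gb < MIN_DISK_GB:
        raise PreflightError(
            f"Docker has {disk_gb:.1f} GB of free disk but {MIN_DISK_GB} GB is "
            "required. Free up disk space."
        )
```

Keeping the checks as a pure function over already-gathered values makes the ordering and the messages easy to test without a running Docker daemon.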
Usage
Prerequisites validation runs before the DataHub quickstart stack is launched, to ensure the environment can support all required containers. It is invoked automatically as part of the `datahub docker quickstart` command and does not require explicit user action.
Typical scenarios:
- First-time setup -- When a developer or evaluator runs DataHub locally for the first time, the preflight check catches misconfigured Docker Desktop settings before a lengthy download-and-start process.
- CI/CD pipelines -- Automated environments benefit from early failure with actionable messages rather than cryptic container crashes mid-startup.
- Resource-constrained environments -- Laptop environments or small VMs often have Docker configured with default (insufficient) memory allocations.
Theoretical Basis
This principle follows the preflight check pattern -- validate preconditions before expensive operations to fail fast with actionable error messages. Rather than allowing the Docker Compose orchestration to proceed and fail partway through (potentially leaving orphaned containers or partial state), the system validates all prerequisites upfront.
The pattern is analogous to aircraft preflight checklists: verify critical conditions before committing to an irreversible (or expensive-to-reverse) process. This reduces debugging time and improves the developer experience by providing clear guidance on resolution steps.
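The preflight check pattern itself is independent of Docker and can be sketched in a few lines. The names below (`PreconditionFailed`, `run_with_preflight`, `demo`) are hypothetical and exist only for illustration. The point shown is that the expensive operation never starts when any precondition fails.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class PreconditionFailed(Exception):
    """Hypothetical error raised by a check, with a message saying what to fix."""


def run_with_preflight(
    checks: Iterable[Callable[[], None]],
    operation: Callable[[], T],
) -> T:
    """Run every check first; start the operation only if all of them pass."""
    for check in checks:
        check()  # a failing check raises and stops everything here
    return operation()


def demo() -> List[str]:
    """Show that a failing check prevents the expensive operation from starting."""
    events: List[str] = []

    def passing_check() -> None:
        events.append("check: ok")

    def failing_check() -> None:
        raise PreconditionFailed("increase Docker memory allocation")

    def expensive_operation() -> str:
        events.append("operation: started")
        return "done"

    try:
        run_with_preflight([passing_check, failing_check], expensive_operation)
    except PreconditionFailed as exc:
        events.append(f"aborted: {exc}")
    return events
```

In this sketch `demo()` records the passing check and the abort message, and never records that the operation started.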
Knowledge Sources
- DataHub GitHub Repository
- DataHub Official Documentation
- Source file: `metadata-ingestion/src/datahub/cli/docker_check.py`
Related Pages
- Implemented by: Datahub_project_Datahub_Run_Quickstart_Preflight_Checks