Principle:Apache Hudi Docker Prerequisites Check
| Knowledge Sources | |
|---|---|
| Domains | DevOps, Development_Environment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Ensuring that the required container runtime and orchestration tools are installed and accessible before attempting to run the Apache Hudi Docker demo environment.
Description
The Docker Prerequisites Check principle addresses the foundational requirement that a developer's local machine must have a properly configured container runtime before any Hudi Docker demo workflows can proceed. The Hudi Docker demo relies on multiple containerized services (Hadoop, Hive, Spark, Kafka, MinIO) orchestrated together, and the absence or misconfiguration of any prerequisite tool will cause the entire demo setup to fail.
The key prerequisites fall into two categories:
Container Runtime Prerequisites:
- Docker CLI -- The core container engine must be installed and running. Docker provides process isolation via Linux namespaces and cgroups, allowing each Hudi component to run in its own isolated environment with controlled resource allocation.
- Docker Compose -- The orchestration layer that reads declarative YAML configuration files and manages multi-container applications. Hudi supports both the legacy standalone
docker-compose(v1) binary and the moderndocker compose(v2) plugin integrated into the Docker CLI.
Build Prerequisites (for local image building):
- Maven -- Required when building Docker images from source using the Maven-based build pipeline (
build_local_docker_images.sh). Maven orchestrates the multi-module build and triggers Docker image assembly via thepre-integration-testphase. - Java/JDK -- Required as a dependency of Maven and for compiling Hudi source code when building images locally.
A robust prerequisites check must handle the Docker Compose version split gracefully. Docker Compose v1 was a standalone Python binary invoked as docker-compose, while v2 is a Go-based plugin invoked as docker compose (a subcommand of Docker). The get_docker_compose_cmd() pattern used in Hudi scripts probes for both variants and selects the one available, ensuring compatibility across different Docker installations.
Usage
Apply this principle:
- Before running any Hudi Docker demo script (
setup_demo.sh,stop_demo.sh,run_spark_hudi.sh) - When setting up a new development machine or CI/CD environment for Hudi contribution
- When troubleshooting "command not found" errors during demo execution
- When migrating between Docker Compose v1 and v2
Theoretical Basis
The prerequisites check embodies the fail-fast design principle in DevOps tooling. Rather than allowing a complex multi-container startup sequence to partially execute and fail midway (leaving orphaned containers and inconsistent state), the system validates tool availability upfront.
Containerization Fundamentals:
Docker containers provide operating-system-level virtualization. Each container shares the host kernel but runs in an isolated user space with its own filesystem, network stack, and process tree. The Docker daemon manages container lifecycle, and the CLI provides the user-facing interface. For the Hudi demo, this means each service (NameNode, DataNode, HiveMetastore, SparkMaster, etc.) runs as if it were on a separate machine, yet all share the host's resources efficiently.
Compose Orchestration:
Docker Compose extends single-container management to multi-container applications. It uses a declarative YAML specification to define services, networks, volumes, and dependencies. The Hudi demo compose files define 13+ services with inter-service dependencies (e.g., DataNode depends on NameNode, HiveServer depends on HiveMetastore). Compose handles startup ordering, network creation, and volume mounting according to these declarations.
Version Detection Pattern:
The get_docker_compose_cmd() function implements a capability detection pattern common in cross-platform scripting. Rather than assuming a specific installation, it probes the environment at runtime:
get_docker_compose_cmd() {
if docker compose version &>/dev/null; then
echo "docker compose"
elif docker-compose version &>/dev/null; then
echo "docker-compose"
else
echo "ERROR: Neither 'docker compose' nor 'docker-compose' is installed." >&2
exit 1
fi
}
This pattern ensures forward compatibility as the Docker ecosystem evolves, and provides a clear error message when neither variant is available.