Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Apache Hudi Docker Prerequisites Check

From Leeroopedia


Knowledge Sources
Domains DevOps, Development_Environment
Last Updated 2026-02-08 00:00 GMT

Overview

Ensuring that the required container runtime and orchestration tools are installed and accessible before attempting to run the Apache Hudi Docker demo environment.

Description

The Docker Prerequisites Check principle addresses the foundational requirement that a developer's local machine must have a properly configured container runtime before any Hudi Docker demo workflows can proceed. The Hudi Docker demo relies on multiple containerized services (Hadoop, Hive, Spark, Kafka, MinIO) orchestrated together, and the absence or misconfiguration of any prerequisite tool will cause the entire demo setup to fail.

The key prerequisites fall into two categories:

Container Runtime Prerequisites:

  • Docker CLI -- The core container engine must be installed and running. Docker provides process isolation via Linux namespaces and cgroups, allowing each Hudi component to run in its own isolated environment with controlled resource allocation.
  • Docker Compose -- The orchestration layer that reads declarative YAML configuration files and manages multi-container applications. Hudi supports both the legacy standalone docker-compose (v1) binary and the modern docker compose (v2) plugin integrated into the Docker CLI.

Build Prerequisites (for local image building):

  • Maven -- Required when building Docker images from source using the Maven-based build pipeline (build_local_docker_images.sh). Maven orchestrates the multi-module build and triggers Docker image assembly via the pre-integration-test phase.
  • Java/JDK -- Required as a dependency of Maven and for compiling Hudi source code when building images locally.

A robust prerequisites check must handle the Docker Compose version split gracefully. Docker Compose v1 was a standalone Python binary invoked as docker-compose, while v2 is a Go-based plugin invoked as docker compose (a subcommand of Docker). The get_docker_compose_cmd() pattern used in Hudi scripts probes for both variants and selects the one available, ensuring compatibility across different Docker installations.

Usage

Apply this principle:

  • Before running any Hudi Docker demo script (setup_demo.sh, stop_demo.sh, run_spark_hudi.sh)
  • When setting up a new development machine or CI/CD environment for Hudi contribution
  • When troubleshooting "command not found" errors during demo execution
  • When migrating between Docker Compose v1 and v2

Theoretical Basis

The prerequisites check embodies the fail-fast design principle in DevOps tooling. Rather than allowing a complex multi-container startup sequence to partially execute and fail midway (leaving orphaned containers and inconsistent state), the system validates tool availability upfront.

Containerization Fundamentals:

Docker containers provide operating-system-level virtualization. Each container shares the host kernel but runs in an isolated user space with its own filesystem, network stack, and process tree. The Docker daemon manages container lifecycle, and the CLI provides the user-facing interface. For the Hudi demo, this means each service (NameNode, DataNode, HiveMetastore, SparkMaster, etc.) runs as if it were on a separate machine, yet all share the host's resources efficiently.

Compose Orchestration:

Docker Compose extends single-container management to multi-container applications. It uses a declarative YAML specification to define services, networks, volumes, and dependencies. The Hudi demo compose files define 13+ services with inter-service dependencies (e.g., DataNode depends on NameNode, HiveServer depends on HiveMetastore). Compose handles startup ordering, network creation, and volume mounting according to these declarations.

Version Detection Pattern:

The get_docker_compose_cmd() function implements a capability detection pattern common in cross-platform scripting. Rather than assuming a specific installation, it probes the environment at runtime:

get_docker_compose_cmd() {
    if docker compose version &>/dev/null; then
        echo "docker compose"
    elif docker-compose version &>/dev/null; then
        echo "docker-compose"
    else
        echo "ERROR: Neither 'docker compose' nor 'docker-compose' is installed." >&2
        exit 1
    fi
}

This pattern ensures forward compatibility as the Docker ecosystem evolves, and provides a clear error message when neither variant is available.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment