
Implementation:Apache Hudi Build Docker Images Script

From Leeroopedia


Knowledge Sources
Domains: DevOps, Development_Environment
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete tool for building the nine Docker images that compose the Apache Hudi demo cluster, provided as part of the Apache Hudi Docker demo.

Description

The Hudi project provides two shell scripts for building Docker images:

build_docker_images.sh (direct Docker build) -- This script uses the Docker CLI directly to build all nine images from Dockerfiles in the docker/hoodie/hadoop/ directory tree. It auto-detects the host CPU architecture, configures the Docker build platform, and iterates through a predefined list of image configurations. Each image is tagged with both :latest and :1.1.0. The script exits immediately on any build failure.
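The architecture detection described above can be factored into a small helper for reuse in other tooling; the sketch below mirrors the script's case statement but is not part of the upstream script:

```shell
#!/bin/bash
# Map a `uname -m` value to a Docker platform string, mirroring the
# case statement in build_docker_images.sh. Prints nothing and returns
# non-zero for architectures the demo does not support.
arch_to_platform() {
  case "$1" in
    x86_64|amd64)  echo 'linux/amd64' ;;
    aarch64|arm64) echo 'linux/arm64' ;;
    *)             return 1 ;;
  esac
}

arch_to_platform "$(uname -m)"
```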

build_local_docker_images.sh (Maven-based build) -- This script uses Maven's pre-integration-test lifecycle phase to build images through the project's build system. It first prompts the user to confirm they want to build from scratch (since pre-built images are available on Docker Hub), then invokes Maven from the repository root. This approach compiles Hudi modules first and copies the resulting JARs into the Docker build contexts before image assembly.

Usage

Use these scripts when:

  • You need to build Docker images containing locally modified Hudi code
  • Pre-built images on Docker Hub are outdated or incompatible with your branch
  • You are developing on an architecture not covered by published images (e.g., arm64)
  • You want to create a fully self-contained local development environment
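As a rule of thumb, the direct build only makes sense once the Hudi bundle JARs have been compiled into the Docker build contexts; a small check like the following can decide which script to run (the directory argument is illustrative, not a path either script defines):

```shell
#!/bin/bash
# Return 0 (true) if a Maven build is still needed, i.e. no JARs are
# present in the given directory; the caller would then run
# build_local_docker_images.sh instead of build_docker_images.sh.
needs_maven_build() {
  local jar_dir="$1"
  if ls "$jar_dir"/*.jar >/dev/null 2>&1; then
    return 1   # JARs found: the direct Docker build should suffice
  fi
  return 0     # no JARs: compile with Maven first
}

# Illustrative path; adjust to the bundle module you actually need.
if needs_maven_build "packaging/hudi-spark-bundle/target"; then
  echo "run ./build_local_docker_images.sh"
else
  echo "run ./build_docker_images.sh"
fi
```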

Code Reference

Source Location

  • Repository: Apache Hudi
  • File: docker/build_docker_images.sh
  • Lines: 21-73
  • Additional File: docker/build_local_docker_images.sh
  • Lines: 19-33

Script

build_docker_images.sh (direct Docker build):

#!/bin/bash
# Detect system architecture and set Docker platform accordingly
ARCHITECTURE=$(uname -m)
case "$ARCHITECTURE" in
  x86_64|amd64)
    DOCKER_PLATFORM='linux/amd64'
    ;;
  aarch64|arm64)
    DOCKER_PLATFORM='linux/arm64'
    ;;
  *)
    echo "Unsupported architecture: $ARCHITECTURE"
    exit 1
    ;;
esac
export DOCKER_DEFAULT_PLATFORM="$DOCKER_PLATFORM"
export BUILDX_EXPERIMENTAL=1

SCRIPT_DIR=$(cd $(dirname "$0") && pwd)

# Versions for Hadoop, Spark, and Hive
HADOOP_VERSION="3.3.4"
SPARK_VERSION="3.5.3"
HIVE_VERSION="3.1.3"

# Docker image tags
VERSION_TAG="1.1.0"
LATEST_TAG="latest"
DOCKER_CONTEXT_DIR="hoodie/hadoop"

# List of images to build: "subdir|image_base_name"
DOCKER_IMAGES=(
  "base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-base"
  "datanode|apachehudi/hudi-hadoop_${HADOOP_VERSION}-datanode"
  "historyserver|apachehudi/hudi-hadoop_${HADOOP_VERSION}-history"
  "hive_base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}"
  "namenode|apachehudi/hudi-hadoop_${HADOOP_VERSION}-namenode"
  "spark_base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkbase_${SPARK_VERSION}"
  "sparkadhoc|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkadhoc_${SPARK_VERSION}"
  "sparkmaster|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkmaster_${SPARK_VERSION}"
  "sparkworker|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkworker_${SPARK_VERSION}"
)

# Build each Docker image in the list
for IMAGE_CONFIG in "${DOCKER_IMAGES[@]}"; do
  IFS='|' read -r SUBDIR IMAGE_BASE <<< "$IMAGE_CONFIG"
  IMAGE_CONTEXT="$DOCKER_CONTEXT_DIR/$SUBDIR"
  TAG_LATEST="$IMAGE_BASE:$LATEST_TAG"
  TAG_VERSIONED="$IMAGE_BASE:$VERSION_TAG"
  echo "Building $IMAGE_CONTEXT as $TAG_LATEST and $TAG_VERSIONED"
  if ! docker build "$IMAGE_CONTEXT" -t "$TAG_LATEST" -t "$TAG_VERSIONED"; then
    echo "Error: Failed to build docker image for $IMAGE_CONTEXT"
    exit 1
  fi
done
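The `IFS='|' read -r` idiom in the loop splits each `"subdir|image_base_name"` entry into its two fields; it can be exercised in isolation:

```shell
#!/bin/bash
# Split one config entry the same way the build loop does.
ENTRY="base|apachehudi/hudi-hadoop_3.3.4-base"
IFS='|' read -r SUBDIR IMAGE_BASE <<< "$ENTRY"
echo "context: hoodie/hadoop/$SUBDIR"
echo "tags:    $IMAGE_BASE:latest $IMAGE_BASE:1.1.0"
```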

build_local_docker_images.sh (Maven-based build):

#!/bin/bash
SCRIPT_PATH=$(cd `dirname $0`; pwd)
WS_ROOT=`dirname $SCRIPT_PATH`

while true; do
    read -p "Docker images can be downloaded from docker hub and seamlessly mounted \
with latest HUDI jars. Do you still want to build docker images from scratch?" yn
    case $yn in
        [Yy]* ) make install; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done
pushd ${WS_ROOT}
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false
popd
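The yes/no prompt makes the Maven-based script interactive, but since `read -p` consumes stdin, the answer can also be piped in (e.g. `echo y | ./build_local_docker_images.sh`, assuming the prompt shown above). The loop's logic, extracted as a standalone function:

```shell
#!/bin/bash
# The same yes/no loop as in build_local_docker_images.sh, extracted
# into a function: returns 0 for yes, 1 for no, and re-prompts otherwise.
confirm() {
  local yn
  while read -r yn; do
    case $yn in
      [Yy]* ) return 0 ;;
      [Nn]* ) return 1 ;;
      * )     echo "Please answer yes or no." ;;
    esac
  done
  return 1   # EOF without an answer counts as "no"
}

echo "y" | confirm && echo "would build from scratch"
```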

I/O Contract

Inputs

  • Dockerfiles (files, required) -- Dockerfile definitions in docker/hoodie/hadoop/ subdirectories: base/, datanode/, historyserver/, hive_base/, namenode/, spark_base/, sparkadhoc/, sparkmaster/, sparkworker/
  • HADOOP_VERSION (hardcoded constant, required) -- Set to 3.3.4 in the script; determines the Hadoop version embedded in image names.
  • SPARK_VERSION (hardcoded constant, required) -- Set to 3.5.3 in the script; determines the Spark version embedded in image names.
  • HIVE_VERSION (hardcoded constant, required) -- Set to 3.1.3 in the script; determines the Hive version embedded in image names.
  • VERSION_TAG (hardcoded constant, required) -- Set to 1.1.0; the versioned tag applied to each image alongside latest.
  • System architecture (runtime detection, required) -- Detected via uname -m; must be x86_64, amd64, aarch64, or arm64.
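Because the version values are hardcoded rather than read from the environment, changing them means editing the script itself. A sketch of that edit using GNU sed, run here against a temp file standing in for the script (always work on a copy, and check that matching Dockerfiles exist for the new version):

```shell
#!/bin/bash
# Simulate bumping a hardcoded constant: write the line as it appears
# in build_docker_images.sh to a temp copy, then rewrite it with sed.
tmp=$(mktemp)
printf 'SPARK_VERSION="3.5.3"\n' > "$tmp"
# GNU sed in-place edit; the target version here is illustrative.
sed -i 's/^SPARK_VERSION=.*/SPARK_VERSION="3.5.4"/' "$tmp"
cat "$tmp"   # SPARK_VERSION="3.5.4"
```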

Outputs

  • 9 Docker images (:latest) -- Nine images tagged :latest in the local Docker image cache; names follow the pattern apachehudi/hudi-hadoop_3.3.4-{component}:latest.
  • 9 Docker images (:1.1.0) -- The same nine images additionally tagged :1.1.0 for versioned reference.
  • Build log (stdout) -- Progress messages for each image build, including the build context path and tag names.
  • Exit code (integer) -- 0 on success; 1 if any image build fails or the host architecture is unsupported.
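For a post-build check, the nine expected image names can be derived from the same version constants; the loop below only prints the names (a real check might pass each to `docker image inspect`, which needs a running daemon, so that call is left commented out):

```shell
#!/bin/bash
HADOOP_VERSION="3.3.4"; SPARK_VERSION="3.5.3"; HIVE_VERSION="3.1.3"
HH="hudi-hadoop_${HADOOP_VERSION}"
HIVE="hive_${HIVE_VERSION}"
# Component suffixes, matching the image list in build_docker_images.sh.
COMPONENTS=(
  "$HH-base" "$HH-datanode" "$HH-history" "$HH-$HIVE" "$HH-namenode"
  "$HH-$HIVE-sparkbase_${SPARK_VERSION}"
  "$HH-$HIVE-sparkadhoc_${SPARK_VERSION}"
  "$HH-$HIVE-sparkmaster_${SPARK_VERSION}"
  "$HH-$HIVE-sparkworker_${SPARK_VERSION}"
)
for c in "${COMPONENTS[@]}"; do
  echo "apachehudi/$c:latest"
  # docker image inspect "apachehudi/$c:latest" >/dev/null  # needs Docker
done
echo "expected ${#COMPONENTS[@]} images"   # expected 9 images
```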

Usage Examples

# Option 1: Direct Docker build (faster, requires pre-built Hudi JARs in target/)
cd docker/
./build_docker_images.sh

# Option 2: Maven-based build (compiles Hudi first, then builds images)
cd docker/
./build_local_docker_images.sh

# Option 3: Build Hudi with integration-tests profile first, then use direct build
# (Ensures target/ directories are populated with JARs)
mvn -Pintegration-tests clean package -DskipTests
cd docker/
./build_docker_images.sh

# Option 4: Build a single image manually with Docker CLI
cd docker/hoodie/hadoop
docker build base -t apachehudi/hudi-hadoop_3.3.4-base:latest

# Verify built images
docker images | grep apachehudi

Related Pages

Implements Principle
