Implementation: Apache Hudi Build Docker Images Script
| Knowledge Sources | |
|---|---|
| Domains | DevOps, Development_Environment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool for building the nine Docker images that compose the Apache Hudi demo cluster, provided as part of the Apache Hudi Docker demo.
Description
The Hudi project provides two shell scripts for building Docker images:
build_docker_images.sh (direct Docker build) -- This script uses the Docker CLI directly to build all nine images from Dockerfiles in the docker/hoodie/hadoop/ directory tree. It auto-detects the host CPU architecture, configures the Docker build platform, and iterates through a predefined list of image configurations. Each image is tagged with both :latest and :1.1.0. The script exits immediately on any build failure.
build_local_docker_images.sh (Maven-based build) -- This script uses Maven's pre-integration-test lifecycle phase to build images through the project's build system. It first prompts the user to confirm they want to build from scratch (since pre-built images are available on Docker Hub), then invokes Maven from the repository root. This approach compiles Hudi modules first and copies the resulting JARs into the Docker build contexts before image assembly.
Usage
Use these scripts when:
- You need to build Docker images containing locally modified Hudi code
- Pre-built images on Docker Hub are outdated or incompatible with your branch
- You are developing on an architecture not covered by published images (e.g., arm64)
- You want to create a fully self-contained local development environment
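The third bullet depends on the same architecture detection the direct-build script performs. A minimal sketch of that mapping as a reusable function (the detect_platform name is hypothetical; the case arms mirror the script shown below):

```bash
#!/bin/bash
# Map a `uname -m` machine string to a Docker platform string,
# mirroring the case statement in build_docker_images.sh.
detect_platform() {
  case "$1" in
    x86_64|amd64)  echo 'linux/amd64' ;;
    aarch64|arm64) echo 'linux/arm64' ;;
    *)             echo "Unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Report the platform Docker would be told to target on this host.
detect_platform "$(uname -m)"
```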
Code Reference
Source Location
- Repository: Apache Hudi
- File: docker/build_docker_images.sh (lines 21-73)
- Additional File: docker/build_local_docker_images.sh (lines 19-33)
Script
build_docker_images.sh (direct Docker build):
```bash
#!/bin/bash

# Detect system architecture and set Docker platform accordingly
ARCHITECTURE=$(uname -m)
case "$ARCHITECTURE" in
  x86_64|amd64)
    DOCKER_PLATFORM='linux/amd64'
    ;;
  aarch64|arm64)
    DOCKER_PLATFORM='linux/arm64'
    ;;
  *)
    echo "Unsupported architecture: $ARCHITECTURE"
    exit 1
    ;;
esac

export DOCKER_DEFAULT_PLATFORM="$DOCKER_PLATFORM"
export BUILDX_EXPERIMENTAL=1

SCRIPT_DIR=$(cd $(dirname "$0") && pwd)

# Versions for Hadoop, Spark, and Hive
HADOOP_VERSION="3.3.4"
SPARK_VERSION="3.5.3"
HIVE_VERSION="3.1.3"

# Docker image tags
VERSION_TAG="1.1.0"
LATEST_TAG="latest"

DOCKER_CONTEXT_DIR="hoodie/hadoop"

# List of images to build: "subdir|image_base_name"
DOCKER_IMAGES=(
  "base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-base"
  "datanode|apachehudi/hudi-hadoop_${HADOOP_VERSION}-datanode"
  "historyserver|apachehudi/hudi-hadoop_${HADOOP_VERSION}-history"
  "hive_base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}"
  "namenode|apachehudi/hudi-hadoop_${HADOOP_VERSION}-namenode"
  "spark_base|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkbase_${SPARK_VERSION}"
  "sparkadhoc|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkadhoc_${SPARK_VERSION}"
  "sparkmaster|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkmaster_${SPARK_VERSION}"
  "sparkworker|apachehudi/hudi-hadoop_${HADOOP_VERSION}-hive_${HIVE_VERSION}-sparkworker_${SPARK_VERSION}"
)

# Build each Docker image in the list
for IMAGE_CONFIG in "${DOCKER_IMAGES[@]}"; do
  IFS='|' read -r SUBDIR IMAGE_BASE <<< "$IMAGE_CONFIG"
  IMAGE_CONTEXT="$DOCKER_CONTEXT_DIR/$SUBDIR"
  TAG_LATEST="$IMAGE_BASE:$LATEST_TAG"
  TAG_VERSIONED="$IMAGE_BASE:$VERSION_TAG"
  echo "Building $IMAGE_CONTEXT as $TAG_LATEST and $TAG_VERSIONED"
  if ! docker build "$IMAGE_CONTEXT" -t "$TAG_LATEST" -t "$TAG_VERSIONED"; then
    echo "Error: Failed to build docker image for $IMAGE_CONTEXT"
    exit 1
  fi
done
```
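The build loop splits each "subdir|image_base_name" entry with an IFS-scoped read. That idiom can be isolated as a small helper for inspection (the split_config name is hypothetical, not part of the script):

```bash
#!/bin/bash
# Split a "subdir|image_base_name" pair into its two fields,
# using the same IFS-based read idiom as the build loop.
split_config() {
  local subdir image_base
  IFS='|' read -r subdir image_base <<< "$1"
  echo "subdir=$subdir image=$image_base"
}

split_config "base|apachehudi/hudi-hadoop_3.3.4-base"
```

Because IFS is set only for the read command, the rest of the loop's word splitting is unaffected.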
build_local_docker_images.sh (Maven-based build):
```bash
#!/bin/bash

SCRIPT_PATH=$(cd `dirname $0`; pwd)
WS_ROOT=`dirname $SCRIPT_PATH`

while true; do
  read -p "Docker images can be downloaded from docker hub and seamlessly mounted \
with latest HUDI jars. Do you still want to build docker images from scratch?" yn
  case $yn in
    [Yy]* ) make install; break;;
    [Nn]* ) exit;;
    * ) echo "Please answer yes or no.";;
  esac
done

pushd ${WS_ROOT}
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false
popd
```
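The while/read/case confirmation above can be factored into a small reusable prompt. A sketch (the confirm helper is hypothetical, not part of the Hudi script):

```bash
#!/bin/bash
# Loop until the user answers yes or no; return 0 for yes, 1 for no.
# Mirrors the while/read/case pattern in build_local_docker_images.sh.
confirm() {
  local yn
  while true; do
    read -r -p "$1 [y/n] " yn
    case $yn in
      [Yy]*) return 0 ;;
      [Nn]*) return 1 ;;
      *)     echo "Please answer yes or no." ;;
    esac
  done
}
```

A caller can then gate the expensive Maven build on its exit status, e.g. `confirm "Build from scratch?" && mvn clean pre-integration-test -DskipTests`.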
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Dockerfiles | Files | Yes | Dockerfile definitions in docker/hoodie/hadoop/ subdirectories: base/, datanode/, historyserver/, hive_base/, namenode/, spark_base/, sparkadhoc/, sparkmaster/, sparkworker/ |
| HADOOP_VERSION | Hardcoded constant | Yes | Set to 3.3.4 in the script. Determines the Hadoop version embedded in image names. |
| SPARK_VERSION | Hardcoded constant | Yes | Set to 3.5.3 in the script. Determines the Spark version embedded in image names. |
| HIVE_VERSION | Hardcoded constant | Yes | Set to 3.1.3 in the script. Determines the Hive version embedded in image names. |
| VERSION_TAG | Hardcoded constant | Yes | Set to 1.1.0. The versioned tag applied to each image alongside latest. |
| System architecture | Runtime detection | Yes | Detected via uname -m. Must be x86_64, amd64, aarch64, or arm64. |
Outputs
| Name | Type | Description |
|---|---|---|
| 9 Docker images (:latest) | Docker images | Nine images tagged :latest, available in the local Docker image cache. Names follow the pattern apachehudi/hudi-hadoop_3.3.4-{component}:latest. |
| 9 Docker images (:1.1.0) | Docker images | The same nine images additionally tagged :1.1.0 for versioned reference. |
| Build log output | stdout | Progress messages for each image build, including the build context path and tag names. |
| Exit code | Integer | 0 on success; 1 if any image build fails or the host architecture is unsupported. |
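The eighteen expected tags (nine image names, two tags each) can be enumerated from the script's version constants, which makes it easy to cross-check the local cache after a build. A sketch (expected_tags is a hypothetical helper; the component suffixes follow the image names in the script above):

```bash
#!/bin/bash
# Enumerate the 18 tags build_docker_images.sh produces
# (nine image names x two tags each), from the script's constants.
HADOOP_VERSION="3.3.4"; SPARK_VERSION="3.5.3"; HIVE_VERSION="3.1.3"
COMPONENTS=(
  "base" "datanode" "history" "hive_${HIVE_VERSION}" "namenode"
  "hive_${HIVE_VERSION}-sparkbase_${SPARK_VERSION}"
  "hive_${HIVE_VERSION}-sparkadhoc_${SPARK_VERSION}"
  "hive_${HIVE_VERSION}-sparkmaster_${SPARK_VERSION}"
  "hive_${HIVE_VERSION}-sparkworker_${SPARK_VERSION}"
)
expected_tags() {
  local c
  for c in "${COMPONENTS[@]}"; do
    echo "apachehudi/hudi-hadoop_${HADOOP_VERSION}-${c}:latest"
    echo "apachehudi/hudi-hadoop_${HADOOP_VERSION}-${c}:1.1.0"
  done
}
expected_tags
# To list tags missing from the local cache, compare against docker, e.g.:
#   comm -13 <(docker images --format '{{.Repository}}:{{.Tag}}' | sort) \
#            <(expected_tags | sort)
```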
Usage Examples
```bash
# Option 1: Direct Docker build (faster, requires pre-built Hudi JARs in target/)
cd docker/
./build_docker_images.sh

# Option 2: Maven-based build (compiles Hudi first, then builds images)
cd docker/
./build_local_docker_images.sh

# Option 3: Build Hudi with the integration-tests profile first, then use the direct build
# (ensures target/ directories are populated with JARs)
mvn -Pintegration-tests clean package -DskipTests
cd docker/
./build_docker_images.sh

# Option 4: Build a single image manually with the Docker CLI
cd docker/hoodie/hadoop
docker build base -t apachehudi/hudi-hadoop_3.3.4-base:latest

# Verify built images
docker images | grep apachehudi
```