Implementation: Apache Hudi Setup Demo Script
| Knowledge Sources | |
|---|---|
| Domains | DevOps, Development_Environment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool for launching the complete 13-service demo cluster shipped with the Apache Hudi Docker demo.
Description
The setup_demo.sh script is the primary entry point for starting the Apache Hudi Docker demo environment. It performs four operations in sequence:
- Tears down any previously running demo containers using `docker compose down`
- Pulls the latest images from Docker Hub (skipped in `dev` mode)
- Starts all 13 services in detached mode using `docker compose up -d`
- Executes `setup_demo_container.sh` inside the `adhoc-1` and `adhoc-2` containers to configure Spark and HDFS
The script automatically selects the correct Docker Compose file based on the host CPU architecture (arm64 vs amd64). It passes the HUDI_WS environment variable (derived from the script's parent directory) to Docker Compose, enabling the workspace volume mount.
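As a worked example of that derivation (the clone location below is hypothetical, used only for illustration):

```shell
# Hypothetical clone location -- any path works the same way
SCRIPT=/home/alice/hudi/docker/setup_demo.sh
SCRIPT_DIR=$(dirname "$SCRIPT")    # /home/alice/hudi/docker
WS_ROOT=$(dirname "$SCRIPT_DIR")   # /home/alice/hudi -- passed to Docker Compose as HUDI_WS
echo "$WS_ROOT"
```

So `HUDI_WS` always points at the repository root, one level above the `docker/` directory containing the script.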
The companion script `setup_demo_container.sh` runs inside the containers and performs four tasks: copying the Spark default configuration and log4j2 properties to `$SPARK_CONF_DIR`, creating HDFS directories for demo data and Spark events, uploading the demo configuration to HDFS, and marking the Hive sync tool executable.
Usage
Use this script to:
- Start the demo environment for the first time
- Restart the demo after stopping it
- Switch between Docker Hub images (default) and locally-built images (`dev` mode)
Code Reference
Source Location
- Repository: Apache Hudi
- File: `docker/setup_demo.sh`, lines 19-38
- Additional File: `docker/demo/setup_demo_container.sh`, lines 18-24
- Compose File: `docker/compose/docker-compose_hadoop334_hive313_spark353_amd64.yml`, lines 15-293
Script
setup_demo.sh:

```bash
#!/bin/bash
SCRIPT_PATH=$(cd `dirname $0`; pwd)
HUDI_DEMO_ENV=$1
WS_ROOT=`dirname $SCRIPT_PATH`
COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_amd64.yml"
if [ "$(uname -m)" = "arm64" ]; then
  COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_arm64.yml"
fi
# restart cluster
HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} down
if [ "$HUDI_DEMO_ENV" != "dev" ]; then
  echo "Pulling docker demo images ..."
  HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} pull
fi
sleep 5
HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} up -d
sleep 15
docker exec -it adhoc-1 /bin/bash /var/hoodie/ws/docker/demo/setup_demo_container.sh
docker exec -it adhoc-2 /bin/bash /var/hoodie/ws/docker/demo/setup_demo_container.sh
```
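The only branch in the script is the dev-mode check, a plain string comparison on the first argument. Isolated as a sketch (`should_pull_images` is a hypothetical helper, not part of the script):

```shell
# Hypothetical helper mirroring the script's pull-skip check:
# images are pulled unless the first argument is exactly "dev"
should_pull_images() {
  if [ "$1" != "dev" ]; then
    echo "pull"
  else
    echo "skip"
  fi
}

should_pull_images dev      # prints "skip"
should_pull_images devmode  # prints "pull" -- only the exact string "dev" skips the pull
```

Note that any argument other than the exact string "dev" (including no argument at all) results in a pull from Docker Hub.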
setup_demo_container.sh (runs inside the containers):

```bash
echo "Copying spark default config and setting up configs"
cp /var/hoodie/ws/docker/demo/config/spark-defaults.conf $SPARK_CONF_DIR/.
cp /var/hoodie/ws/docker/demo/config/log4j2.properties $SPARK_CONF_DIR/.
hadoop fs -mkdir -p /var/demo/
hadoop fs -mkdir -p /tmp/spark-events
hadoop fs -copyFromLocal -f /var/hoodie/ws/docker/demo/config /var/demo/.
chmod +x /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `$1` (HUDI_DEMO_ENV) | String argument | No | Pass "dev" to skip image pulling and use locally-built images; omit or pass any other value to pull from Docker Hub. |
| HUDI_WS | Environment variable (auto-derived) | Yes | Automatically set to the parent directory of the script location; used as the bind-mount source for /var/hoodie/ws inside containers. |
| Docker images | Docker image cache | Yes | Either pre-pulled from Docker Hub or locally built: the 9 Hudi images plus 4 third-party images (PostgreSQL, Zookeeper, Kafka, MinIO/mc). |
| Compose file | YAML file | Yes | Architecture-specific compose file in docker/compose/; defines all 13 services, their configurations, ports, volumes, and dependencies. |
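The HUDI_WS input above is consumed by the compose file as the source of the workspace bind mount. An illustrative fragment (not the verbatim compose file; the real file defines all 13 services and many more settings):

```yaml
# Illustrative fragment only -- shows how the HUDI_WS variable set by
# setup_demo.sh becomes the /var/hoodie/ws bind mount inside a container
services:
  adhoc-1:
    volumes:
      - ${HUDI_WS}:/var/hoodie/ws
```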
Outputs
| Name | Type | Description |
|---|---|---|
| 13 running containers | Docker containers | The full demo cluster: namenode, datanode1, historyserver, hive-metastore-postgresql, hivemetastore, hiveserver, zookeeper, kafkabroker, sparkmaster, spark-worker-1, adhoc-1, adhoc-2, minio, mc. |
| HDFS directories | HDFS filesystem | /var/demo/ and /tmp/spark-events created in HDFS, with the demo configuration uploaded. |
| Spark configuration | Container filesystem | spark-defaults.conf and log4j2.properties copied to $SPARK_CONF_DIR in adhoc-1 and adhoc-2. |
| Docker network | Docker network | A Docker bridge network named hudi connecting all 13 services. |
| Named volumes | Docker volumes | Persistent volumes for namenode, historyserver, hive-metastore-postgresql, and minio-data. |
Usage Examples
```bash
# Start the demo with images pulled from Docker Hub (default)
cd docker/
./setup_demo.sh

# Start the demo using locally-built images (dev mode)
cd docker/
./setup_demo.sh dev

# Verify all containers are running after startup
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Access the Spark Master UI
# Open http://localhost:8080 in a browser

# Access the HDFS NameNode UI
# Open http://localhost:9870 in a browser

# Access HiveServer2 via Beeline
docker exec -it hiveserver beeline -u jdbc:hive2://localhost:10000

# Open a Spark shell in the adhoc-1 container
docker exec -it adhoc-1 /bin/bash
spark-shell --master spark://sparkmaster:7077
```