
Implementation:Apache Hudi Setup Demo Script

From Leeroopedia


Knowledge Sources
Domains: DevOps, Development_Environment
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete entry point for launching the complete 13-service Hudi demo cluster provided by the Apache Hudi Docker demo.

Description

The setup_demo.sh script is the primary entry point for starting the Apache Hudi Docker demo environment. It performs four operations in sequence:

  1. Tears down any previously running demo containers using docker compose down
  2. Pulls the latest images from Docker Hub (skipped in dev mode)
  3. Starts all 13 services in detached mode using docker compose up -d
  4. Executes setup_demo_container.sh inside the adhoc-1 and adhoc-2 containers to configure Spark and HDFS

The script automatically selects the correct Docker Compose file based on the host CPU architecture (arm64 vs amd64). It passes the HUDI_WS environment variable (derived from the script's parent directory) to Docker Compose, enabling the workspace volume mount.

The companion script setup_demo_container.sh runs inside the containers and performs four steps:

  • Copies the Spark default configuration and log4j2 properties to $SPARK_CONF_DIR
  • Creates the HDFS directories for demo data and Spark events
  • Uploads the demo configuration to HDFS
  • Makes the Hive sync tool (run_sync_tool.sh) executable

Usage

Use this script to:

  • Start the demo environment for the first time
  • Restart the demo after stopping it
  • Switch between Docker Hub images (default) and locally-built images (dev mode)
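The page shows how to start the demo but not how to stop it. A minimal teardown sketch, mirroring the compose invocation at the top of setup_demo.sh (the compose file names and HUDI_WS derivation are taken from that script; run it from the docker/ directory of a Hudi checkout):

```shell
# Hedged sketch: stop the demo cluster manually, mirroring the teardown
# step at the top of setup_demo.sh. Assumes the current directory is the
# docker/ directory of a Hudi checkout.
SCRIPT_PATH=$(pwd)
WS_ROOT=$(dirname "$SCRIPT_PATH")
COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_amd64.yml"
if [ "$(uname -m)" = "arm64" ]; then
  COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_arm64.yml"
fi
# Guarded so the snippet is a no-op on hosts without Docker installed.
if command -v docker >/dev/null 2>&1; then
  HUDI_WS=$WS_ROOT docker compose -f "$SCRIPT_PATH/compose/$COMPOSE_FILE_NAME" down
fi
```

After a teardown like this, rerunning ./setup_demo.sh restarts the cluster, since its first step is the same compose down.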

Code Reference

Source Location

  • Repository: Apache Hudi
  • File: docker/setup_demo.sh
  • Lines: 19-38
  • Additional File: docker/demo/setup_demo_container.sh
  • Lines: 18-24
  • Compose File: docker/compose/docker-compose_hadoop334_hive313_spark353_amd64.yml
  • Lines: 15-293

Script

setup_demo.sh:

#!/bin/bash
SCRIPT_PATH=$(cd `dirname $0`; pwd)
HUDI_DEMO_ENV=$1
WS_ROOT=`dirname $SCRIPT_PATH`
COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_amd64.yml"
if [ "$(uname -m)" = "arm64" ]; then
  COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_arm64.yml"
fi
# restart cluster
HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} down
if [ "$HUDI_DEMO_ENV" != "dev" ]; then
  echo "Pulling docker demo images ..."
  HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} pull
fi
sleep 5
HUDI_WS=${WS_ROOT} docker compose -f ${SCRIPT_PATH}/compose/${COMPOSE_FILE_NAME} up -d
sleep 15

docker exec -it adhoc-1 /bin/bash /var/hoodie/ws/docker/demo/setup_demo_container.sh
docker exec -it adhoc-2 /bin/bash /var/hoodie/ws/docker/demo/setup_demo_container.sh

setup_demo_container.sh (runs inside containers):

echo "Copying spark default config and setting up configs"
cp /var/hoodie/ws/docker/demo/config/spark-defaults.conf $SPARK_CONF_DIR/.
cp /var/hoodie/ws/docker/demo/config/log4j2.properties $SPARK_CONF_DIR/.
hadoop fs -mkdir -p /var/demo/
hadoop fs -mkdir -p /tmp/spark-events
hadoop fs -copyFromLocal -f /var/hoodie/ws/docker/demo/config /var/demo/.
chmod +x /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh
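The effects of the container-side script can be checked from the host. A hedged sketch: verify_demo_setup is a hypothetical helper (not part of the Hudi repository) that probes, via docker exec into adhoc-1, the artifacts the script above is expected to leave behind.

```shell
# Hedged sketch: verify the container-side setup performed by
# setup_demo_container.sh. verify_demo_setup is a hypothetical helper,
# not part of the Hudi repository.
verify_demo_setup() {
  # /var/demo/config is created by the copyFromLocal step of the script.
  docker exec adhoc-1 hadoop fs -test -d /var/demo/config &&
  docker exec adhoc-1 hadoop fs -test -d /tmp/spark-events &&
  # run_sync_tool.sh should be executable after the chmod +x step.
  docker exec adhoc-1 test -x /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh
}
```

Usage: `verify_demo_setup && echo "demo setup looks good"`.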

I/O Contract

Inputs

  • $1 (HUDI_DEMO_ENV): string argument, optional. Pass "dev" to skip image pulling and use locally-built images; omit it or pass any other value to pull from Docker Hub.
  • HUDI_WS: environment variable (auto-derived), required. Automatically set to the parent directory of the script location and used as the bind-mount source for /var/hoodie/ws inside the containers.
  • Docker images: image cache, required. Either pre-pulled from Docker Hub or locally built: the 9 Hudi images plus 4 third-party images (PostgreSQL, Zookeeper, Kafka, MinIO/mc).
  • Compose file: YAML file, required. The architecture-specific compose file in docker/compose/ that defines all 13 services, their configurations, ports, volumes, and dependencies.

Outputs

  • Running containers (Docker containers): the full demo cluster: namenode, datanode1, historyserver, hive-metastore-postgresql, hivemetastore, hiveserver, zookeeper, kafkabroker, sparkmaster, spark-worker-1, adhoc-1, adhoc-2, minio, and mc.
  • HDFS directories (HDFS filesystem): /var/demo/ and /tmp/spark-events created in HDFS, with the demo configuration uploaded.
  • Spark configuration (container filesystem): spark-defaults.conf and log4j2.properties copied to $SPARK_CONF_DIR in adhoc-1 and adhoc-2.
  • Docker network: a bridge network named hudi connecting all of the services.
  • Named volumes (Docker volumes): persistent volumes for namenode, historyserver, hive-metastore-postgresql, and minio-data.
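The Docker-level outputs can be inspected after startup. A hedged sketch: check_demo_resources is a hypothetical helper, and the actual volume names may carry a Compose project prefix (e.g. compose_namenode), so it matches on substrings rather than exact names.

```shell
# Hedged sketch: inspect the Docker network and named volumes listed in
# the Outputs table. check_demo_resources is a hypothetical helper; real
# volume names may carry a Compose project prefix (e.g. compose_namenode).
check_demo_resources() {
  docker network inspect hudi >/dev/null || return 1
  for v in namenode historyserver hive-metastore-postgresql minio-data; do
    docker volume ls -q | grep -q "$v" || return 1
  done
}
```

Usage: `check_demo_resources && echo "network and volumes present"`.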

Usage Examples

# Start the demo with images pulled from Docker Hub (default)
cd docker/
./setup_demo.sh

# Start the demo using locally-built images (dev mode)
cd docker/
./setup_demo.sh dev

# Verify all containers are running after startup
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Access the Spark Master UI
# Open http://localhost:8080 in a browser

# Access the HDFS NameNode UI
# Open http://localhost:9870 in a browser

# Access the HiveServer2 via Beeline
docker exec -it hiveserver beeline -u jdbc:hive2://localhost:10000

# Open a Spark shell in the adhoc-1 container
docker exec -it adhoc-1 /bin/bash
spark-shell --master spark://sparkmaster:7077
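setup_demo.sh relies on fixed sleep delays (5 and 15 seconds) between compose operations and container setup. A more robust wrapper could poll container state instead; a sketch, where wait_for_container is a hypothetical helper, not part of the Hudi repository:

```shell
# Hedged sketch: poll a container's state instead of relying on the
# fixed sleeps in setup_demo.sh. wait_for_container is a hypothetical
# helper, not part of the Hudi repository.
wait_for_container() {
  local name=$1 tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if [ "$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" = "true" ]; then
      echo "$name is up"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

Usage before running the container-side setup: `for c in adhoc-1 adhoc-2; do wait_for_container "$c"; done`.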
