Environment:Apache Hudi Docker Demo Environment


Knowledge Sources
Domains: Infrastructure, Demo
Last Updated: 2026-02-08 20:00 GMT

Overview

Docker Compose environment running a 13-container cluster with Hadoop 3.3.4, Hive 3.1.3, Spark 3.5.3, Kafka 3.7.2, and MinIO for the Apache Hudi demo.

Description

This environment provides a complete multi-container Docker Compose stack for exploring Apache Hudi features. It includes HDFS (NameNode and DataNode), a Hive Metastore with a PostgreSQL backend, HiveServer2, Spark Master and Worker nodes, Zookeeper, Kafka, MinIO object storage, and Jupyter notebooks. The stack supports both AMD64 and ARM64; the build and setup scripts detect the host architecture automatically.

Usage

Use this environment to run the Apache Hudi Docker demo for feature exploration, integration testing, and development. It provides all external services needed to demonstrate Hudi table operations via Spark shell, Flink SQL, or Jupyter notebooks.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux, macOS | Both AMD64 and ARM64 supported |
| Docker | Docker Engine with Compose support | Docker Desktop or Docker CE with compose plugin |
| Memory | 16 GB+ RAM recommended | 13 containers run simultaneously |
| Disk | 50 GB+ free space | Docker images, HDFS data, MinIO storage |
| Network | Ports 2181, 5005, 5432, 7077, 8020, 8080, 8081, 8188, 8888, 9000, 9083, 9090-9093, 9864, 9870, 10000, 10002, 18080, 19888, 29092, 50010, 50020 | Must be available on host |
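
The port requirement above can be checked before starting the cluster. The following is a minimal sketch, not part of the Hudi repository: the function names (expand_ports, port_in_use, busy_ports) and the DEMO_PORTS variable are hypothetical, and port_in_use assumes bash, whose /dev/tcp redirection is used to probe for a listener on 127.0.0.1.

```shell
#!/usr/bin/env bash
# Ports copied from the System Requirements table (ranges such as 9090-9093 are allowed).
DEMO_PORTS="2181 5005 5432 7077 8020 8080 8081 8188 8888 9000 9083 9090-9093 9864 9870 10000 10002 18080 19888 29092 50010 50020"

# Expand "A-B" ranges into individual port numbers, one per line.
expand_ports() {
  local item
  for item in $1; do
    case "$item" in
      *-*) seq "${item%-*}" "${item#*-}" ;;
      *)   echo "$item" ;;
    esac
  done
}

# Return 0 if something accepts TCP connections on 127.0.0.1:<port>.
# Assumption: bash is the shell; /dev/tcp is a bash feature, not a real device.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print every demo port that already has a listener; prints nothing when all are free.
busy_ports() {
  local port
  for port in $(expand_ports "$DEMO_PORTS"); do
    if port_in_use "$port"; then
      echo "$port"
    fi
  done
}
```

Calling busy_ports before ./docker/setup_demo.sh lists the ports that would conflict. The probe only detects listeners reachable on the loopback address, so a service bound to another interface could be missed.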

Dependencies

Container Services

| Service | Image | Version | Ports | Purpose |
|---------|-------|---------|-------|---------|
| namenode | Hadoop | 3.3.4 | 8020, 9000, 9870 | HDFS NameNode |
| datanode1 | Hadoop | 3.3.4 | 9864, 50010, 50020 | HDFS DataNode |
| historyserver | Hadoop | 3.3.4 | 8188, 19888 | MapReduce History Server |
| hive-metastore-postgresql | PostgreSQL | 3.1.0 | 5432 | Hive Metastore backend DB |
| hivemetastore | Hive | 3.1.3 | 9083 | Hive Metastore Thrift |
| hiveserver | Hive | 3.1.3 | 10000, 10002 | HiveServer2 Thrift and Beeline |
| zookeeper | Zookeeper | 3.6.4 | 2181 | Kafka coordination |
| kafka | Kafka | 3.7.2 | 9092, 9093, 29092 | Event streaming |
| sparkmaster | Spark | 3.5.3 | 7077, 8080, 8888, 4040 | Spark Master + Jupyter |
| spark-worker-1 | Spark | 3.5.3 | 8081 | Spark Worker |
| adhoc-1, adhoc-2 | Spark | 3.5.3 | 5005 | Ad-hoc Spark clients (debugging) |
| minio | MinIO | latest | 9090, 9091 | S3-compatible object storage |
| mc | MinIO Client | | | Bucket initialization |

Credentials

The following default credentials are used within the Docker demo (not production-safe):

  • MinIO Access Key: minio
  • MinIO Secret Key: minio123
  • MinIO Endpoint: http://minio:9090
  • PostgreSQL (Hive Metastore): default container credentials
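
As a rough sketch of how these defaults might be used from the host, the function below exports them as the standard AWS CLI/SDK credential variables. The function name use_demo_minio and the DEMO_S3_ENDPOINT variable are hypothetical, and the default of http://localhost:9090 assumes the MinIO API port is published on the host; inside the Compose network the endpoint is http://minio:9090 as listed above.

```shell
# Export the demo's default MinIO credentials for S3-compatible tooling.
# Optional first argument: endpoint URL (assumed default: http://localhost:9090).
use_demo_minio() {
  export AWS_ACCESS_KEY_ID="minio"
  export AWS_SECRET_ACCESS_KEY="minio123"
  # DEMO_S3_ENDPOINT is a hypothetical helper variable, not something the demo reads.
  export DEMO_S3_ENDPOINT="${1:-http://localhost:9090}"
}

# Example (requires the AWS CLI to be installed):
#   use_demo_minio
#   aws s3 ls --endpoint-url "$DEMO_S3_ENDPOINT"
```

From a shell inside one of the containers, pass http://minio:9090 as the argument instead.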

Quick Install

# Clone the repository
git clone https://github.com/apache/hudi.git && cd hudi

# Build Docker images (auto-detects architecture)
./docker/build_docker_images.sh

# Start the demo cluster
./docker/setup_demo.sh

# Access Jupyter Notebook at http://localhost:8888
# Access Spark Master UI at http://localhost:8080
# Access MinIO Console at http://localhost:9091

# Stop and clean up
./docker/stop_demo.sh
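
The web UIs listed above may take a while to respond after the cluster starts. A minimal sketch of a readiness check follows; the retry and wait_for_demo_uis helpers are hypothetical (they are not part of the Hudi scripts), the attempt count and delay are arbitrary, and curl is assumed to be installed on the host.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Poll the Spark Master UI, Jupyter, and the MinIO Console until each answers.
wait_for_demo_uis() {
  local url
  for url in http://localhost:8080 http://localhost:8888 http://localhost:9091; do
    if retry 30 5 curl -fsS -o /dev/null "$url"; then
      echo "up: $url"
    else
      echo "not responding: $url" >&2
      return 1
    fi
  done
}
```

Running wait_for_demo_uis after ./docker/setup_demo.sh gives each endpoint up to 30 attempts, with a 5-second pause between them, before reporting a failure.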

Code Evidence

Architecture auto-detection from docker/build_docker_images.sh:21-34:

ARCH=$(uname -m)
if [ "$ARCH" = "x86_64" ]; then
  DOCKER_PLATFORM="linux/amd64"
elif [ "$ARCH" = "aarch64" ] || [ "$ARCH" = "arm64" ]; then
  DOCKER_PLATFORM="linux/arm64"
fi

Demo compose file selection from docker/setup_demo.sh:23:

ARCH=$(uname -m)
# Selects docker-compose_hadoop334_hive313_spark353_{amd64|arm64}.yml
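
The selection logic is not reproduced in full above. One way it could be written is sketched below; this is an illustration that follows the file-name pattern in the comment, not the actual contents of docker/setup_demo.sh, and the function name compose_file_for_arch is hypothetical.

```shell
# Map a machine architecture (default: the output of uname -m) to the
# matching demo compose file name. Unknown architectures are an error.
compose_file_for_arch() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64)
      echo "docker-compose_hadoop334_hive313_spark353_amd64.yml" ;;
    aarch64|arm64)
      echo "docker-compose_hadoop334_hive313_spark353_arm64.yml" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1 ;;
  esac
}
```

Unlike the build script excerpt, this sketch rejects unrecognized architectures explicitly instead of leaving the platform variable unset.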

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| Port already in use | Host ports conflict with running services | Stop conflicting services or adjust Docker Compose port mappings |
| Container OOM killed | Insufficient Docker memory | Increase Docker Desktop memory to 16 GB+ |
| Image pull failure | Docker Hub rate limiting | Use docker login or build images locally with build_docker_images.sh |
| HDFS not reachable | NameNode not fully started | Wait for health checks to pass; restart with docker compose restart namenode |
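
For the OOM case, the memory available to the Docker daemon can be inspected before starting the demo. The sketch below reads the MemTotal field from docker info; the helper names (bytes_to_gib, has_enough_memory, check_docker_memory) are hypothetical, and the 16 GiB default simply mirrors the recommendation on this page.

```shell
# Convert a byte count to whole GiB (integer division, rounds down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Return 0 when <bytes> is at least <min_gib> GiB (default 16).
has_enough_memory() {
  local bytes="$1" min_gib="${2:-16}"
  [ "$(bytes_to_gib "$bytes")" -ge "$min_gib" ]
}

# Compare the memory visible to the Docker daemon against a threshold in GiB.
# Returns 2 if docker info itself fails (for example, the daemon is not running).
check_docker_memory() {
  local bytes min_gib="${1:-16}"
  bytes="$(docker info --format '{{.MemTotal}}')" || return 2
  if has_enough_memory "$bytes" "$min_gib"; then
    echo "Docker memory OK: $(bytes_to_gib "$bytes") GiB"
  else
    echo "Docker reports $(bytes_to_gib "$bytes") GiB; at least ${min_gib} GiB is recommended" >&2
    return 1
  fi
}
```

The value Docker reports can be slightly below the configured size, so passing a lower threshold (for example check_docker_memory 15) may be more practical than the strict default.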

Compatibility Notes

  • ARM64 (Apple Silicon): Fully supported with dedicated Docker Compose files for linux/arm64.
  • Docker Desktop: Allocate at least 16 GB RAM and 50 GB disk to Docker Desktop for stable operation.
  • Port conflicts: The demo uses over 20 ports; check for conflicts with local services (especially 8080, 9090, 5432).
  • MinIO vs S3: MinIO provides S3-compatible storage within the demo; real S3 credentials are not required.
