Environment:Apache Hudi Docker Demo Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Demo |
| Last Updated | 2026-02-08 20:00 GMT |
Overview
Docker Compose environment running a 13-container cluster with Hadoop 3.3.4, Hive 3.1.3, Spark 3.5.3, Kafka 3.7.2, and MinIO for the Apache Hudi demo.
Description
This environment provides a complete multi-container Docker Compose stack for exploring Apache Hudi features. It includes HDFS (NameNode + DataNode), Hive Metastore with PostgreSQL backend, Spark Master/Worker nodes, Zookeeper, Kafka, MinIO object storage, and Jupyter notebooks. The stack supports both AMD64 and ARM64 architectures with automatic platform detection.
Usage
Use this environment to run the Apache Hudi Docker demo for feature exploration, integration testing, and development. It provides all external services needed to demonstrate Hudi table operations via Spark shell, Flink SQL, or Jupyter notebooks.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS | Both AMD64 and ARM64 supported |
| Docker | Docker Engine with Compose support | Docker Desktop or Docker CE with compose plugin |
| Memory | 16 GB+ RAM recommended | 13 containers run simultaneously |
| Disk | 50 GB+ free space | Docker images, HDFS data, MinIO storage |
| Network | Ports 2181, 5005, 5432, 7077, 8020, 8080, 8081, 8188, 8888, 9000, 9083, 9090-9093, 9864, 9870, 10000, 10002, 18080, 19888, 29092, 50010, 50020 | Must be available on host |
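Before starting the cluster, it can help to confirm that the ports in the table above are free. The following is a minimal sketch rather than part of the demo scripts: the helper names (`expand_ports`, `port_in_use`, `check_demo_ports`) are hypothetical, and the check assumes bash, because it relies on bash's `/dev/tcp` redirection. It only detects listeners reachable on 127.0.0.1.

```shell
# Ports from the System Requirements table; ranges are written START-END.
DEMO_PORTS="2181 5005 5432 7077 8020 8080 8081 8188 8888 9000 9083 9090-9093 9864 9870 10000 10002 18080 19888 29092 50010 50020"

# expand_ports LIST: print one port per line, expanding START-END ranges.
expand_ports() {
  local item
  for item in $1; do
    case "$item" in
      *-*) seq "${item%-*}" "${item#*-}" ;;
      *)   echo "$item" ;;
    esac
  done
}

# port_in_use PORT: succeed if something accepts TCP connections on
# 127.0.0.1:PORT. Requires bash (the /dev/tcp path is a bash feature).
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# check_demo_ports: report every demo port that is already taken and
# return non-zero if at least one conflict was found.
check_demo_ports() {
  local port busy=0
  for port in $(expand_ports "$DEMO_PORTS"); do
    if port_in_use "$port"; then
      echo "port $port is already in use"
      busy=1
    fi
  done
  return "$busy"
}
```

Running `check_demo_ports` before `./docker/setup_demo.sh` prints one line per conflicting port and prints nothing when all ports are free.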
Dependencies
Container Services
| Service | Image Version | Ports | Purpose |
|---|---|---|---|
| namenode | Hadoop 3.3.4 | 8020, 9000, 9870 | HDFS NameNode |
| datanode1 | Hadoop 3.3.4 | 9864, 50010, 50020 | HDFS DataNode |
| historyserver | Hadoop 3.3.4 | 8188, 19888 | MapReduce History Server |
| hive-metastore-postgresql | PostgreSQL (image tag 3.1.0) | 5432 | Hive Metastore backend DB |
| hivemetastore | Hive 3.1.3 | 9083 | Hive Metastore Thrift |
| hiveserver | Hive 3.1.3 | 10000, 10002 | HiveServer2 Thrift (JDBC/Beeline) and web UI |
| zookeeper | Zookeeper 3.6.4 | 2181 | Kafka coordination |
| kafka | Kafka 3.7.2 | 9092, 9093, 29092 | Event streaming |
| sparkmaster | Spark 3.5.3 | 7077, 8080, 8888, 4040 | Spark Master + Jupyter |
| spark-worker-1 | Spark 3.5.3 | 8081 | Spark Worker |
| adhoc-1, adhoc-2 | Spark 3.5.3 | 5005 | Ad-hoc Spark clients (debugging) |
| minio | MinIO latest | 9090, 9091 | S3-compatible object storage |
| mc | MinIO Client | — | Bucket initialization |
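To confirm that every long-running service from the table is up, the running container names can be compared against the expected list. This is a sketch under two assumptions: container names match the service names in the table, and `mc` is left out because it only initializes buckets. The helper name `missing_services` is hypothetical.

```shell
# Long-running services from the Container Services table (mc omitted).
EXPECTED_SERVICES="namenode datanode1 historyserver hive-metastore-postgresql hivemetastore hiveserver zookeeper kafka sparkmaster spark-worker-1 adhoc-1 adhoc-2 minio"

# missing_services: read running container names on stdin (one per line)
# and print each expected service that is not among them.
missing_services() {
  local running svc
  running=$(cat)
  for svc in $EXPECTED_SERVICES; do
    # -F: fixed string, -x: whole-line match, -q: no output
    printf '%s\n' "$running" | grep -Fqx -- "$svc" || echo "$svc"
  done
}
```

A typical invocation would be `docker ps --format '{{.Names}}' | missing_services`; empty output means all 13 expected containers are running.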
Credentials
The following default credentials are used within the Docker demo (not production-safe):
- MinIO Access Key: minio
- MinIO Secret Key: minio123
- MinIO Endpoint: http://minio:9090
- PostgreSQL (Hive Metastore): default container credentials
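As an illustration of how the MinIO credentials above could be passed to Spark, the sketch below prints `--conf` flags for Hadoop's S3A connector. The `fs.s3a.*` property names are standard Hadoop S3A settings. Whether the demo images already set them is not stated here, so treat this as an assumption-laden example; the helper name `s3a_spark_conf` is hypothetical.

```shell
# Demo-only defaults from the Credentials list (not production-safe).
MINIO_ENDPOINT="http://minio:9090"
MINIO_ACCESS_KEY="minio"
MINIO_SECRET_KEY="minio123"

# s3a_spark_conf: print one --conf flag per line that points the
# Hadoop S3A connector at the demo MinIO instance.
s3a_spark_conf() {
  printf '%s\n' \
    "--conf spark.hadoop.fs.s3a.endpoint=$MINIO_ENDPOINT" \
    "--conf spark.hadoop.fs.s3a.access.key=$MINIO_ACCESS_KEY" \
    "--conf spark.hadoop.fs.s3a.secret.key=$MINIO_SECRET_KEY" \
    "--conf spark.hadoop.fs.s3a.path.style.access=true"
}
```

Inside a container that can resolve the `minio` hostname, the flags could be spliced into a command line, for example `spark-shell $(s3a_spark_conf)`. Path-style access is enabled because MinIO is usually addressed as `http://host/bucket` rather than through per-bucket hostnames.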
Quick Install
    # Clone the repository
    git clone https://github.com/apache/hudi.git && cd hudi
    # Build Docker images (auto-detects architecture)
    ./docker/build_docker_images.sh
    # Start the demo cluster
    ./docker/setup_demo.sh
    # Access Jupyter Notebook at http://localhost:8888
    # Access Spark Master UI at http://localhost:8080
    # Access MinIO Console at http://localhost:9091
    # Stop and clean up
    ./docker/stop_demo.sh
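The web UIs listed above may take a while to respond after the cluster starts. A small polling helper can wait for them instead of guessing. This is a sketch, not part of the demo scripts: `wait_until` and `ui_ready` are hypothetical names, and `ui_ready` assumes `curl` is installed on the host.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...: rerun COMMAND until it
# succeeds; give up with a non-zero status after ATTEMPTS failures.
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# ui_ready URL: succeed once the URL answers without an HTTP error status.
ui_ready() {
  curl -fsS -o /dev/null "$1"
}
```

For example, `wait_until 30 5 ui_ready http://localhost:8080` polls the Spark Master UI up to 30 times, pausing 5 seconds after each failed attempt.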
Code Evidence
Architecture auto-detection from docker/build_docker_images.sh:21-34:
    ARCH=$(uname -m)
    if [ "$ARCH" = "x86_64" ]; then
      DOCKER_PLATFORM="linux/amd64"
    elif [ "$ARCH" = "aarch64" ] || [ "$ARCH" = "arm64" ]; then
      DOCKER_PLATFORM="linux/arm64"
    fi
Demo compose file selection from docker/setup_demo.sh:23:
    ARCH=$(uname -m)
    # Selects docker-compose_hadoop334_hive313_spark353_{amd64|arm64}.yml
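The two excerpts can be combined into one self-contained sketch that maps `uname -m` output to a Docker platform and then to a compose file name. This is an illustration, not the code from the repository: the function names are hypothetical, the explicit failure for unknown architectures is an addition, and only the file name (not its directory) is derived, since the path is not given above.

```shell
# platform_for_arch ARCH: map `uname -m` output to a Docker platform string.
platform_for_arch() {
  case "$1" in
    x86_64)        echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# compose_file_for_arch ARCH: build the compose file name from the
# platform suffix (amd64 or arm64).
compose_file_for_arch() {
  local platform
  platform=$(platform_for_arch "$1") || return 1
  echo "docker-compose_hadoop334_hive313_spark353_${platform#linux/}.yml"
}
```

Calling `compose_file_for_arch "$(uname -m)"` prints the file name for the current host, or fails with a message on stderr when the architecture is neither AMD64 nor ARM64.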
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| Port already in use | Host ports conflict with running services | Stop conflicting services or adjust Docker Compose port mappings |
| Container OOM killed | Insufficient Docker memory | Increase Docker Desktop memory to 16 GB+ |
| Image pull failure | Docker Hub rate limiting | Use docker login or build images locally with build_docker_images.sh |
| HDFS not reachable | NameNode not fully started | Wait for health checks to pass; restart with docker compose restart namenode |
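For the OOM row in particular, two quick checks can narrow down the cause: how much memory Docker has available, and whether a given container was killed by the OOM killer. The sketch below uses the real `docker info` and `docker inspect` format fields `MemTotal` and `State.OOMKilled`. The helper names are hypothetical, and the 16 GiB default mirrors the recommendation in this document rather than a hard limit.

```shell
# has_enough_memory BYTES [MIN_GIB]: succeed if BYTES is at least
# MIN_GIB gibibytes (default 16, the recommended minimum).
has_enough_memory() {
  local min_gib=${2:-16}
  [ "$1" -ge $((min_gib * 1024 * 1024 * 1024)) ]
}

# was_oom_killed CONTAINER: succeed if Docker recorded an OOM kill
# for the named container.
was_oom_killed() {
  [ "$(docker inspect --format '{{.State.OOMKilled}}' "$1" 2>/dev/null)" = "true" ]
}
```

A possible check is `has_enough_memory "$(docker info --format '{{.MemTotal}}')" || echo "increase Docker memory"`. The total reported by Docker may be slightly below the configured value, so passing a lower threshold as the second argument (for example 15) can avoid false alarms.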
Compatibility Notes
- ARM64 (Apple Silicon): Fully supported with dedicated Docker Compose files for linux/arm64.
- Docker Desktop: Allocate at least 16 GB RAM and 50 GB disk to Docker Desktop for stable operation.
- Port conflicts: The demo uses over 20 ports; check for conflicts with local services (especially 8080, 9090, 5432).
- MinIO vs S3: MinIO provides S3-compatible storage within the demo; real S3 credentials are not required.