Principle:Apache Hudi Demo Environment Cleanup

From Leeroopedia


Knowledge Sources
Domains DevOps, Development_Environment
Last Updated 2026-02-08 00:00 GMT

Overview

This principle covers the graceful teardown of the Apache Hudi Docker demo cluster: stopping all containers, removing associated resources, and cleaning up host-mounted data directories.

Description

The Demo Environment Cleanup principle addresses the orderly shutdown and resource reclamation of the multi-service Hudi demo cluster. A proper cleanup is essential to avoid resource leaks (orphaned containers consuming CPU/memory), port conflicts (preventing subsequent demo startups or other applications from binding to the same ports), and stale state (leftover data from previous runs contaminating new experiments).

The cleanup process involves three distinct operations:

1. Container Shutdown and Removal:

Docker Compose's down command stops all running containers defined in the compose file and removes them, along with any networks that were created. This is the inverse of docker compose up. The down command is preferred over stop because it also removes containers (not just stops them), preventing accumulation of stopped containers in the Docker daemon.

2. Host Mount Directory Cleanup:

The demo environment mounts host directories (specifically /tmp/hadoop_data and /tmp/hadoop_name) into containers for HDFS data and name node metadata. These directories persist on the host after containers are removed. The cleanup script explicitly deletes these directories to ensure a clean slate for the next demo run. Without this cleanup, stale HDFS metadata could cause the NameNode to attempt recovery of a previous cluster state, potentially causing startup failures or data inconsistencies.
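The directory cleanup can be sketched as a small shell function. This is a hedged illustration, not the project's cleanup script: the function name remove_host_mounts and its guard against empty or root paths are assumptions added here, and only the two /tmp paths come from the text above.

```shell
# Hypothetical helper (not part of the Hudi scripts): delete the given
# bind-mount directories if they exist.
# Refuses "" and "/" as a guard against an accidental wide delete.
remove_host_mounts() {
  local dir
  for dir in "$@"; do
    case "$dir" in
      ""|"/")
        echo "refusing to remove '$dir'" >&2
        return 1
        ;;
    esac
    if [ -d "$dir" ]; then
      rm -rf -- "$dir"
      echo "removed $dir"
    else
      echo "skipped $dir (not present)"
    fi
  done
}

# Intended call for the demo (left commented out so that loading this
# snippet deletes nothing):
# remove_host_mounts /tmp/hadoop_data /tmp/hadoop_name
```

If container processes wrote files into these directories as root, the removal may need elevated privileges on the host.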

3. Named Volume Persistence (Intentional):

Running docker compose down without the -v flag preserves named volumes. The demo compose file defines named volumes for namenode, historyserver, hive-metastore-postgresql, and minio-data. The default cleanup intentionally preserves them so that their data survives restart cycles. Users who want a complete reset can pass -v (--volumes) to docker compose down to remove these volumes as well.
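The choice between preserving and dropping named volumes comes down to one flag. The sketch below only prints the teardown command so it can be inspected before anything is run; compose_down_command and its keep/reset argument are hypothetical names introduced for illustration, and the compose file name is supplied by the caller.

```shell
# Hypothetical helper: print (do not run) the teardown command.
#   $1 = compose file
#   $2 = "reset" to also remove named volumes; anything else keeps them
compose_down_command() {
  local compose_file="$1"
  local mode="${2:-keep}"
  local cmd="docker compose -f $compose_file down"
  if [ "$mode" = "reset" ]; then
    # -v (--volumes) additionally removes the named volumes
    cmd="$cmd -v"
  fi
  printf '%s\n' "$cmd"
}

# Example: show both variants for a placeholder file name.
compose_down_command demo-compose.yml
compose_down_command demo-compose.yml reset
```

Printing first is a deliberate choice here: removing volumes is not reversible, so seeing the exact command before running it is a cheap safeguard.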

Optional: Test Suite Generation:

The generate_test_suite.sh script provides an adjacent capability that can be run before or after the demo cleanup. It generates integration test suites (sanity, medium, long, clustering) from templates, configures them with specified parameters (table type, iteration counts, delay intervals), and optionally executes them inside the adhoc-2 container. While not strictly part of the cleanup flow, it represents a common workflow performed before tearing down the environment.

Usage

Apply this principle:

  • After completing a demo session, to free system resources
  • Before restarting the demo to ensure a clean state
  • When switching between different Hudi versions or configurations
  • Before running setup_demo.sh if the previous shutdown was not clean (e.g., host reboot)
  • After running integration test suites to clean up test artifacts

Theoretical Basis

Container Lifecycle Management:

Docker containers move through several states: created, running, paused, stopped (reported by Docker as exited), and finally removed. The docker compose down command stops running containers and then removes them in a single operation. This is a best practice for ephemeral development environments because it prevents "container sprawl": the accumulation of stopped containers that still consume disk space (for their writable layers) and remain tracked in the Docker daemon's state database.

Compose Down Semantics:

The docker compose down command performs the following operations in order:

  1. Stops all containers (sends SIGTERM, waits for graceful shutdown, then SIGKILL after timeout)
  2. Removes all containers
  3. Removes all networks created by up
  4. Preserves named volumes (unless -v is specified)
  5. Preserves images (unless --rmi is specified)

This selective cleanup ensures that expensive-to-create artifacts (images, volumes with data) survive across demo sessions while ephemeral artifacts (containers, networks) are cleaned up.
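Step 1 above (SIGTERM, a grace period, then SIGKILL) is an escalation pattern that applies to any process. The sketch below reproduces it for a single PID in plain shell to make the sequence concrete. It illustrates the pattern only and is not how Docker implements it; stop_with_timeout is a hypothetical name, and the 10-second default mirrors Docker's default stop timeout.

```shell
# Hypothetical helper: stop one process with the same escalation a
# container stop uses. SIGTERM first, SIGKILL only if the process is
# still present after the timeout (in seconds).
stop_with_timeout() {
  local pid="$1"
  local timeout="${2:-10}"
  local waited=0
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "not running"
    return 0
  fi
  # kill -0 sends no signal; it only probes whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null || true
      echo "killed"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "terminated"
}
```

A process that handles SIGTERM promptly is reported as terminated; one that ignores it is reported as killed once the grace period runs out. Because kill -0 only checks for existence, a child that has exited but has not yet been reaped by its parent still counts as present, so the function should be called from the shell that started the process rather than from a command substitution.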

Architecture-Aware Teardown:

Like the startup script, the cleanup script selects the correct compose file based on the host architecture:

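# Default to the amd64 compose file.
COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_amd64.yml"
# Switch to the arm64 variant when uname reports an arm64 machine.
if [ "$(uname -m)" = "arm64" ]; then
  COMPOSE_FILE_NAME="docker-compose_hadoop334_hive313_spark353_arm64.yml"
fi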

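This is essential because docker compose down must reference the same compose file that was used for up; otherwise, it may not find the containers to stop. Repeating the startup script's architecture detection keeps the two scripts in agreement on which file is in use. Note that the check matches only the literal string arm64, which is what uname -m prints on Apple Silicon macOS; Linux ARM hosts typically report aarch64 and would fall through to the amd64 file.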
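The selection logic can be wrapped in a function that takes the machine string as an argument, which makes it easy to check both branches without switching hosts. This is a minimal sketch: select_compose_file is a hypothetical name, and the two file names are the ones shown above.

```shell
# Hypothetical helper: choose the compose file for a machine string.
# With no argument it falls back to the output of `uname -m`.
select_compose_file() {
  local machine="${1:-$(uname -m)}"
  local file="docker-compose_hadoop334_hive313_spark353_amd64.yml"
  if [ "$machine" = "arm64" ]; then
    file="docker-compose_hadoop334_hive313_spark353_arm64.yml"
  fi
  printf '%s\n' "$file"
}

# Example: resolve the file for the current host.
COMPOSE_FILE_NAME="$(select_compose_file)"
echo "Using compose file: $COMPOSE_FILE_NAME"
```

Sharing one such function between the startup and cleanup scripts would be one way to guarantee that both always resolve the same file.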

Host Filesystem Cleanup:

The removal of /tmp/hadoop_data and /tmp/hadoop_name addresses a subtlety of Docker bind mounts: data written by containers to bind-mounted directories persists on the host even after the container is removed. For HDFS, this is particularly important because the NameNode's fsimage and edit logs (stored in /tmp/hadoop_name) encode the cluster identity and the filesystem namespace, including the file-to-block mapping (block locations themselves are reported by DataNodes at startup rather than persisted). Leaving stale metadata could cause a fresh NameNode to attempt recovery of the old state, potentially entering safe mode or refusing to start.
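Before a fresh startup, it can be useful to check whether any bind-mounted directory still holds data from an earlier run. The sketch below is one way to do that under stated assumptions: find_stale_mounts is a hypothetical name, and "stale" is taken to mean simply "exists and is non-empty", which is a simplification.

```shell
# Hypothetical helper: print each given directory that exists and is
# non-empty, i.e. one that may still hold state from a previous run.
find_stale_mounts() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Example: report leftovers from the two demo bind mounts, if any.
stale="$(find_stale_mounts /tmp/hadoop_data /tmp/hadoop_name)"
if [ -n "$stale" ]; then
  echo "Leftover HDFS state found in:"
  echo "$stale"
fi
```

The check is read-only, so it is safe to run at any time; what to do with a non-empty result (delete, archive, or abort the startup) is left to the caller.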

Related Pages

Implemented By
