Heuristic:DataExpert io Data engineer handbook Docker Volume Persistence Management
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Debugging |
| Last Updated | 2026-02-09 06:00 GMT |
Overview
Docker volume management strategy: use `docker compose stop` to preserve data, and use `docker compose down -v` for a full reset of named volumes.
Description
The bootcamp Docker environments use both named volumes (e.g., `postgres-data`, `pgadmin-data`) and bind mounts (e.g., `./warehouse`) to persist data across container restarts. Understanding the difference between stopping containers, removing them, and removing their volumes is critical to avoiding accidental data loss during development. This heuristic captures the tribal knowledge around when to preserve versus reset Docker state.
Usage
Apply this heuristic whenever you need to manage Docker container lifecycle in the bootcamp environment. Common scenarios include:
- Restarting containers after configuration changes
- Resetting the database to a clean state for homework exercises
- Troubleshooting container issues by starting fresh
- Preserving work between development sessions
The Insight (Rule of Thumb)
- Action: Choose the appropriate Docker Compose lifecycle command based on intent.
- Values:
  - `docker compose stop`: Stops containers but preserves all volumes and data. Use for pausing work.
  - `docker compose down`: Removes containers and networks but preserves named volumes. Use for cleanup without losing data stored in volumes.
  - `docker compose down -v`: Removes containers, networks, and the named volumes declared in the Compose file. Bind-mounted host directories such as `./warehouse` are left untouched. Use for a full reset to re-trigger database initialization.
- Trade-off: Using `down -v` forces the PostgreSQL init script to run again on the next `up`, which re-seeds the database from the dump file. This is necessary when you want a clean slate, but it means any manual changes to the database are lost.
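The three choices above can be sketched as a small shell helper. The function name `compose_cmd_for` and the intent labels `pause`, `cleanup`, and `reset` are hypothetical and not part of the bootcamp repository. The helper only prints the matching command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper (not from the bootcamp repo): map an intent to the
# Compose command described in the rule of thumb. It only prints the command.
compose_cmd_for() {
  case "$1" in
    pause)   echo "docker compose stop" ;;      # keep containers and volumes
    cleanup) echo "docker compose down" ;;      # drop containers and networks, keep named volumes
    reset)   echo "docker compose down -v" ;;   # also drop named volumes (database data is lost)
    *)       echo "unknown intent: $1" >&2; return 1 ;;
  esac
}
```

For example, `compose_cmd_for reset` prints `docker compose down -v`. Run the printed command from the directory that contains the Compose file.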
Reasoning
PostgreSQL initialization scripts (in `/docker-entrypoint-initdb.d/`) only run when the data directory is empty. If the `postgres-data` volume persists from a previous run, the init scripts are skipped on subsequent starts. This is by design in the official PostgreSQL Docker image. Therefore:
- If you need to re-run `init-db.sh` (e.g., after modifying the dump file or adding homework SQL), you must remove the volume with `docker compose down -v`.
- If you just want to restart the same database, use `docker compose stop` followed by `docker compose up -d`.
The bootcamp README explicitly warns: data in `/var/lib/postgresql/data` persists in the `postgres-data` named volume, so stopping or removing the container alone does not reset the database.
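The "init scripts run only on an empty data directory" rule can be illustrated with a simplified sketch. This is not the official image's entrypoint code: the function name `should_run_init` is hypothetical, and a temporary directory stands in for the `postgres-data` volume.

```shell
#!/bin/sh
# Simplified illustration of the "init only on an empty data directory" rule.
# NOT the official entrypoint; `should_run_init` is a hypothetical name.
should_run_init() {
  data_dir="$1"
  # A missing directory or one with no entries counts as empty.
  if [ ! -d "$data_dir" ] || [ -z "$(ls -A "$data_dir" 2>/dev/null)" ]; then
    return 0   # empty: init scripts would run
  fi
  return 1     # existing data: init scripts are skipped
}

# Demo with a throwaway directory standing in for the volume.
demo_dir="$(mktemp -d)"
should_run_init "$demo_dir" && echo "first start: run init scripts"
touch "$demo_dir/PG_VERSION"   # simulate data left behind by an earlier run
should_run_init "$demo_dir" || echo "later start: skip init scripts"
rm -rf "$demo_dir"
```

Removing the volume with `docker compose down -v` corresponds to going back to the empty-directory case, which is why the init scripts run again on the next `up`.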
Code Evidence
Named volume definitions from `docker-compose.yml` (Module 1):

    # Service-level mount (excerpt)
    volumes:
      - postgres-data:/var/lib/postgresql/data

    # Top-level named volume declarations
    volumes:
      postgres-data:
      pgadmin-data:

Bind mount for Spark warehouse from `docker-compose.yaml` (Module 3):

    # Service-level bind mounts (excerpt)
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
      - ./data:/home/iceberg/data
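
To make the distinction between the two excerpts concrete, the sketch below classifies a short-syntax mount entry as a named volume or a bind mount. The function name `mount_kind` is hypothetical, and the check is deliberately simplified: it assumes every entry has the form `SOURCE:TARGET` and ignores anonymous volumes and Windows-style paths.

```shell
#!/bin/sh
# Hypothetical helper: classify a short-syntax Compose mount ("SOURCE:TARGET").
# Assumption: a source that looks like a path is a bind mount, and a bare name
# is a named volume. Entries without a SOURCE part are not handled.
mount_kind() {
  source_part="${1%%:*}"   # text before the first colon
  case "$source_part" in
    /*|./*|../*|"~"*|.|..) echo "bind mount" ;;
    *)                     echo "named volume" ;;
  esac
}

for spec in "postgres-data:/var/lib/postgresql/data" "./warehouse:/home/iceberg/warehouse"; do
  echo "$spec -> $(mount_kind "$spec")"
done
```

Because `docker compose down -v` removes only volumes and never host directories, anything classified as a bind mount here has to be cleared by deleting the host directory contents manually.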