Implementation:DataExpert io Data engineer handbook Docker Compose PostgreSQL Stack
Overview
This page documents the docker-compose.yml configuration used to provision a PostgreSQL database and PGAdmin administration interface for the Dimensional Data Modeling workflow. The stack provides a fully containerized environment for running SQL exercises against the actor_films dataset.
Type
External Tool Doc
Source
- Repository: https://github.com/DataExpert-io/data-engineer-handbook
- File:
intermediate-bootcamp/materials/1-dimensional-data-modeling/docker-compose.yml(Lines 1-32)
Services
The Docker Compose stack defines two services:
Service: postgres
| Property | Value | Description |
|---|---|---|
| Image | postgres:14 |
Official PostgreSQL 14 Docker image. |
| Container Name | ${DOCKER_CONTAINER} |
Configurable via environment variable for easy identification. |
| Restart Policy | always |
Ensures the container restarts automatically on failure or host reboot. |
| Ports | ${HOST_PORT}:5432 |
Maps the configurable host port to PostgreSQL's default internal port. |
| Schema | ${POSTGRES_SCHEMA} |
Optional schema configuration for the database. |
Environment Variables:
environment:
POSTGRES_USER: ${POSTGRES_USER}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: ${POSTGRES_DB}
Volume Mounts:
volumes:
- postgres-data:/var/lib/postgresql/data # Named volume for data persistence
- ./scripts/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh # Initialization script
- ./data/data.dump:/docker-entrypoint-initdb.d/data.dump # Base dataset dump
- ./homework:/docker-entrypoint-initdb.d/homework # Homework SQL files
The volume configuration achieves three goals:
- Data persistence -- The named volume
postgres-dataensures database contents survive container restarts. - Automatic initialization -- Files in
/docker-entrypoint-initdb.d/are automatically executed when the database is first created. - Homework injection -- The homework directory is mounted so that exercise SQL files are available to the init script.
Service: pgadmin
| Property | Value | Description |
|---|---|---|
| Image | dpage/pgadmin4 |
Official PGAdmin 4 Docker image providing a web-based database administration interface. |
| Ports | ${PGADMIN_PORT}:80 |
Maps the configurable host port to PGAdmin's internal HTTP port. |
| Depends On | postgres |
Ensures the PostgreSQL service starts before PGAdmin. |
Environment Variables:
environment:
PGADMIN_DEFAULT_EMAIL: ${PGADMIN_EMAIL}
PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_PASSWORD}
Configuration Parameters
All configurable parameters are injected via a .env file placed alongside the docker-compose.yml:
| Parameter | Description | Example Value |
|---|---|---|
DOCKER_CONTAINER |
Name assigned to the PostgreSQL container. | dim-model-postgres
|
POSTGRES_SCHEMA |
Optional default schema for the database. | public
|
POSTGRES_USER |
PostgreSQL superuser account name. | postgres
|
POSTGRES_PASSWORD |
Password for the PostgreSQL superuser. | postgres
|
POSTGRES_DB |
Name of the default database created on initialization. | postgres
|
HOST_PORT |
Host machine port mapped to PostgreSQL's port 5432. | 5433
|
PGADMIN_EMAIL |
Email address used as the PGAdmin login username. | admin@admin.com
|
PGADMIN_PASSWORD |
Password for the PGAdmin web interface. | admin
|
PGADMIN_PORT |
Host machine port mapped to PGAdmin's port 80. | 5050
|
Outputs
The stack exposes two network endpoints on the host machine:
| Service | Host Endpoint | Description |
|---|---|---|
| PostgreSQL | localhost:${HOST_PORT} (default: localhost:5433) |
PostgreSQL database accessible via psql, DBeaver, or any PostgreSQL client.
|
| PGAdmin | http://localhost:${PGADMIN_PORT} (default: http://localhost:5050) |
Web-based administration interface for browsing tables, running queries, and inspecting data. |
Usage
Starting the Stack
# Navigate to the dimensional modeling directory
cd intermediate-bootcamp/materials/1-dimensional-data-modeling
# Start all services in detached mode
docker compose up -d
Connecting to PostgreSQL
# Connect via psql from the host
psql -h localhost -p ${HOST_PORT} -U ${POSTGRES_USER} -d ${POSTGRES_DB}
# Or execute a query directly
psql -h localhost -p 5433 -U postgres -d postgres -c "SELECT COUNT(*) FROM actor_films;"
Stopping the Stack
# Stop services (preserves data)
docker compose down
# Stop services and remove data volumes (full reset)
docker compose down -v
Initialization Flow
The following sequence occurs when the stack starts for the first time:
1. Docker Compose creates the 'postgres-data' named volume
2. The postgres:14 container starts and initializes an empty database
3. PostgreSQL detects files in /docker-entrypoint-initdb.d/ and executes them:
a. init-db.sh runs pg_restore to load data.dump
b. init-db.sh iterates over homework/*.sql and executes each file via psql
4. PostgreSQL completes initialization and begins accepting connections
5. The pgadmin container starts and connects to the shared Docker network
6. PGAdmin becomes available on the configured host port
On subsequent startups (when the postgres-data volume already contains data), the initialization scripts are not re-executed. To force re-initialization, remove the volume with docker compose down -v.
Network Architecture
Host Machine
|
|-- localhost:${HOST_PORT} --> [postgres container:5432]
| |
|-- localhost:${PGADMIN_PORT} --> [pgadmin container:80]
| |
+-- Docker Network (internal) -----+
(pgadmin connects to postgres by service name)
Within the Docker network, PGAdmin can reach PostgreSQL using the service name postgres as the hostname, avoiding the need to know the container's IP address.
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_Docker_Environment_Setup -- The theoretical foundation for containerized development environments.
- Implementation:DataExpert_io_Data_engineer_handbook_Pg_restore_Init_Script -- The
init-db.shscript mounted into this stack and executed during initialization. - Principle:DataExpert_io_Data_engineer_handbook_Database_Seeding -- The database seeding principle implemented by this stack's initialization flow.
- Environment:DataExpert_io_Data_engineer_handbook_PostgreSQL_Docker_Environment
- Heuristic:DataExpert_io_Data_engineer_handbook_Docker_Volume_Persistence_Management
Metadata
- Knowledge Sources: Data Engineer Handbook
- Domains: Data_Engineering, SQL, Infrastructure
- Last Updated: 2026-02-09 06:00 GMT