Workflow:DataExpert io Data engineer handbook Dimensional Data Modeling Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Modeling, Docker |
| Last Updated | 2026-02-09 06:00 GMT |
Overview
End-to-end process for setting up a PostgreSQL-based development environment for dimensional and fact data modeling exercises using Docker Compose.
Description
This workflow provisions a local data engineering environment for Weeks 1 and 2 of the DataExpert.io intermediate bootcamp. It uses Docker Compose to spin up a PostgreSQL 14 database and PGAdmin web interface, automatically seeds the database with training data via a shell initialization script, and establishes connectivity through a browser-based SQL tool. The environment supports hands-on exercises in Slowly Changing Dimensions (SCD), cumulative table design, graph data modeling, fact tables, and deduplication patterns.
Usage
Execute this workflow when beginning the intermediate bootcamp curriculum and you need a pre-configured PostgreSQL environment with sample data for practicing dimensional and fact data modeling. This is the foundational setup that Weeks 1 (Dimensional Data Modeling) and 2 (Fact Data Modeling) depend upon.
Execution Steps
Step 1: Clone_Repository
Obtain the course materials by cloning the data-engineer-handbook repository from GitHub and navigating to the dimensional data modeling module directory.
Key considerations:
- SSH key authentication must be configured for GitHub access
- The target directory is the Week 1 dimensional data modeling folder
- This same environment is reused for Week 2 (Fact Data Modeling)
Step 2: Configure_Environment
Create environment configuration by copying the provided template file. This file defines credentials for PostgreSQL and PGAdmin services, including database name, user, password, container name, and port mappings.
Key considerations:
- The .env file stores credentials used by both PostgreSQL and PGAdmin containers
- Default credentials are postgres/postgres for the database
- PGAdmin defaults to postgres@postgres.com as the admin email
- The .env file must exist before Docker Compose can start
Step 3: Start_Docker_Services
Launch the PostgreSQL and PGAdmin containers using Docker Compose. The compose file defines two services: a PostgreSQL 14 instance with persistent volume storage and a PGAdmin 4 web interface. The PostgreSQL container mounts the data dump and init script into the Docker entrypoint directory for automatic initialization.
What happens:
- PostgreSQL 14 container starts on port 5432 (configurable via HOST_PORT)
- PGAdmin container starts on port 5050 (configurable via PGADMIN_PORT)
- Persistent volumes are created for both database data and PGAdmin configuration
- The init-db.sh script runs automatically on first container startup
Step 4: Seed_Database
The database is automatically seeded during container initialization. The init-db.sh script executes pg_restore to load the data dump file, then iterates through any SQL files in the homework directory to set up additional tables and data needed for exercises.
What happens:
- pg_restore loads the main data.dump file with sample dimensional data
- Any .sql files in the homework directory are executed sequentially
- Tables for players, teams, and related entities are created and populated
- The script uses --no-owner and --no-privileges flags for clean restoration
Step 5: Connect_Query_Tool
Establish a connection to the database using PGAdmin (browser-based) or a desktop client (DataGrip, DBeaver, VS Code). For PGAdmin, navigate to localhost:5050, log in with .env credentials, and register a new server connection pointing to the PostgreSQL container hostname.
Key considerations:
- PGAdmin connects via container name (my-postgres-container) as the host, not localhost
- Desktop clients connect via localhost:5432
- Verify tables are loaded by expanding Schemas > public > Tables
- The Query Tool requires selecting the database in the browser tree first
Step 6: Verify_And_Query
Confirm the environment is correctly configured by querying the loaded tables. Run basic SELECT queries against the dimensional model tables to verify data integrity and table structure. The environment is now ready for SCD, cumulative table, and graph data modeling exercises.
Key considerations:
- Common troubleshooting: port 5432 conflicts with local PostgreSQL installations
- Use docker ps to verify containers are running
- Data persists across container restarts via mounted volumes
- Use docker compose down to stop and remove containers when done