Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Workflow:DataExpert io Data engineer handbook Dimensional Data Modeling Environment Setup

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Data_Modeling, Docker
Last Updated 2026-02-09 06:00 GMT

Overview

End-to-end process for setting up a PostgreSQL-based development environment for dimensional and fact data modeling exercises using Docker Compose.

Description

This workflow provisions a local data engineering environment for Weeks 1 and 2 of the DataExpert.io intermediate bootcamp. It uses Docker Compose to spin up a PostgreSQL 14 database and PGAdmin web interface, automatically seeds the database with training data via a shell initialization script, and establishes connectivity through a browser-based SQL tool. The environment supports hands-on exercises in Slowly Changing Dimensions (SCD), cumulative table design, graph data modeling, fact tables, and deduplication patterns.

Usage

Execute this workflow when beginning the intermediate bootcamp curriculum and you need a pre-configured PostgreSQL environment with sample data for practicing dimensional and fact data modeling. This is the foundational setup that Weeks 1 (Dimensional Data Modeling) and 2 (Fact Data Modeling) depend upon.

Execution Steps

Step 1: Clone_Repository

Obtain the course materials by cloning the data-engineer-handbook repository from GitHub and navigating to the dimensional data modeling module directory.

Key considerations:

  • SSH key authentication must be configured for GitHub access
  • The target directory is the Week 1 dimensional data modeling folder
  • This same environment is reused for Week 2 (Fact Data Modeling)

Step 2: Configure_Environment

Create environment configuration by copying the provided template file. This file defines credentials for PostgreSQL and PGAdmin services, including database name, user, password, container name, and port mappings.

Key considerations:

  • The .env file stores credentials used by both PostgreSQL and PGAdmin containers
  • Default credentials are postgres/postgres for the database
  • PGAdmin defaults to postgres@postgres.com as the admin email
  • The .env file must exist before Docker Compose can start

Step 3: Start_Docker_Services

Launch the PostgreSQL and PGAdmin containers using Docker Compose. The compose file defines two services: a PostgreSQL 14 instance with persistent volume storage and a PGAdmin 4 web interface. The PostgreSQL container mounts the data dump and init script into the Docker entrypoint directory for automatic initialization.

What happens:

  • PostgreSQL 14 container starts on port 5432 (configurable via HOST_PORT)
  • PGAdmin container starts on port 5050 (configurable via PGADMIN_PORT)
  • Persistent volumes are created for both database data and PGAdmin configuration
  • The init-db.sh script runs automatically on first container startup

Step 4: Seed_Database

The database is automatically seeded during container initialization. The init-db.sh script executes pg_restore to load the data dump file, then iterates through any SQL files in the homework directory to set up additional tables and data needed for exercises.

What happens:

  • pg_restore loads the main data.dump file with sample dimensional data
  • Any .sql files in the homework directory are executed sequentially
  • Tables for players, teams, and related entities are created and populated
  • The script uses --no-owner and --no-privileges flags for clean restoration

Step 5: Connect_Query_Tool

Establish a connection to the database using PGAdmin (browser-based) or a desktop client (DataGrip, DBeaver, VS Code). For PGAdmin, navigate to localhost:5050, log in with .env credentials, and register a new server connection pointing to the PostgreSQL container hostname.

Key considerations:

  • PGAdmin connects via container name (my-postgres-container) as the host, not localhost
  • Desktop clients connect via localhost:5432
  • Verify tables are loaded by expanding Schemas > public > Tables
  • The Query Tool requires selecting the database in the browser tree first

Step 6: Verify_And_Query

Confirm the environment is correctly configured by querying the loaded tables. Run basic SELECT queries against the dimensional model tables to verify data integrity and table structure. The environment is now ready for SCD, cumulative table, and graph data modeling exercises.

Key considerations:

  • Common troubleshooting: port 5432 conflicts with local PostgreSQL installations
  • Use docker ps to verify containers are running
  • Data persists across container restarts via mounted volumes
  • Use docker compose down to stop and remove containers when done

Execution Diagram

GitHub URL

Workflow Repository