Principle: DataTalksClub Data Engineering Zoomcamp Environment Setup
| Metadata | |
|---|---|
| Knowledge Sources | DataTalksClub/data-engineering-zoomcamp |
| Domains | Docker, Infrastructure, Development Environments |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Using Docker Compose to provision multi-container development environments with databases and administration tools in a single declarative configuration.
Description
Modern data engineering workflows require multiple infrastructure components running simultaneously: databases for storage, administration interfaces for inspection, and application containers for pipeline execution. Manually starting and configuring each service is error-prone and difficult to reproduce across machines.
Docker Compose solves this by defining all services, their configurations, networking, and persistent storage in a single YAML file. Each service declaration specifies the container image, environment variables for initial configuration, volume mounts for data persistence, and port mappings for host access.
The key architectural decisions in environment setup include:
- Service isolation: Each component (database, admin tool, pipeline) runs in its own container with its own filesystem and process space.
- Declarative configuration: Environment variables such as `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` control initial provisioning without manual SQL or shell commands.
- Named volumes: Data persists across container restarts via Docker-managed named volumes, preventing data loss when containers are recreated.
- Port mapping: Host ports are mapped to container ports so that local tools can connect to services as if they were running natively.
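As a concrete sketch, the decisions above map onto a `docker-compose.yaml` like the following. The image tags, credentials, database name, and port numbers are illustrative (loosely modeled on the zoomcamp's PostgreSQL + pgAdmin setup), not a definitive configuration:

```yaml
services:
  pgdatabase:
    image: postgres:13
    environment:
      - POSTGRES_USER=root            # initial superuser, created on first start
      - POSTGRES_PASSWORD=root
      - POSTGRES_DB=ny_taxi           # database provisioned automatically
    volumes:
      - "pg_data:/var/lib/postgresql/data"   # named volume -> data directory
    ports:
      - "5432:5432"                   # host port -> container port

  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@admin.com
      - PGADMIN_DEFAULT_PASSWORD=root
    volumes:
      - "pgadmin_data:/var/lib/pgadmin"
    ports:
      - "8080:80"                     # pgAdmin's web UI on host port 8080

volumes:
  pg_data:
  pgadmin_data:
```

Each service declaration carries exactly the four pieces the list above names: image, environment, volumes, and ports.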
This principle applies broadly to any multi-service development environment, not just PostgreSQL and pgAdmin. The same pattern works for Redis, Kafka, Elasticsearch, or any combination of containerized services.
Usage
Use this principle when:
- You need a reproducible development environment that can be shared across a team.
- Your workflow requires multiple services (e.g., a database plus an admin UI) running together.
- You want to avoid installing database servers directly on the host machine.
- You need to version-control your infrastructure configuration alongside your application code.
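In day-to-day use the workflow reduces to a handful of Compose commands (shown with the modern `docker compose` syntax; older installs use the hyphenated `docker-compose` binary, and `pgdatabase` is an illustrative service name):

```shell
docker compose up -d                # create network, volumes, containers; start detached
docker compose ps                   # check service status and port mappings
docker compose logs -f pgdatabase   # follow one service's logs
docker compose down                 # stop and remove containers (named volumes survive)
docker compose down -v              # also remove named volumes: stored data is lost
```

Because the configuration file is the single source of truth, committing it to version control makes the whole environment reproducible with one `up` command on any machine.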
Theoretical Basis
The environment setup pattern follows a declarative infrastructure model. In pseudocode, the process is:
```
DEFINE services:
  database_service:
    image = "database_vendor:version"
    environment = {USER, PASSWORD, DATABASE_NAME}
    volumes = [named_volume -> data_directory]
    ports = [host_port -> container_port]
  admin_service:
    image = "admin_tool:version"
    environment = {ADMIN_EMAIL, ADMIN_PASSWORD}
    volumes = [named_volume -> tool_data_directory]
    ports = [host_port -> container_port]

DEFINE volumes:
  named_volume_for_database
  named_volume_for_admin

ON "compose up":
  FOR each service in services:
    PULL image if not cached
    CREATE container from image
    INJECT environment variables
    MOUNT volumes
    MAP ports
    START container
  CREATE default network connecting all services
```
The critical insight is that Docker Compose automatically creates a shared network for all services defined in the same file. This means services can reach each other using their service names as hostnames (e.g., a pipeline container can connect to the database using the hostname `pgdatabase` rather than an IP address).
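For example, a hypothetical `pipeline` service added under the same `services:` block could reach the database by service name alone; the image name and environment variable names here are illustrative, not part of the source material:

```yaml
  pipeline:
    image: taxi_ingest:v001
    environment:
      - DB_HOST=pgdatabase   # Compose's default network resolves the service name
      - DB_PORT=5432         # container port, not the mapped host port
    depends_on:
      - pgdatabase           # start the database before the pipeline
```

Note that inter-container traffic uses the container's own port (`5432` here), regardless of which host port the database is mapped to; host port mappings only matter for tools running outside the Compose network.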