Implementation:DataTalksClub Data engineering zoomcamp SQLAlchemy Create Engine
| Metadata | |
|---|---|
| Knowledge Sources | DataTalksClub/data-engineering-zoomcamp |
| Domains | SQLAlchemy, PostgreSQL, Database Connectivity, Python |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete tool for creating a SQLAlchemy Engine object that connects to PostgreSQL using the psycopg driver, with connection parameters assembled from CLI arguments.
Description
This implementation wraps the sqlalchemy.create_engine() function to construct a PostgreSQL connection engine. The connection string is built dynamically by f-string interpolation of five CLI parameters: pg_user, pg_pass, pg_host, pg_port, and pg_db. The resulting engine uses the postgresql+psycopg dialect, which is served by the psycopg driver (version 3); note that the psycopg2-binary package listed as a project dependency provides the separate postgresql+psycopg2 dialect, so the installed driver package must match the dialect name in the URL.
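A minimal sketch of how the five CLI parameters are assembled into the connection URL. The argparse flag names follow the I/O contract below; the actual parser setup in ingest_data.py is not shown here, so treat the parser details as an assumption:

```python
import argparse

# Hypothetical parser sketch: the flag names mirror the CLI parameters
# documented in the I/O contract (--pg-user, --pg-pass, ...).
parser = argparse.ArgumentParser()
parser.add_argument("--pg-user", default="root")
parser.add_argument("--pg-pass", default="root")
parser.add_argument("--pg-host", default="localhost")
parser.add_argument("--pg-port", type=int, default=5432)
parser.add_argument("--pg-db", default="ny_taxi")
args = parser.parse_args([])  # empty list -> defaults; the real script parses sys.argv

# The URL is assembled by f-string interpolation of the five parameters,
# matching the signature shown in the Code Reference section.
url = (
    f"postgresql+psycopg://{args.pg_user}:{args.pg_pass}"
    f"@{args.pg_host}:{args.pg_port}/{args.pg_db}"
)
print(url)
# create_engine(url) would then build the Engine (requires the psycopg driver).
```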
The engine is created once inside the run() function and then passed to all subsequent pandas.DataFrame.to_sql() calls for chunked data insertion. SQLAlchemy manages connection pooling internally, so no explicit pool configuration is needed for this use case.
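A self-contained sketch of the create-once, reuse-everywhere pattern described above. An in-memory SQLite engine stands in for the PostgreSQL engine so the sketch runs without a server, and the table and column names are illustrative:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the PostgreSQL engine so the sketch is
# self-contained; in ingest_data.py the Postgres engine is passed instead.
engine = create_engine("sqlite://")

# Illustrative data; the pipeline reads the NYC taxi CSV in chunks instead.
df = pd.DataFrame({"trip_id": range(100), "fare": [10.0] * 100})

# First chunk replaces any existing table (creating the schema);
# subsequent chunks append -- the same engine object serves every call.
chunks = (df.iloc[i:i + 25] for i in range(0, len(df), 25))
for i, chunk in enumerate(chunks):
    chunk.to_sql(
        "trips",
        con=engine,
        if_exists="replace" if i == 0 else "append",
        index=False,
    )

n = pd.read_sql("SELECT COUNT(*) AS n FROM trips", con=engine)["n"][0]
print(n)
```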
Usage
Use this implementation to establish the database connection before starting data ingestion. The engine object is the sole interface between the pandas data loading logic and the PostgreSQL database. It is passed as the con parameter to both schema creation and data insertion calls.
Code Reference
Source Location: 01-docker-terraform/docker-sql/pipeline/ingest_data.py:L49
Signature:
engine = create_engine(f'postgresql+psycopg://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}')
Import:
from sqlalchemy import create_engine
External Dependencies:
| Package | Version | Purpose |
|---|---|---|
| sqlalchemy | >=2.0.44 | Core engine and connection pooling |
| psycopg2-binary | >=2.9.11 | PostgreSQL DBAPI driver (C-based adapter) |
I/O Contract
Inputs
| Name | Type | Default | Description |
|---|---|---|---|
| pg_user | str | root | PostgreSQL username (CLI: --pg-user) |
| pg_pass | str | root | PostgreSQL password (CLI: --pg-pass) |
| pg_host | str | localhost | PostgreSQL hostname (CLI: --pg-host); use pgdatabase when running inside the Docker network |
| pg_port | int | 5432 | PostgreSQL port (CLI: --pg-port) |
| pg_db | str | ny_taxi | Target database name (CLI: --pg-db) |
Outputs
| Name | Type | Description |
|---|---|---|
| engine | sqlalchemy.engine.Engine | A connection factory object configured for the specified PostgreSQL database, with built-in connection pooling |
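The returned Engine can be inspected before any connection is opened, since it is a lazy connection factory. A small sketch, again using in-memory SQLite so it runs without a server (with the PostgreSQL URL above, dialect.name would be "postgresql"):

```python
from sqlalchemy import create_engine

# SQLite in-memory stands in for PostgreSQL so this runs without a server.
engine = create_engine("sqlite://")

print(engine.dialect.name)   # dialect selected from the URL scheme
print(engine.url)            # the parsed connection URL
print(engine.pool.status())  # pool statistics string (no connection opened yet)
```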
Usage Examples
Basic engine creation with defaults (local development):
```python
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

# Verify connectivity (SQLAlchemy 2.0 requires text() for raw SQL strings)
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1"))
    print(result.fetchone())
```
Engine creation for Docker network (container-to-container):
```python
from sqlalchemy import create_engine

# When running inside the Docker Compose network,
# use the service name 'pgdatabase' as the hostname
engine = create_engine('postgresql+psycopg://root:root@pgdatabase:5432/ny_taxi')
```
Using the engine with pandas to_sql:
```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
df.to_sql(name="test_table", con=engine, if_exists="replace")
```
Related Pages
- Principle:DataTalksClub_Data_engineering_zoomcamp_Database_Connection
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Docker_Compose_PostgreSQL_Setup
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Pandas_Chunked_CSV_Loading
- Environment:DataTalksClub_Data_engineering_zoomcamp_Docker_PostgreSQL_Python_Environment