Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:DataTalksClub Data engineering zoomcamp SQLAlchemy Create Engine

From Leeroopedia


Metadata
Knowledge Sources DataTalksClub/data-engineering-zoomcamp
Domains SQLAlchemy, PostgreSQL, Database Connectivity, Python
Last Updated 2026-02-09 14:00 GMT

Overview

Concrete tool for creating a SQLAlchemy Engine object that connects to PostgreSQL using the psycopg driver, with connection parameters assembled from CLI arguments.

Description

This implementation wraps the sqlalchemy.create_engine() function to construct a PostgreSQL connection engine. The connection string is built dynamically using f-string interpolation of five CLI parameters: pg_user, pg_pass, pg_host, pg_port, and pg_db. The resulting engine uses the postgresql+psycopg dialect, which works with the psycopg2-binary package installed as a project dependency.

The engine is created once inside the run() function and then passed to all subsequent pandas.DataFrame.to_sql() calls for chunked data insertion. SQLAlchemy manages connection pooling internally, so no explicit pool configuration is needed for this use case.

Usage

Use this implementation to establish the database connection before starting data ingestion. The engine object is the sole interface between the pandas data loading logic and the PostgreSQL database. It is passed as the con parameter to both schema creation and data insertion calls.

Code Reference

Source Location: 01-docker-terraform/docker-sql/pipeline/ingest_data.py:L49

Signature:

engine = create_engine(f'postgresql+psycopg://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}')

Import:

from sqlalchemy import create_engine

External Dependencies:

Package Version Purpose
sqlalchemy >=2.0.44 Core engine and connection pooling
psycopg2-binary >=2.9.11 PostgreSQL DBAPI driver (C-based adapter)

I/O Contract

Inputs

Name Type Default Description
pg_user str root PostgreSQL username (CLI: --pg-user)
pg_pass str root PostgreSQL password (CLI: --pg-pass)
pg_host str localhost PostgreSQL hostname (CLI: --pg-host); use pgdatabase when running inside Docker network
pg_port int 5432 PostgreSQL port (CLI: --pg-port)
pg_db str ny_taxi Target database name (CLI: --pg-db)

Outputs

Name Type Description
engine sqlalchemy.engine.Engine A connection factory object configured for the specified PostgreSQL database, with built-in connection pooling

Usage Examples

Basic engine creation with defaults (local development):

from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

# Verify connectivity
with engine.connect() as conn:
    result = conn.execute("SELECT 1")
    print(result.fetchone())

Engine creation for Docker network (container-to-container):

from sqlalchemy import create_engine

# When running inside the Docker Compose network,
# use the service name 'pgdatabase' as the hostname
engine = create_engine('postgresql+psycopg://root:root@pgdatabase:5432/ny_taxi')

Using the engine with pandas to_sql:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
df.to_sql(name="test_table", con=engine, if_exists="replace")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment