Implementation:DataTalksClub Data engineering zoomcamp SQLAlchemy Create Engine
| Metadata | |
|---|---|
| Knowledge Sources | DataTalksClub/data-engineering-zoomcamp |
| Domains | SQLAlchemy, PostgreSQL, Database Connectivity, Python |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete tool for creating a SQLAlchemy Engine object that connects to PostgreSQL using the psycopg driver, with connection parameters assembled from CLI arguments.
Description
This implementation wraps the sqlalchemy.create_engine() function to construct a PostgreSQL connection engine. The connection string is built dynamically by f-string interpolation of five CLI parameters: pg_user, pg_pass, pg_host, pg_port, and pg_db. The resulting engine uses the postgresql+psycopg dialect, which is served by the psycopg driver (version 3); note that the psycopg2-binary package listed as a project dependency provides the separate postgresql+psycopg2 dialect, so the installed driver package must match the dialect name in the URL.
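A minimal sketch of how the five CLI parameters are assembled into the connection URL. The argparse flag names follow the I/O contract below; the actual parser setup in ingest_data.py is not shown here, so treat the parser details as an assumption:

```python
import argparse

# Hypothetical parser sketch: the flag names mirror the CLI parameters
# documented in the I/O contract (--pg-user, --pg-pass, ...).
parser = argparse.ArgumentParser()
parser.add_argument("--pg-user", default="root")
parser.add_argument("--pg-pass", default="root")
parser.add_argument("--pg-host", default="localhost")
parser.add_argument("--pg-port", type=int, default=5432)
parser.add_argument("--pg-db", default="ny_taxi")
args = parser.parse_args([])  # empty list -> defaults; the real script parses sys.argv

# The URL is assembled by f-string interpolation of the five parameters,
# matching the signature shown in the Code Reference section.
url = (
    f"postgresql+psycopg://{args.pg_user}:{args.pg_pass}"
    f"@{args.pg_host}:{args.pg_port}/{args.pg_db}"
)
print(url)
# create_engine(url) would then build the Engine (requires the psycopg driver).
```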
The engine is created once inside the run() function and then passed to all subsequent pandas.DataFrame.to_sql() calls for chunked data insertion. SQLAlchemy manages connection pooling internally, so no explicit pool configuration is needed for this use case.
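A self-contained sketch of the create-once, reuse-everywhere pattern described above. An in-memory SQLite engine stands in for the PostgreSQL engine so the sketch runs without a server, and the table and column names are illustrative:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the PostgreSQL engine so the sketch is
# self-contained; in ingest_data.py the Postgres engine is passed instead.
engine = create_engine("sqlite://")

# Illustrative data; the pipeline reads the NYC taxi CSV in chunks instead.
df = pd.DataFrame({"trip_id": range(100), "fare": [10.0] * 100})

# First chunk replaces any existing table (creating the schema);
# subsequent chunks append -- the same engine object serves every call.
chunks = (df.iloc[i:i + 25] for i in range(0, len(df), 25))
for i, chunk in enumerate(chunks):
    chunk.to_sql(
        "trips",
        con=engine,
        if_exists="replace" if i == 0 else "append",
        index=False,
    )

n = pd.read_sql("SELECT COUNT(*) AS n FROM trips", con=engine)["n"][0]
print(n)
```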
Usage
Use this implementation to establish the database connection before starting data ingestion. The engine object is the sole interface between the pandas data loading logic and the PostgreSQL database. It is passed as the con parameter to both schema creation and data insertion calls.
Code Reference
Source Location: 01-docker-terraform/docker-sql/pipeline/ingest_data.py:L49
Signature:
engine = create_engine(f'postgresql+psycopg://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}')
Import:
from sqlalchemy import create_engine
External Dependencies:
| Package | Version | Purpose |
|---|---|---|
| sqlalchemy | >=2.0.44 | Core engine and connection pooling |
| psycopg2-binary | >=2.9.11 | PostgreSQL DBAPI driver (C-based adapter) |
I/O Contract
Inputs
| Name | Type | Default | Description |
|---|---|---|---|
| pg_user | str | root | PostgreSQL username (CLI: --pg-user) |
| pg_pass | str | root | PostgreSQL password (CLI: --pg-pass) |
| pg_host | str | localhost | PostgreSQL hostname (CLI: --pg-host); use pgdatabase when running inside the Docker network |
| pg_port | int | 5432 | PostgreSQL port (CLI: --pg-port) |
| pg_db | str | ny_taxi | Target database name (CLI: --pg-db) |
Outputs
| Name | Type | Description |
|---|---|---|
| engine | sqlalchemy.engine.Engine | A connection factory object configured for the specified PostgreSQL database, with built-in connection pooling |
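The returned Engine can be inspected before any connection is opened, since it is a lazy connection factory. A small sketch, again using in-memory SQLite so it runs without a server (with the PostgreSQL URL above, dialect.name would be "postgresql"):

```python
from sqlalchemy import create_engine

# SQLite in-memory stands in for PostgreSQL so this runs without a server.
engine = create_engine("sqlite://")

print(engine.dialect.name)   # dialect selected from the URL scheme
print(engine.url)            # the parsed connection URL
print(engine.pool.status())  # pool statistics string (no connection opened yet)
```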
Usage Examples
Basic engine creation with defaults (local development):
```python
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

# Verify connectivity (SQLAlchemy 2.0 requires text() for raw SQL strings)
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1"))
    print(result.fetchone())
```
Engine creation for Docker network (container-to-container):
```python
from sqlalchemy import create_engine

# When running inside the Docker Compose network,
# use the service name 'pgdatabase' as the hostname
engine = create_engine('postgresql+psycopg://root:root@pgdatabase:5432/ny_taxi')
```
Using the engine with pandas to_sql:
```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql+psycopg://root:root@localhost:5432/ny_taxi')

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
df.to_sql(name="test_table", con=engine, if_exists="replace")
```
Related Pages
- Principle:DataTalksClub_Data_engineering_zoomcamp_Database_Connection
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Docker_Compose_PostgreSQL_Setup
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Pandas_Chunked_CSV_Loading
- Environment:DataTalksClub_Data_engineering_zoomcamp_Docker_PostgreSQL_Python_Environment