Environment:Spotify Luigi PostgreSQL Server
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Database |
| Last Updated | 2026-02-10 07:00 GMT |
Overview
PostgreSQL server environment with the psycopg2 or pg8000 Python driver, used by Luigi database ingestion tasks.
Description
This environment provides the PostgreSQL connectivity required by Luigi's `postgres` contrib module. It supports two Python database drivers: psycopg2 (default, C-based, high performance) and pg8000 (pure Python, no C compiler needed). The driver selection is controlled via the `LUIGI_PGSQL_DRIVER` environment variable. The module uses a marker table pattern to track task completion, ensuring idempotent data loading.
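The marker table pattern can be illustrated with a minimal sketch. It is not Luigi's implementation: `sqlite3` is swapped in for PostgreSQL so the example runs without a server, and the column names, the `events` table, and the helper names (`ensure_marker_table`, `is_complete`, `load_once`) are assumptions made for illustration. The idea it shows is that the data and its completion marker are written in one transaction, so a rerun with the same update id is skipped.

```python
import sqlite3

# Default marker table name used by Luigi's [postgres] configuration.
MARKER_TABLE = "table_updates"


def ensure_marker_table(conn):
    # IF NOT EXISTS makes creation safe to repeat on retries.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {MARKER_TABLE} ("
        "update_id TEXT PRIMARY KEY, target_table TEXT)"
    )


def is_complete(conn, update_id):
    # A task counts as complete once its update id is in the marker table.
    row = conn.execute(
        f"SELECT 1 FROM {MARKER_TABLE} WHERE update_id = ?", (update_id,)
    ).fetchone()
    return row is not None


def load_once(conn, update_id, target_table, rows):
    """Insert rows and record the marker together; skip if already done."""
    ensure_marker_table(conn)
    if is_complete(conn, update_id):
        return False
    with conn:  # commits on success, rolls back on error
        conn.executemany(f"INSERT INTO {target_table} VALUES (?, ?)", rows)
        conn.execute(
            f"INSERT INTO {MARKER_TABLE} (update_id, target_table) VALUES (?, ?)",
            (update_id, target_table),
        )
    return True


# sqlite3 stands in for PostgreSQL here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
rows = [(1, "a"), (2, "b")]
first = load_once(conn, "load_2026_02_10", "events", rows)   # performs the load
second = load_once(conn, "load_2026_02_10", "events", rows)  # skipped as complete
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the second call finds the marker row, it inserts nothing, which is what makes a retried or rescheduled load idempotent.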
Usage
Use this environment for any pipeline that loads data into PostgreSQL with `CopyToTable`, runs SQL with `PostgresQuery`, or tracks task completion with `PostgresTarget`. It is required for the Database_Ingestion_Pipeline workflow when targeting PostgreSQL.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Cross-platform |
| PostgreSQL | PostgreSQL server (any supported version) | Default port: 5432 |
| Network | TCP access to PostgreSQL server | Configurable host and port |
Dependencies
System Packages
- PostgreSQL client libraries (for psycopg2; not needed for pg8000)
Python Packages
- `psycopg2` < 3.0 (default driver, recommended)
- `pg8000` >= 1.23.0 (alternative pure-Python driver)
- `luigi` (core)
Credentials
The following must be provided to task constructors or via configuration:
- `host`: PostgreSQL server hostname
- `database`: Database name
- `user`: Database username
- `password`: Database password
- `port`: Server port (default: 5432)
Environment variables:
- `LUIGI_PGSQL_DRIVER`: Selects the Python database driver (default: `psycopg2`, alternative: `pg8000`)
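The variable is read when `luigi.contrib.postgres` is imported, so it has to be set before that import, and the driver that actually loads also depends on which packages are installed. The following sketch mirrors the resolution order of the module's import logic. `resolve_driver` and its `available` parameter are hypothetical helpers for illustration, not part of Luigi.

```python
import importlib.util
import os


def _installed(name):
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(name) is not None


def resolve_driver(env=None, available=_installed):
    """Return 'psycopg2', 'pg8000', or None, following Luigi's fallback order."""
    env = os.environ if env is None else env
    preferred = env.get("LUIGI_PGSQL_DRIVER", "psycopg2")
    # psycopg2 is only attempted when it is the selected driver.
    if preferred == "psycopg2" and available("psycopg2"):
        return "psycopg2"
    # pg8000 is tried when selected, or as a fallback when nothing loaded yet.
    if available("pg8000"):
        return "pg8000"
    # Neither driver loaded: Luigi only logs a warning at import time.
    return None


def only_pg8000(name):
    # Injected availability check so the example needs no drivers installed.
    return name == "pg8000"


# Default selection, but only pg8000 is installed: falls back to pg8000.
fallback = resolve_driver({}, available=only_pg8000)
```

One consequence of this order is that selecting `pg8000` without installing it leaves no driver loaded, even if psycopg2 is present.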
Configuration in `luigi.cfg`:
- `[postgres] marker-table`: Name of the marker table for completion tracking (default: `table_updates`)
- `[postgres] local-tmp-dir`: Local temporary directory for data staging
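As a sketch of how these two options might look in `luigi.cfg`, the fragment below is embedded in a string and parsed with the standard library's `configparser`, purely to show that the layout is well-formed INI. Luigi reads the file through its own configuration loader, and the staging path is a hypothetical value.

```python
import configparser

# Example luigi.cfg fragment; the directory path is a made-up placeholder.
LUIGI_CFG = """
[postgres]
marker-table = table_updates
local-tmp-dir = /var/tmp/luigi-staging
"""

parser = configparser.ConfigParser()
parser.read_string(LUIGI_CFG)

# Fall back to the documented default when the option is absent.
marker_table = parser.get("postgres", "marker-table", fallback="table_updates")
tmp_dir = parser.get("postgres", "local-tmp-dir", fallback=None)
```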
Quick Install
# Install Luigi with PostgreSQL support (psycopg2)
pip install luigi "psycopg2<3.0"
# Or with pure-Python pg8000 driver
pip install luigi "pg8000>=1.23.0"
Code Evidence
Driver selection via environment variable from `luigi/contrib/postgres.py:33`:
DB_DRIVER = os.environ.get('LUIGI_PGSQL_DRIVER', 'psycopg2')
psycopg2 import with fallback from `luigi/contrib/postgres.py:41-69`:
if DB_DRIVER == 'psycopg2':
    try:
        import psycopg2 as dbapi

        def update_error_codes():
            import psycopg2.errorcodes
            DB_ERROR_CODES.update({
                psycopg2.errorcodes.DUPLICATE_TABLE: ERROR_DUPLICATE_TABLE,
                psycopg2.errorcodes.UNDEFINED_TABLE: ERROR_UNDEFINED_TABLE,
            })

        update_error_codes()
    except ImportError:
        pass

if dbapi is None or DB_DRIVER == 'pg8000':
    try:
        import pg8000.dbapi as dbapi
        import pg8000.core
        DB_ERROR_CODES.update({
            '42P07': ERROR_DUPLICATE_TABLE,
            '42P01': ERROR_UNDEFINED_TABLE
        })
    except ImportError:
        pass

if dbapi is None:
    logger.warning("Loading postgres module without psycopg2 nor pg8000 installed. "
                   "Will crash at runtime if postgres functionality is used.")
Default port and marker table from `luigi/contrib/postgres.py:171-174`:
marker_table = luigi.configuration.get_config().get(
    'postgres', 'marker-table', 'table_updates')
# ...
DEFAULT_DB_PORT = 5432
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Loading postgres module without psycopg2 nor pg8000 installed` | Neither PostgreSQL driver is installed | `pip install "psycopg2<3.0"` or `pip install "pg8000>=1.23.0"` |
| `connection refused` | PostgreSQL server not running or wrong host/port | Verify server is running and connection parameters |
| `ERROR_UNDEFINED_TABLE` on marker table | Marker table not yet created | Luigi auto-creates it; ensure DB user has CREATE TABLE privilege |
| `ERROR_DUPLICATE_TABLE` | Table already exists during creation | Expected during retry; Luigi handles this gracefully |
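When diagnosing `connection refused`, it can help to separate network reachability from driver or credential problems. The sketch below uses only the standard library to test whether a TCP connection to the server can be opened. `can_reach` is a hypothetical helper, and it checks the port only, not PostgreSQL authentication.

```python
import socket


def can_reach(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (host name is a placeholder):
#   can_reach("db.example.internal")        # default PostgreSQL port 5432
#   can_reach("db.example.internal", 6543)  # non-default port
```

If this returns `False`, the problem lies with the server process, host, port, or firewall rather than with Luigi or the Python driver.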
Compatibility Notes
- psycopg2 vs pg8000: psycopg2 is faster (C extension) but requires PostgreSQL client libraries to compile. pg8000 is pure Python and easier to install but slower.
- Driver selection: Set `LUIGI_PGSQL_DRIVER=pg8000` to use the pure-Python driver without needing C compilation.
- Connection reset: psycopg2 supports `connection.reset()` while pg8000 uses a different reset mechanism. Luigi handles both transparently.